From duke at openjdk.org Mon Dec 1 00:38:50 2025 From: duke at openjdk.org (Shawn M Emery) Date: Mon, 1 Dec 2025 00:38:50 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v8] In-Reply-To: References: <2HwG7uFrqW7pXzu32WvTuOZmzolIhPS8TxoZazYsvG8=.a75ab9bf-8587-4e35-82a2-88b7e8aa44da@github.com> Message-ID: On Mon, 24 Nov 2025 17:24:50 GMT, Sandhya Viswanathan wrote: >> Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed the ENCRYPT_16_BLKS fall through case that sviswa7 pointed out in PR review. > > Marked as reviewed by sviswanathan (Reviewer). @sviswa7 or @shipilev, if the updated changes look good to you then could you please reapprove/approve the PR as I don't have Reviewer privileges at this point. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28363#issuecomment-3594057961 From vlivanov at openjdk.org Mon Dec 1 03:00:58 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 1 Dec 2025 03:00:58 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v23] In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 15:47:54 GMT, Jatin Bhateja wrote: >> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. >> >> With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. 
>> >> All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix. [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. >> >> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size. >> >> The patch shows around 5-20% improvement in code size by facilitating NDD demotion. >> >> For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. >> >> **Micro:-** >> image >> >> >> **Baseline :-** >> image >> >> **With opt:-** >> image >> >> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016 > - Incorporating polished comments suggestions from Daniel > - Review comments resolution > - Review comments resolutions > - Review comments resolution > - Extending biasing heuristics to account for bias range with minimum degree of freedom. Review feedback incorporated. > - Generic operand traversal and sharpening candidate selection based on RegisterMask and non-interference. 
Review feedback incorporated > - Review comments resolution > - Review comments resolutions > - Moving demotion candidate marking to AD file, review comments resolutions > - ... and 11 more: https://git.openjdk.org/jdk/compare/1ce2a44e...93577b83 Marked as reviewed by vlivanov (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26283#pullrequestreview-3523000265 From jbhateja at openjdk.org Mon Dec 1 06:07:13 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 1 Dec 2025 06:07:13 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v23] In-Reply-To: References: Message-ID: <8ekvtuOrgyG4hIZBv08MnyISgdxUyLoNY7VOppgnkHA=.bc64a9a6-4781-48ae-81e0-5e04402be73b@github.com> On Mon, 1 Dec 2025 02:57:51 GMT, Vladimir Ivanov wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: >> >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016 >> - Incorporating polished comments suggestions from Daniel >> - Review comments resolution >> - Review comments resolutions >> - Review comments resolution >> - Extending biasing heuristics to account for bias range with minimum degree of freedom. Review feedback incorporated. >> - Generic operand traversal and sharpening candidate selection based on RegisterMask and non-interference. Review feedback incorporated >> - Review comments resolution >> - Review comments resolutions >> - Moving demotion candidate marking to AD file, review comments resolutions >> - ... and 11 more: https://git.openjdk.org/jdk/compare/1ce2a44e...93577b83 > > Marked as reviewed by vlivanov (Reviewer). 
Thanks @iwanowww, @dean-long, @dlunde, @merykitty and @sviswa7 for your reviews and approval. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26283#issuecomment-3594672542 From jbhateja at openjdk.org Mon Dec 1 06:07:14 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 1 Dec 2025 06:07:14 GMT Subject: Integrated: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions In-Reply-To: References: Message-ID: On Mon, 14 Jul 2025 02:36:24 GMT, Jatin Bhateja wrote: > Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. > > With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. > > All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix. [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. > > Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size. > > The patch shows around 5-20% improvement in code size by facilitating NDD demotion. 
> > For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. > > **Micro:-** > image > > > **Baseline :-** > image > > **With opt:-** > image > > Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). > > Kindly review and share your feedback. > > Best Regards, > Jatin This pull request has now been integrated. Changeset: e0311ecb Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/e0311ecb85b78b6d97387c17102a8b6759eefc36 Stats: 283 lines in 13 files changed: 205 ins; 15 del; 63 mod 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions Reviewed-by: sviswanathan, dlunden, vlivanov, qamai ------------- PR: https://git.openjdk.org/jdk/pull/26283 From epeter at openjdk.org Mon Dec 1 06:45:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 1 Dec 2025 06:45:06 GMT Subject: RFR: 8372685: C2 SuperWord: wrong requires in test after JDK-8371146 In-Reply-To: <3861jYG-DmA_XdGaE8Zj5GlutdxiXc4t-jjRnixANF4=.7119ba1e-8af8-4aa6-9bb2-52c92f1b3d91@github.com> References: <3861jYG-DmA_XdGaE8Zj5GlutdxiXc4t-jjRnixANF4=.7119ba1e-8af8-4aa6-9bb2-52c92f1b3d91@github.com> Message-ID: On Fri, 28 Nov 2025 10:12:19 GMT, Matthias Baesken wrote: >>> I leave it up to you if you want to file an RFE for the error message. I don't have the expertise on Windows nor on GC. >> >> @xmas92 , @jsikstro what do you think ? >> Is this about the 'ZGC requires Windows version 1803 or later' message that surprised us a little bit because we see it on Windows server 2016 , but the 1803 looks like it refers to some update of good old Win 10 . > >> @MBaesken 1803 seems to refer to both a Windows 10 and Windows Server 2016 (internal) release number/version. 
Here's a version list of the old semi-annual releases of Windows Server 2016: https://en.wikipedia.org/wiki/Windows_Server#Semi-Annual_releases_(discontinued) > > Thanks! > Wikipedia says 'semi-annual releases do not include any desktop environments. Instead, they are restricted to the Nano Server configuration installed in a [Docker](https://en.wikipedia.org/wiki/Docker_(software)) [container](https://en.wikipedia.org/wiki/Containerization_(computing)),[[17]](https://en.wikipedia.org/wiki/Windows_Server#cite_note-thomasmaurer-17)[[29]](https://en.wikipedia.org/wiki/Windows_Server#cite_note-:0-29) and the Server Core configuration, licensed only to serve as a container host' so this sounds like it is a rather special 'flavor' of Win Server 2016. > So maybe it is no wonder that we get the warning and have no VirtualAlloc2 on our Win Server 2016 test machine. @MBaesken @chhagedorn Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28537#issuecomment-3594810088 From epeter at openjdk.org Mon Dec 1 06:45:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 1 Dec 2025 06:45:07 GMT Subject: Integrated: 8372685: C2 SuperWord: wrong requires in test after JDK-8371146 In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 13:53:12 GMT, Emanuel Peter wrote: > @MBaesken reported this issue on Windows: > > TestAliasingCheckPreLimitNotAvailable_all-flags-fixed-stress-seed.jtr and TestAliasingCheckPreLimitNotAvailable_all-flags-no-stress-seed.jtr show failures on Windows: > > [0.095s][error][gc] Failed to lookup symbol: VirtualAlloc2 > Error occurred during initialization of VM > ZGC requires Windows version 1803 or later > > AIX fails too: > Error occurred during initialization of VM > Option -XX:+UseZGC not supported > > > I learned a small lesson here: `@requires vm.gc.Z` is much smarter than checking that no other GC is set, or ZGC is set. It also checks if ZGC is available, which is not always the case, e.g. 
on the reported Windows machine. > > @MBaesken Can you please confirm that this fixes the test for you? This pull request has now been integrated. Changeset: 81b26ba8 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/81b26ba8131b74a7bb4309bd3608dda2ba99a6ca Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8372685: C2 SuperWord: wrong requires in test after JDK-8371146 Reviewed-by: chagedorn, mbaesken ------------- PR: https://git.openjdk.org/jdk/pull/28537 From epeter at openjdk.org Mon Dec 1 06:58:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 1 Dec 2025 06:58:03 GMT Subject: RFR: 8370922: Template Framework Library: Float16 type and operations Message-ID: We should test `Float16` with Template Framework Tests. For this, I'm now implementing: - Template Framework Library: add `Float16Type` that represents `Float16`. Extend `Operations.java` with `Float16` operations. - `Verify.java`: add verification for `Float16`, and corresponding tests in `TestVerifyIncubatorVector.java`. We could have done this separately, but it is not much code and completes the pipeline from code generation through execution and finally result verification in the following two tests. - Adding `Float16` to `ExpressionFuzzer.java` and `TestExpressions.java`. ------------- Commit messages: - add more flags again - add module to compilation - Merge branch 'master' into JDK-8370922-TemplateFramework-Library-Float16 - remove old TODOs - add Float16 to ExpressionFuzzer.java - fix jtreg commands - remove some unnecessary incubator flags - comparisons - rest of Float16 operators - verify for Float16 - ... 
and 4 more: https://git.openjdk.org/jdk/compare/08c16c38...c87acd90 Changes: https://git.openjdk.org/jdk/pull/28095/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28095&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370922 Stats: 376 lines in 9 files changed: 348 ins; 4 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/28095.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28095/head:pull/28095 PR: https://git.openjdk.org/jdk/pull/28095 From chagedorn at openjdk.org Mon Dec 1 07:09:59 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 1 Dec 2025 07:09:59 GMT Subject: Integrated: 8372461: [IR Framework] Multiple test failures after JDK-8371789 In-Reply-To: References: Message-ID: <2jMJC3vg0OAMgcoErG97jNFQb6Egsi352SstiE7KVKM=.42c4dc94-db04-4a06-b952-dda807762703@github.com> On Tue, 25 Nov 2025 16:51:39 GMT, Christian Hagedorn wrote: > [JDK-8371789](https://bugs.openjdk.org/browse/JDK-8371789) improved the C2 type dumps but unfortunately also broke some IR Framework internal tests and some regexes: > > - `TestIRMatching.java`: Forgot to update old reference to "precise". Replaced with "Constant". > - `IRNode.CHECKCAST_ARRAY*`: Forgot to update old reference to "precise". Replaced with `Constant` and added `aryklassptr`. > - Some clean-up to `LOAD_STORE_PREFIX` was incorrect since we no longer match various combinations tested with `TestIRMatching.java` and `TestPhaseIRMatching.java`. For example: > https://github.com/openjdk/jdk/blob/67ef81eb78b28e5dcdf91785b476dfd0858cbd16/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestPhaseIRMatching.java#L766-L783 > I reverted the no-longer matching part of the regex back to what we had before JDK-8371789. > > #### Testing > - [X] Tier1 > - [X] Tier5 with IR framework internal tests only > - [ ] Failing IR framework internal tests on all platforms > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 293fec7e Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/293fec7e28ed06f0942e94b1c21affdf6aabe9ca Stats: 8 lines in 2 files changed: 0 ins; 1 del; 7 mod 8372461: [IR Framework] Multiple test failures after JDK-8371789 Reviewed-by: epeter, syan, dfenacci ------------- PR: https://git.openjdk.org/jdk/pull/28495 From mhaessig at openjdk.org Mon Dec 1 07:13:53 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Mon, 1 Dec 2025 07:13:53 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v11] In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 21:50:27 GMT, Tobias Hotz wrote: >> This PR improves the value of integer division nodes. >> Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guaranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case. >> We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once. >> This also cleans up and unifies the code paths for DivINode and DivLNode. >> I've added some tests to validate the optimization. Without the changes, some of these tests fail. > > Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: > > Simplify test, add temporary @IR rule for testLongRange and improve comments Testing passed up to tier7 ------------- Marked as reviewed by mhaessig (Committer). 
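The "four corners" idea quoted above for 8364766 can be illustrated with a small standalone sketch. This is not the code from the pull request: the class `DivRangeSketch`, the record `Range`, and the method `divRange` are hypothetical names, and widening to the full int range when `MIN_VALUE / -1` is reachable is only one conservative way to treat that special case, assumed here for simplicity.

```java
// Hypothetical sketch, not the C2 implementation: bound the result of an int
// division when numerator and divisor are only known as closed ranges.
final class DivRangeSketch {
    /** Closed interval [lo, hi] of possible int results. */
    record Range(int lo, int hi) {}

    static Range divRange(int nLo, int nHi, int dLo, int dHi) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        long[] numerators = {nLo, nHi};
        // Negative part of the divisor range: [dLo, min(dHi, -1)].
        if (dLo < 0) {
            long[] divisors = {dLo, Math.min(dHi, -1)};
            for (long n : numerators) {
                for (long d : divisors) {
                    long q = n / d; // long arithmetic, so MIN_VALUE / -1 does not wrap
                    min = Math.min(min, q);
                    max = Math.max(max, q);
                }
            }
        }
        // Positive part of the divisor range: [max(dLo, 1), dHi].
        if (dHi > 0) {
            long[] divisors = {Math.max(dLo, 1), dHi};
            for (long n : numerators) {
                for (long d : divisors) {
                    long q = n / d;
                    min = Math.min(min, q);
                    max = Math.max(max, q);
                }
            }
        }
        if (min > max) {
            // The divisor range was exactly [0, 0]; no quotient exists.
            throw new ArithmeticException("divisor range contains only zero");
        }
        if (max > Integer.MAX_VALUE) {
            // Only the corner MIN_VALUE / -1 can exceed the int range. In Java that
            // division wraps to MIN_VALUE, so this sketch gives up and returns the
            // full range (an assumption, chosen because it is trivially sound).
            return new Range(Integer.MIN_VALUE, Integer.MAX_VALUE);
        }
        return new Range((int) min, (int) max);
    }

    public static void main(String[] args) {
        System.out.println(divRange(10, 20, 2, 5));
        System.out.println(divRange(-10, 20, -2, 5));
    }
}
```

Only the divisor range is split at zero. For a fixed divisor sign the quotient is monotone in each operand separately, so the extrema sit at the corners even when the numerator range crosses zero.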
PR Review: https://git.openjdk.org/jdk/pull/26143#pullrequestreview-3523457833 From chagedorn at openjdk.org Mon Dec 1 07:13:59 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 1 Dec 2025 07:13:59 GMT Subject: RFR: 8370766: JVM crashes when running compiler/exceptions/TestAccessErrorInCatch.java fails with -XX:+VerifyStack [v2] In-Reply-To: References: <5JAu6StX5-r2itXPGiDBgGHjGo0S2mOfGxOpPoMSkIQ=.000500da-a003-403b-9d3b-6df3a53c2b22@github.com> Message-ID: On Wed, 26 Nov 2025 06:31:22 GMT, Dean Long wrote: >> The problem is C2 is throwing an exception and then deoptimizing, and the -XX:+VerifyStack logic expects the stack to be empty, match the "before" state if the reexecute flag is set, or match the "after" state. C2 is using the "before" state, so for correctness it also needs to set the reexecute flag. >> >> I played around with other approaches, like: >> 1. setting the stack to empty >> 2. adding all the bytecodes that can throw to the list in AbstractInterpreter::bytecode_should_reexecute() >> 3. always setting the reexecute flag in add_safepoint_edges() if must_throw is set >> but in the end I decided to go with the minimal localized low-risk change. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > remove extra spaces Looks good to me, too. > always setting the reexecute flag in add_safepoint_edges() if must_throw is set but in the end I decided to go with the minimal localized low-risk change. Is this something we should follow up with? ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28486#pullrequestreview-3523457969 From jbhateja at openjdk.org Mon Dec 1 07:15:41 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 1 Dec 2025 07:15:41 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v4] In-Reply-To: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: > This patch fixes a crash seen while querying the bottom type of MachTempNode corresponding to [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of blend pattern during late scheduling. Here, MaxVectorSize is constrained to 8 bytes, so during C2 type system initialization, [TypeVect::VECTX](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719), guarded by target supported vector size, remains uninitialized. > > It's better to reject matching of VectorBlend in such a scenario. > > All existing VectorAPI jtreg tests are passing with -XX:UseAVX=0 and -XX:MaxVectorSize=8. > > Kindly review and share your feedback. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Adding a testpoint ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28533/files - new: https://git.openjdk.org/jdk/pull/28533/files/2c08c7db..be24b1af Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28533&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28533&range=02-03 Stats: 54 lines in 1 file changed: 54 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28533.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28533/head:pull/28533 PR: https://git.openjdk.org/jdk/pull/28533 From chagedorn at openjdk.org Mon Dec 1 07:18:53 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 1 Dec 2025 07:18:53 GMT Subject: RFR: 8371464: C2: assert(no_dead_loop) failed: dead loop detected In-Reply-To: References: Message-ID: <5-3zAbx-Kx46DdjWSHSR5h34DzqC79DdXJKAb8haPKk=.4bbe4c2c-097b-4e8d-9d3c-b85d2048416d@github.com> On Fri, 28 Nov 2025 10:50:44 GMT, Roland Westrelin wrote: > Crash occurs because a `MergeMem` node references itself: > > > 608 MergeMem === _ 1 608 1 1 1 1 1 1 1 1 1 1 878 [[ 877 878 608 420 597 ]] { - - - - - - - - - - N878:java/lang/Throwable (java/io/Serializable)+20 * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !orig=[524] !jvms: TestDeadLoopAtMergeMem::test @ bci:94 (line 62) > ``` > > Before IGVN, that part of the stream is: > > > 522 Region === 522 604 521 [[ 522 538 523 524 525 526 527 528 529 530 531 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:75 (line 59) > 524 Phi === 522 608 464 [[ 588 581 564 546 564 559 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:75 (line 59) > > 538 If === 522 535 [[ 539 540 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 539 IfTrue === 538 [[ 553 547 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 540 IfFalse === 538 [[ 548 546 ]] #0 
!jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 553 If === 539 535 [[ 554 555 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) > 554 IfTrue === 553 [[ 562 560 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) > 555 IfFalse === 553 [[ 548 559 ]] #0 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) > > 548 Region === 548 _ 540 555 [[ 548 562 561 563 564 565 566 567 568 569 570 571 572 573 574 575 576 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:88 (line 60) > 564 Phi === 548 _ 524 524 [[ 581 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:85 (line 61) > > 562 Region === 562 548 554 [[ 562 600 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 596 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:90 (line 62) > 581 Phi === 562 564 524 [[ 420 597 610 608 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:90 (line 62) > > 608 MergeMem === _ 1 581 1 1 1 1 1 1 1 1 1 1 588 [[ 524 ]] { - - - - - - - - - - N588:java/lang/Throwable (java/io/Serializable)+20 * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:94 (line 62) > > > 522 is a loop head, 604 is the backedge. The loop becomes unreachable > during IGVN. The loop body above is transformed to: > > > 538 If === 604 535 [[ 539 540 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 539 IfTrue === 538 [[ 562 547 560 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (l... That looks good to me. I'll submit some testing. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28554#pullrequestreview-3523481909 From chagedorn at openjdk.org Mon Dec 1 07:19:58 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 1 Dec 2025 07:19:58 GMT Subject: RFR: 8371792: Refactor barrier loop tests out of TestIfMinMax [v4] In-Reply-To: <5P58y7mFExd-rdT_nGu_Ky0UG-vDGPRG2IycLX6xwIY=.403c2f90-1ab3-4096-80a7-b80d819d3ca9@github.com> References: <5P58y7mFExd-rdT_nGu_Ky0UG-vDGPRG2IycLX6xwIY=.403c2f90-1ab3-4096-80a7-b80d819d3ca9@github.com> Message-ID: On Fri, 28 Nov 2025 09:40:25 GMT, Galder Zamarreño wrote: >> Trivial cleanup to move tests out of a test class whose description does not match these tests > > Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/gcbarriers/TestMinMaxLongLoopBarrier.java > > Co-authored-by: Emanuel Peter Still good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28385#pullrequestreview-3523489723 From shade at openjdk.org Mon Dec 1 07:54:03 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 1 Dec 2025 07:54:03 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v10] In-Reply-To: References: Message-ID: On Fri, 28 Nov 2025 06:01:26 GMT, Jiangli Zhou wrote: >> Please review the fix in StubGenerator::aesgcm_avx512 and StubGenerator::aesgcm_avx2 to handle some edge cases with input sizes that are not a multiple of the block size. >> >> Thanks to Thomas Holenstein and Lukas Zobernig for analyzing the issue and providing the test case! > > Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: > > Change to break before operators. Marked as reviewed by shade (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/28363#pullrequestreview-3523605119 From shade at openjdk.org Mon Dec 1 07:59:47 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 1 Dec 2025 07:59:47 GMT Subject: RFR: 8371464: C2: assert(no_dead_loop) failed: dead loop detected In-Reply-To: References: Message-ID: <5g0wstbmWC5gv_OG3sTv5Lb0eYCR4Cq3zQb1PJiWA6w=.efbee9c2-a9db-428b-8aa7-1c3d198d05e9@github.com> On Fri, 28 Nov 2025 10:50:44 GMT, Roland Westrelin wrote: > Crash occurs because a `MergeMem` node references itself: > > > 608 MergeMem === _ 1 608 1 1 1 1 1 1 1 1 1 1 878 [[ 877 878 608 420 597 ]] { - - - - - - - - - - N878:java/lang/Throwable (java/io/Serializable)+20 * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !orig=[524] !jvms: TestDeadLoopAtMergeMem::test @ bci:94 (line 62) > ``` > > Before IGVN, that part of the stream is: > > > 522 Region === 522 604 521 [[ 522 538 523 524 525 526 527 528 529 530 531 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:75 (line 59) > 524 Phi === 522 608 464 [[ 588 581 564 546 564 559 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:75 (line 59) > > 538 If === 522 535 [[ 539 540 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 539 IfTrue === 538 [[ 553 547 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 540 IfFalse === 538 [[ 548 546 ]] #0 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 553 If === 539 535 [[ 554 555 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) > 554 IfTrue === 553 [[ 562 560 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) > 555 IfFalse === 553 [[ 548 559 ]] #0 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) > > 548 Region === 548 _ 540 555 [[ 548 562 561 563 564 565 566 567 568 569 570 571 572 573 574 575 576 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:88 (line 60) > 564 Phi === 548 _ 524 524 [[ 581 ]] #memory Memory: 
@BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:85 (line 61) > > 562 Region === 562 548 554 [[ 562 600 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 596 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:90 (line 62) > 581 Phi === 562 564 524 [[ 420 597 610 608 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:90 (line 62) > > 608 MergeMem === _ 1 581 1 1 1 1 1 1 1 1 1 1 588 [[ 524 ]] { - - - - - - - - - - N588:java/lang/Throwable (java/io/Serializable)+20 * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:94 (line 62) > > > 522 is a loop head, 604 is the backedge. The loop becomes unreachable > during IGVN. The loop body above is transformed to: > > > 538 If === 604 535 [[ 539 540 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 539 IfTrue === 538 [[ 562 547 560 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (l... GHA failures in [com/sun/crypto/provider/Cipher/HPKE/KAT9180](https://github.com/rwestrel/jdk/actions/runs/19761317022#user-content-com_sun_crypto_provider_cipher_hpke_kat9180) would disappear if you merge from master. Actually, this might mean the PR base is quite old, and there might be other bugs on the intersection with this one. Merge from master and pass the GHA, maybe? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28554#issuecomment-3595112399 From dfenacci at openjdk.org Mon Dec 1 08:23:56 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 1 Dec 2025 08:23:56 GMT Subject: RFR: 8371464: C2: assert(no_dead_loop) failed: dead loop detected In-Reply-To: References: Message-ID: On Fri, 28 Nov 2025 10:50:44 GMT, Roland Westrelin wrote: > Crash occurs because a `MergeMem` node references itself: > > > 608 MergeMem === _ 1 608 1 1 1 1 1 1 1 1 1 1 878 [[ 877 878 608 420 597 ]] { - - - - - - - - - - N878:java/lang/Throwable (java/io/Serializable)+20 * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !orig=[524] !jvms: TestDeadLoopAtMergeMem::test @ bci:94 (line 62) > ``` > > Before IGVN, that part of the stream is: > > > 522 Region === 522 604 521 [[ 522 538 523 524 525 526 527 528 529 530 531 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:75 (line 59) > 524 Phi === 522 608 464 [[ 588 581 564 546 564 559 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:75 (line 59) > > 538 If === 522 535 [[ 539 540 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 539 IfTrue === 538 [[ 553 547 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 540 IfFalse === 538 [[ 548 546 ]] #0 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 553 If === 539 535 [[ 554 555 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) > 554 IfTrue === 553 [[ 562 560 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) > 555 IfFalse === 553 [[ 548 559 ]] #0 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) > > 548 Region === 548 _ 540 555 [[ 548 562 561 563 564 565 566 567 568 569 570 571 572 573 574 575 576 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:88 (line 60) > 564 Phi === 548 _ 524 524 [[ 581 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:85 (line 61) > > 562 Region === 
562 548 554 [[ 562 600 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 596 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:90 (line 62) > 581 Phi === 562 564 524 [[ 420 597 610 608 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:90 (line 62) > > 608 MergeMem === _ 1 581 1 1 1 1 1 1 1 1 1 1 588 [[ 524 ]] { - - - - - - - - - - N588:java/lang/Throwable (java/io/Serializable)+20 * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:94 (line 62) > > > 522 is a loop head, 604 is the backedge. The loop becomes unreachable > during IGVN. The loop body above is transformed to: > > > 538 If === 604 535 [[ 539 540 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 539 IfTrue === 538 [[ 562 547 560 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (l... Thanks for fixing this @rwestrel! Barring Christian's testing, the change looks good to me. src/hotspot/share/opto/cfgnode.cpp line 1404: > 1402: Node* other_phi_input = in(j); > 1403: if (other_phi_input != nullptr && other_phi_input == merge_mem->base_memory() && !is_data_loop(region, phi_input, igvn)) { > 1404: // merge_mem is a successor memory to other_phi_input, and is not pinned inside the diamond, so push it out. Do you think it might be worth adding an additional reason for `!is_data_loop` in the comment? ------------- Marked as reviewed by dfenacci (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/28554#pullrequestreview-3523662101 PR Review Comment: https://git.openjdk.org/jdk/pull/28554#discussion_r2576027929 From lucy at openjdk.org Mon Dec 1 08:31:27 2025 From: lucy at openjdk.org (Lutz Schmidt) Date: Mon, 1 Dec 2025 08:31:27 GMT Subject: RFR: 8372730: Problem list compiler/arguments/TestCodeEntryAlignment.java on x64 In-Reply-To: References: Message-ID: On Fri, 28 Nov 2025 10:33:24 GMT, Matthias Baesken wrote: > [JDK-8372720](https://bugs.openjdk.org/browse/JDK-8372720) problem listed the test compiler/arguments/TestCodeEntryAlignment.java on macOS x64 but the issue appears on other OS running on x64 CPUs (e.g. Linux) too . LGTM ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28553#pullrequestreview-3523762955 From shade at openjdk.org Mon Dec 1 08:44:06 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 1 Dec 2025 08:44:06 GMT Subject: Integrated: 8372188: AArch64: Generate atomic match rules from M4 stencils In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 14:54:06 GMT, Aleksey Shipilev wrote: > Current atomic match rules are all over the place in AArch64: > - CAE and weak CAS rules are generated with the help of `cas.m4`, and then are supposed to be copy-pasted (?) into `aarch64.ad`. I did it about 20 times when fixing [JDK-8372154](https://bugs.openjdk.org/browse/JDK-8372154), gets tedious very quickly. > - Strong CAS and get-and-set rules are still in the same section of `aarch64.ad`, and are written by hand. Yet, those can be automatically generated from M4 stencils as well. > > This PR cleans that up by moving all these rules into a separate `.ad` file, which one can cleanly re-generate by invoking `m4 aarch64_atomic_ad.m4 > aarch64_atomic.ad`. The meat of the change is `aarch64_atomic.m4`, everything else is either generated from it, or removed in favor of auto-generated code. 
There should be no semantic change, as I attempted to move the rules mostly verbatim, only changing non-semantic stuff like match rule names and some formats. > > Testing: > - [x] Eyeballing match rules before/after > - [x] Linux AArch64 server fastdebug, `hotspot_compiler` > - [x] Linux AArch64 server fastdebug, `tier1` > - [x] Linux AArch64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, jcstress run This pull request has now been integrated. Changeset: 3481252c Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/3481252ced7c06c44154ceccc56b12cfd9a490c3 Stats: 2349 lines in 5 files changed: 1156 ins; 1193 del; 0 mod 8372188: AArch64: Generate atomic match rules from M4 stencils Reviewed-by: aph, haosun ------------- PR: https://git.openjdk.org/jdk/pull/28538 From shade at openjdk.org Mon Dec 1 08:44:04 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 1 Dec 2025 08:44:04 GMT Subject: RFR: 8372188: AArch64: Generate atomic match rules from M4 stencils In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 14:54:06 GMT, Aleksey Shipilev wrote: > Current atomic match rules are all over the place in AArch64: > - CAE and weak CAS rules are generated with the help of `cas.m4`, and then are supposed to be copy-pasted (?) into `aarch64.ad`. I did it about 20 times when fixing [JDK-8372154](https://bugs.openjdk.org/browse/JDK-8372154), gets tedious very quickly. > - Strong CAS and get-and-set rules are still in the same section of `aarch64.ad`, and are written by hand. Yet, those can be automatically generated from M4 stencils as well. > > This PR cleans that up by moving all these rules into a separate `.ad` file, which one can cleanly re-generate by invoking `m4 aarch64_atomic_ad.m4 > aarch64_atomic.ad`. The meat of the change is `aarch64_atomic.m4`, everything else is either generated from it, or removed in favor of auto-generated code. 
There should be no semantic change, as I attempted to move the rules mostly verbatim, only changing non-semantic stuff like match rule names and some formats. > > Testing: > - [x] Eyeballing match rules before/after > - [x] Linux AArch64 server fastdebug, `hotspot_compiler` > - [x] Linux AArch64 server fastdebug, `tier1` > - [x] Linux AArch64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, jcstress run Over the weekend (60+ hours) jcstress run comes clean, apart from errant `MaxVectorSize` asserts unrelated to this patch. So I am integrating. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28538#issuecomment-3595289184 From jbhateja at openjdk.org Mon Dec 1 08:46:48 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 1 Dec 2025 08:46:48 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v3] In-Reply-To: References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> <5bV8t0Bo16-WVON8_AJLfcPDDqWVHDxIjmdGPPNazE8=.51d5a17d-1b87-44d4-ad41-e9d346e6b9f7@github.com> Message-ID: On Thu, 27 Nov 2025 16:12:59 GMT, Emanuel Peter wrote: > Ok, that's fine with me too. > > It would be nice if you could also attach a regression test, or maybe add an additional run to the existing test, with the required flags for reproducing this issue. @eme64 addressed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28533#issuecomment-3595303558 From goetz at openjdk.org Mon Dec 1 08:56:06 2025 From: goetz at openjdk.org (Goetz Lindenmaier) Date: Mon, 1 Dec 2025 08:56:06 GMT Subject: RFR: 8372730: Problem list compiler/arguments/TestCodeEntryAlignment.java on x64 In-Reply-To: References: Message-ID: On Fri, 28 Nov 2025 10:33:24 GMT, Matthias Baesken wrote: > [JDK-8372720](https://bugs.openjdk.org/browse/JDK-8372720) problem listed the test compiler/arguments/TestCodeEntryAlignment.java on macOS x64 but the issue appears on other OS running on x64 CPUs (e.g. Linux) too . Marked as reviewed by goetz (Reviewer). LGTM, too. ------------- PR Review: https://git.openjdk.org/jdk/pull/28553#pullrequestreview-3523862225 PR Comment: https://git.openjdk.org/jdk/pull/28553#issuecomment-3595336998 From mbaesken at openjdk.org Mon Dec 1 09:06:23 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 1 Dec 2025 09:06:23 GMT Subject: RFR: 8372730: Problem list compiler/arguments/TestCodeEntryAlignment.java on x64 In-Reply-To: References: Message-ID: On Fri, 28 Nov 2025 10:33:24 GMT, Matthias Baesken wrote: > [JDK-8372720](https://bugs.openjdk.org/browse/JDK-8372720) problem listed the test compiler/arguments/TestCodeEntryAlignment.java on macOS x64 but the issue appears on other OS running on x64 CPUs (e.g. Linux) too . Thanks for the reviews ! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28553#issuecomment-3595376162 From mbaesken at openjdk.org Mon Dec 1 09:06:24 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 1 Dec 2025 09:06:24 GMT Subject: Integrated: 8372730: Problem list compiler/arguments/TestCodeEntryAlignment.java on x64 In-Reply-To: References: Message-ID: On Fri, 28 Nov 2025 10:33:24 GMT, Matthias Baesken wrote: > [JDK-8372720](https://bugs.openjdk.org/browse/JDK-8372720) problem listed the test compiler/arguments/TestCodeEntryAlignment.java on macOS x64 but the issue appears on other OS running on x64 CPUs (e.g. Linux) too . This pull request has now been integrated. Changeset: 5bd7db03 Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/5bd7db034aaf8aa6780945e02a7f9a35e16b036e Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8372730: Problem list compiler/arguments/TestCodeEntryAlignment.java on x64 Reviewed-by: lucy, goetz ------------- PR: https://git.openjdk.org/jdk/pull/28553 From shade at openjdk.org Mon Dec 1 09:09:23 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 1 Dec 2025 09:09:23 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v5] In-Reply-To: References: Message-ID: <_o4EmLJi9oI38vVDE69u9u8dC-Ad8501si7GW_bIi9M=.ec2c10fb-40fd-4c8e-95af-ae8f59806da6@github.com> On Fri, 28 Nov 2025 15:21:47 GMT, Andrew Haley wrote: > I'm seeing minor performance regressions in `InterfaceCalls.test2ndInt5Types`, before and after this PR: Reproduced locally too:

Benchmark                                              (randomized)  Mode  Cnt    Score    Error  Units

# Baseline
InterfaceCalls.test2ndInt5Types                               false  avgt   12   16.945 ±  0.079  ns/op
InterfaceCalls.test2ndInt5Types:L1-dcache-load-misses         false  avgt    3    0.076 ±  2.187   #/op
InterfaceCalls.test2ndInt5Types:L1-dcache-loads               false  avgt    3   88.738 ±  0.416   #/op
InterfaceCalls.test2ndInt5Types:branch-misses                 false  avgt    3    0.007 ±  0.003   #/op
InterfaceCalls.test2ndInt5Types:branches                      false  avgt    3   49.122 ±  0.353   #/op
InterfaceCalls.test2ndInt5Types:cycles                        false  avgt    3   57.147 ±  1.698   #/op
InterfaceCalls.test2ndInt5Types:instructions                  false  avgt    3  247.443 ±  1.531   #/op

# Current PR
InterfaceCalls.test2ndInt5Types                               false  avgt   12   22.513 ±  0.208  ns/op
InterfaceCalls.test2ndInt5Types:L1-dcache-load-misses         false  avgt    3    0.012 ±  0.072   #/op
InterfaceCalls.test2ndInt5Types:L1-dcache-loads               false  avgt    3  108.446 ± 13.975   #/op ; +20 loads
InterfaceCalls.test2ndInt5Types:branch-misses                 false  avgt    3    0.407 ±  0.010   #/op
InterfaceCalls.test2ndInt5Types:branches                      false  avgt    3   54.102 ±  0.403   #/op ; +5 branches
InterfaceCalls.test2ndInt5Types:cycles                        false  avgt    3   75.938 ±  5.043   #/op
InterfaceCalls.test2ndInt5Types:instructions                  false  avgt    3  280.194 ±  5.758   #/op ; +32 instructions

Looked at perfasm, and there are no gross problems there. I also think reliability trumps this minor performance bump. But I also suspect this is caused by second loop re-walking the table looking for (empty) slots, this is where extra loads are coming from. I believe it can reasonably track the first non-null slot and start the walk from there. Let me see if it is simple to do without complicating the code all too much. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3595393093 From epeter at openjdk.org Mon Dec 1 09:23:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 1 Dec 2025 09:23:46 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v4] In-Reply-To: References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: On Mon, 1 Dec 2025 07:15:41 GMT, Jatin Bhateja wrote: >> This bug patch fixes a crash seen while querying the bottom type of MachTempNode corresponding to [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of blend pattern during late scheduling.
Here, MaxVectorSize is constrained to 8 bytes, so during C2 type system initialization, [TypeVect::VECTX](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719), which is guarded by the target-supported vector size, remains uninitialized. >> >> It's better to reject matching of VectorBlend in such a scenario. >> >> All existing VectorAPI jtreg tests are passing with -XX:UseAVX=0 and -XX:MaxVectorSize=8 >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Adding a testpoint test/hotspot/jtreg/compiler/vectorapi/TestABSMaskedMaxByteVector.java line 48: > 46: @Test > 47: @IR(failOn = {IRNode.ABS_VB}, applyIfAnd={"MaxVectorSize", " <= 8 ", "UseAVX", "0"}) > 48: @IR(counts = {IRNode.ABS_VB, "1"}, applyIf={"MaxVectorSize", " > 8 "}) Are you sure this is going to pass on all platforms? Does this test run ok on `aarch64` where there is no `UseAVX` flag? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28533#discussion_r2576273485 From mdoerr at openjdk.org Mon Dec 1 10:27:36 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 1 Dec 2025 10:27:36 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v7] In-Reply-To: References: Message-ID: > This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. We can save the `genInvRoundKeys` computation when we only do encryption. > > The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64. Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: - Minor simplification.
- Merge remote-tracking branch 'origin' into 8371820_AES_Crypt - Fix missing whitespace. - Address review comments. - Merge remote-tracking branch 'origin' into 8371820_AES_Crypt - Remove K from AES_Crypt - More minor cleanup. - Improve comment and minor cleanup. - 8371820: Further AES performance improvements for key schedule generation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28299/files - new: https://git.openjdk.org/jdk/pull/28299/files/ae84912d..c7107a70 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28299&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28299&range=05-06 Stats: 17426 lines in 467 files changed: 10903 ins; 3977 del; 2546 mod Patch: https://git.openjdk.org/jdk/pull/28299.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28299/head:pull/28299 PR: https://git.openjdk.org/jdk/pull/28299 From jbhateja at openjdk.org Mon Dec 1 11:48:23 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 1 Dec 2025 11:48:23 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v5] In-Reply-To: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: > This bug patch fixes a crash seen while querying the bottom type of MachTempNode corresponding to [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of blend pattern during late scheduling. Here, MaxVectorSize is constrained to 8 bytes, so during C2 type system initialization, [TypeVect::VECTX](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719), which is guarded by the target-supported vector size, remains uninitialized. > > It's better to reject matching of VectorBlend in such a scenario.
> > All existing VectorAPI jtreg tests are passing with -XX:UseAVX=0 and -XX:MaxVectorSize=8 > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Limiting for x86 targets ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28533/files - new: https://git.openjdk.org/jdk/pull/28533/files/be24b1af..a0e008de Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28533&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28533&range=03-04 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28533.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28533/head:pull/28533 PR: https://git.openjdk.org/jdk/pull/28533 From jbhateja at openjdk.org Mon Dec 1 11:48:28 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 1 Dec 2025 11:48:28 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v4] In-Reply-To: References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: On Mon, 1 Dec 2025 09:20:58 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding a testpoint > > test/hotspot/jtreg/compiler/vectorapi/TestABSMaskedMaxByteVector.java line 48: > >> 46: @Test >> 47: @IR(failOn = {IRNode.ABS_VB}, applyIfAnd={"MaxVectorSize", " <= 8 ", "UseAVX", "0"}) >> 48: @IR(counts = {IRNode.ABS_VB, "1"}, applyIf={"MaxVectorSize", " > 8 "}) > > Are you sure this is going to pass on all platforms? Does this test run ok on `aarch64` where there is no `UseAVX` flag? It will complain on AArch64. Do you think we should make the IR framework sensitive to IgnoreUnrecognizedVMOptions?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28533#discussion_r2576742922 From epeter at openjdk.org Mon Dec 1 12:12:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 1 Dec 2025 12:12:47 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v4] In-Reply-To: References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: On Mon, 1 Dec 2025 11:45:16 GMT, Jatin Bhateja wrote: >> test/hotspot/jtreg/compiler/vectorapi/TestABSMaskedMaxByteVector.java line 48: >> >>> 46: @Test >>> 47: @IR(failOn = {IRNode.ABS_VB}, applyIfAnd={"MaxVectorSize", " <= 8 ", "UseAVX", "0"}) >>> 48: @IR(counts = {IRNode.ABS_VB, "1"}, applyIf={"MaxVectorSize", " > 8 "}) >> >> Are you sure this is going to pass on all platforms? Does this test run ok on `aarch64` where there is no `UseAVX` flag? > > Thanks, fixed > > I intend to pass UseAVX=0 as a run flag to reproduce the exact bug scenario; our framework is not sensitive to IgnoreUnrecognizedVMOptions. You could also just limit the rules to `sse4.1` platforms.
Then you can run the tests everywhere, but limit IR rules to what is easy to test for you ;) Platform features get tested before flags, so that helps with platform specific flags ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28533#discussion_r2576821786 From goetz at openjdk.org Mon Dec 1 12:23:52 2025 From: goetz at openjdk.org (Goetz Lindenmaier) Date: Mon, 1 Dec 2025 12:23:52 GMT Subject: RFR: 8370473: C2: Better Aligment of Vector Spill Slots [v4] In-Reply-To: <2dAfr3bnYwrmrMwlhDNniaYVQYOrR2ARztDEB4qqBzY=.aaa1b90d-0aa7-4d42-a3eb-c52a6b04cbaf@github.com> References: <2dAfr3bnYwrmrMwlhDNniaYVQYOrR2ARztDEB4qqBzY=.aaa1b90d-0aa7-4d42-a3eb-c52a6b04cbaf@github.com> Message-ID: <7h2_vOOkP-YCjBQ0dIRbNWg3o4gCjy4zwaAE62K0TkE=.c5a07e88-b55b-4be3-9e9d-7d484a663e98@github.com> On Thu, 20 Nov 2025 10:21:34 GMT, Richard Reingruber wrote: >> With this change c2 will allocate spill slots for vectors with sp offsets aligned to the size of the vectors. Maximum alignment is StackAlignmentInBytes. >> >> It also updates comments that have never been changed to describe how register allocation works for sizes larger than 64 bit. >> >> The change helps to produce better spill code on AARCH64 and PPC64 where an additional add instruction is emitted if the offset of a vector un-/spill is not aligned. >> >> The change is rather a cleanup than an optimization. In most cases the sp offsets will already be properly aligned. >> Only with incoming stack arguments unaligned offsets can be generated. But also then alignment padding is only added if vector registers larger than 64 bit are used. >> >> So the costs are effectively zero. 
Especially because extra padding won't enlarge the frame since only virtual registers are allocated which are mapped to the caller frame (see `pad0` in the [diagram](https://github.com/openjdk/jdk/blob/92e380c59c2498b1bc94e26658b07b383deae59a/src/hotspot/cpu/aarch64/aarch64.ad#L3829)) >> >> There's a risk though that with the extra virtual registers allocated for `pad0` the limit of registers a `RegMask` can represent is reached (occurs with excessive spilling). If this happens the compilation would fail. It could be retried with smaller alignment for vector spilling though. I haven't implemented it as I thought the risk is negligible. >> >> Note that the sp offset of the accesses should be aligned rather than the effective address. So it could even be argued that the maximum alignment could be higher than StackAlignmentInBytes. >> >> ##### Testing with fastdebug builds on AARCH64 and PPC64: >> >> hotspot_vector_1 >> hotspot_vector_2 >> jdk_vector >> jdk_vector_sanity >> >> ##### The change passed our CI testing: >> Tier 1-4 of hotspot and jdk. All of langtools and jaxp. Renaissance Suite and SAP specific tests. >> Testing was done on the main platforms and also on Linux/PPC64le and AIX. >> >> C2 compilation of `jdk.internal.vm.vector.VectorSupport::rearrangeOp` has unaligned spill offsets. It is covered by the following tests: >> >> compiler/vectorapi/VectorRearrangeTest.java >> jdk/incubator/vector/Byte128VectorLoadStoreTests.java >> jdk/incubator/vector/Double256VectorLoadStoreTests.java >> jdk/incubator/vector/Float128VectorTests.java >> jdk/incubator/vector/Long256VectorLoadStoreTests.java >> jdk/incubator/vector/Short128VectorLoadStoreTests.java >> jdk/incubator/vector/Vector64ConversionTests.java > > Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 10 additional commits since the last revision: > > - Merge branch 'master' > - Exclude IR check on riscv with rvv > - Enhance comment > - Fix OptoAssembly for Power 8 > - PPC: OptoAssembly for vector spilling > - Assert aligned sp offsets in vector spilling > - Delete TMP and !UseNewCode > - Align Matcher::_new_SP for better vector spilling > - TMP: trace unaligned vector spilling > - Add test LGTM OK, so it's not the frame layout aspect of mapping slots to addresses that is adapted by your change, but only the new_sp. Before, the "unused" part was in the new frame, now it is in the old one or rather completely omitted. The growth of the stack is not altered. So the change has no memory space side effect and thus is not critical to apply to all platforms. Thanks for the clarification! src/hotspot/share/opto/chaitin.hpp line 146: > 144: private: > 145: // Number of registers this live range uses when it colors > 146: uint16_t _num_regs; // byte size of the value divided by slot size which is 4 Is this true for oops, too? Hadn't they been mapped to one slot on both 32-bit and 64-bit platforms? ------------- Marked as reviewed by goetz (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/27969#pullrequestreview-3487238248 PR Comment: https://git.openjdk.org/jdk/pull/27969#issuecomment-3596247610 PR Review Comment: https://git.openjdk.org/jdk/pull/27969#discussion_r2545571373 From goetz at openjdk.org Mon Dec 1 12:26:55 2025 From: goetz at openjdk.org (Goetz Lindenmaier) Date: Mon, 1 Dec 2025 12:26:55 GMT Subject: RFR: 8370473: C2: Better Aligment of Vector Spill Slots [v4] In-Reply-To: <2dAfr3bnYwrmrMwlhDNniaYVQYOrR2ARztDEB4qqBzY=.aaa1b90d-0aa7-4d42-a3eb-c52a6b04cbaf@github.com> References: <2dAfr3bnYwrmrMwlhDNniaYVQYOrR2ARztDEB4qqBzY=.aaa1b90d-0aa7-4d42-a3eb-c52a6b04cbaf@github.com> Message-ID: On Thu, 20 Nov 2025 10:21:34 GMT, Richard Reingruber wrote: >> With this change c2 will allocate spill slots for vectors with sp offsets aligned to the size of the vectors. Maximum alignment is StackAlignmentInBytes. >> >> It also updates comments that have never been changed to describe how register allocation works for sizes larger than 64 bit. >> >> The change helps to produce better spill code on AARCH64 and PPC64 where an additional add instruction is emitted if the offset of a vector un-/spill is not aligned. >> >> The change is rather a cleanup than an optimization. In most cases the sp offsets will already be properly aligned. >> Only with incoming stack arguments unaligned offsets can be generated. But also then alignment padding is only added if vector registers larger than 64 bit are used. >> >> So the costs are effectively zero. Especially because extra padding won't enlarge the frame since only virtual registers are allocated which are mapped to the caller frame (see `pad0` in the [diagram](https://github.com/openjdk/jdk/blob/92e380c59c2498b1bc94e26658b07b383deae59a/src/hotspot/cpu/aarch64/aarch64.ad#L3829)) >> >> There's a risk though that with the extra virtual registers allocated for `pad0` the limit of registers a `RegMask` can represent is reached (occurs with excessive spilling). 
If this happens the compilation would fail. It could be retried with smaller alignment for vector spilling though. I haven't implemented it as I thought the risk is negligible. >> >> Note that the sp offset of the accesses should be aligned rather than the effective address. So it could even be argued that the maximum alignment could be higher than StackAlignmentInBytes. >> >> ##### Testing with fastdebug builds on AARCH64 and PPC64: >> >> hotspot_vector_1 >> hotspot_vector_2 >> jdk_vector >> jdk_vector_sanity >> >> ##### The change passed our CI testing: >> Tier 1-4 of hotspot and jdk. All of langtools and jaxp. Renaissance Suite and SAP specific tests. >> Testing was done on the main platforms and also on Linux/PPC64le and AIX. >> >> C2 compilation of `jdk.internal.vm.vector.VectorSupport::rearrangeOp` has unaligned spill offsets. It is covered by the following tests: >> >> compiler/vectorapi/VectorRearrangeTest.java >> jdk/incubator/vector/Byte128VectorLoadStoreTests.java >> jdk/incubator/vector/Double256VectorLoadStoreTests.java >> jdk/incubator/vector/Float128VectorTests.java >> jdk/incubator/vector/Long256VectorLoadStoreTests.java >> jdk/incubator/vector/Short128VectorLoadStoreTests.java >> jdk/incubator/vector/Vector64ConversionTests.java > > Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Merge branch 'master' > - Exclude IR check on riscv with rvv > - Enhance comment > - Fix OptoAssembly for Power 8 > - PPC: OptoAssembly for vector spilling > - Assert aligned sp offsets in vector spilling > - Delete TMP and !UseNewCode > - Align Matcher::_new_SP for better vector spilling > - TMP: trace unaligned vector spilling > - Add test So it's not the spill slots that are better aligned, as the title proposes.
It's just the offsets to the new_sp that have better alignment and thus can be encoded more cheaply. Maybe change the title to "C2: Better Aligment of Vector Spill Slot offsets"? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27969#issuecomment-3596274240 From shade at openjdk.org Mon Dec 1 12:42:52 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 1 Dec 2025 12:42:52 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v3] In-Reply-To: References: Message-ID: On Mon, 24 Nov 2025 18:10:50 GMT, Quan Anh Mai wrote: >> Hi, >> >> This fixes the crash in `Load/StoreVectorMaskedNode::Ideal`. The issue here is that the graph is not canonical during idealization, which leads to us processing a dead node. The fix I propose is to bail-out when that happens. >> >> To be more specific, for this issue, we have the graph that looks like: >> >> ConI -> ConvI2L -> CastLL(0..32) -> VectorMaskGen >> >> with `ConI` being 45 and `MaxVectorSize` being 32. In this instance, `CastLL` is processed before `ConvI2L`, and when it is processed, it sees the type of `ConvI2L` as its bottom type. As a result, it does not know that it is top, and since we are after macro expansion, which is after loop opts, the `CastLL` goes away, leaving us with: >> >> ConI -> ConvI2L -> VectorMaskGen >> >> After `ConvI2L` is processed, we know that the input of `VectorMaskGen` is a constant 45, which is larger than `MaxVectorSize`, leading to the assert failure. >> >> Please take a look and leave your thoughts, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > reviews Are we moving forward with this?
Still too many failures in local testing without this fix :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28410#issuecomment-3596356835 From shade at openjdk.org Mon Dec 1 13:04:08 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 1 Dec 2025 13:04:08 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v6] In-Reply-To: References: Message-ID: <2ifEaoGuZU4duyckWchgOnnqfH6AgAcrqsiqBZH1Nx4=.1df7af8d-41ac-43a1-90ab-964eb80f155b@github.com> > See the bug for discussion what issues current machinery has. > > This PR executes the plan outlined in the bug: > 1. Common the receiver type profiling code in interpreter and C1 > 2. Rewrite receiver type profiling code to only do atomic receiver slot installations > 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed > > This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/` > - [x] Linux x86_64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision: - Simplify third case: no need to loop, just restart the search - Actually have a second "fast" case: receiver is not found in the table, and the table is full - Pushing/popping for rare CAS path is counter-productive ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25305/files - new: https://git.openjdk.org/jdk/pull/25305/files/c441209a..f3e0fa4d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25305&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25305&range=04-05 Stats: 157 lines in 1 file changed: 85 ins; 52 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/25305.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25305/head:pull/25305 PR: https://git.openjdk.org/jdk/pull/25305 From shade at openjdk.org Mon Dec 1 13:04:10 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 1 Dec 2025 13:04:10 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v5] In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 15:55:38 GMT, Aleksey Shipilev wrote: >> See the bug for discussion what issues current machinery has. >> >> This PR executes the plan outlined in the bug: >> 1. Common the receiver type profiling code in interpreter and C1 >> 2. Rewrite receiver type profiling code to only do atomic receiver slot installations >> 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed >> >> This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: > > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - Tighten up some more > - Offset is always rscratch1, no need to save it > - Grossly simplify register shuffling > - More asserts > - More comment touchups > - Inline code comments > - Mention the updater in ReceiverTypeData > - type_profile -> profile_receiver_type > - Stylistic: remove redundant assert > - ... and 5 more: https://git.openjdk.org/jdk/compare/c028369d...c441209a Oh, all right! This made me realize we actually have a secondary "fast" case: receiver is not found, but profile is full. This is pretty frequent with `TypeProfileWidth=2`. In that case, we are doing way too much stuff, anticipating receiver slot installation that would never actually come. Specializing for that case costs significantly fewer loads, and gets the code much more pipelined; I suspect that is because tight loops that _do not_ have CAS-es in them are uop-cached more readily. We now lose "only" 0.5ns in this test:

Benchmark                                              (randomized)  Mode  Cnt    Score    Error  Units

# Baseline
InterfaceCalls.test2ndInt5Types                               false  avgt   12   16.945 ±  0.079  ns/op
InterfaceCalls.test2ndInt5Types:L1-dcache-load-misses         false  avgt    3    0.076 ±  2.187   #/op
InterfaceCalls.test2ndInt5Types:L1-dcache-loads               false  avgt    3   88.738 ±  0.416   #/op
InterfaceCalls.test2ndInt5Types:branch-misses                 false  avgt    3    0.007 ±  0.003   #/op
InterfaceCalls.test2ndInt5Types:branches                      false  avgt    3   49.122 ±  0.353   #/op
InterfaceCalls.test2ndInt5Types:cycles                        false  avgt    3   57.147 ±  1.698   #/op
InterfaceCalls.test2ndInt5Types:instructions                  false  avgt    3  247.443 ±  1.531   #/op

# Old PR version
InterfaceCalls.test2ndInt5Types                               false  avgt   12   22.513 ±  0.208  ns/op
InterfaceCalls.test2ndInt5Types:L1-dcache-load-misses         false  avgt    3    0.012 ±  0.072   #/op
InterfaceCalls.test2ndInt5Types:L1-dcache-loads               false  avgt    3  108.446 ± 13.975   #/op ; +20 loads
InterfaceCalls.test2ndInt5Types:branch-misses                 false  avgt    3    0.407 ±  0.010   #/op
InterfaceCalls.test2ndInt5Types:branches                      false  avgt    3   54.102 ±  0.403   #/op ; +5 branches
InterfaceCalls.test2ndInt5Types:cycles                        false  avgt    3   75.938 ±  5.043   #/op ; +19 cycles
InterfaceCalls.test2ndInt5Types:instructions                  false  avgt    3  280.194 ±  5.758   #/op ; +32 instructions

# New PR version
InterfaceCalls.test2ndInt5Types                               false  avgt   12   17.441 ±  0.287  ns/op
InterfaceCalls.test2ndInt5Types:L1-dcache-load-misses         false  avgt    3    0.009 ±  0.072   #/op
InterfaceCalls.test2ndInt5Types:L1-dcache-loads               false  avgt    3   88.803 ±  1.401   #/op
InterfaceCalls.test2ndInt5Types:branch-misses                 false  avgt    3    0.009 ±  0.062   #/op
InterfaceCalls.test2ndInt5Types:branches                      false  avgt    3   52.945 ±  0.752   #/op ; +4 branches
InterfaceCalls.test2ndInt5Types:cycles                        false  avgt    3   58.866 ± 15.379   #/op ; +2 cycles
InterfaceCalls.test2ndInt5Types:instructions                  false  avgt    3  272.838 ±  1.665   #/op ; +28 instructions

The code is in new commits, passes `hotspot:tier1`, running more tests now.
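A rough sketch of the three cases described above, written in plain Java rather than the actual x86 stub code. The class, field and method names are hypothetical, and remembering the first empty slot during the single search pass is an assumption made for illustration, not something confirmed by the patch. Counters are deliberately left non-atomic, since the PR description says only receiver slot installation is made atomic:

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical model of one receiver type profile row; not HotSpot code.
class ReceiverProfileSketch {
    private final AtomicReferenceArray<Class<?>> receivers; // slots are claimed with CAS
    private final long[] counts;                            // per-slot counters, non-atomic on purpose
    private long polymorphicCount;                          // bumped when the receiver has no slot

    ReceiverProfileSketch(int width) {
        receivers = new AtomicReferenceArray<>(width);
        counts = new long[width];
    }

    void record(Class<?> receiver) {
        while (true) {
            int firstEmpty = -1;
            for (int i = 0; i < receivers.length(); i++) {
                Class<?> seen = receivers.get(i);
                if (seen == receiver) {
                    counts[i]++;          // fast case 1: receiver already has a slot
                    return;
                }
                if (seen == null && firstEmpty < 0) {
                    firstEmpty = i;       // remember where an installation could go
                }
            }
            if (firstEmpty < 0) {
                polymorphicCount++;       // fast case 2: not found and table full, no CAS needed
                return;
            }
            // Slow case 3: try to claim the empty slot, then simply restart the search.
            // If the CAS loses a race, the next pass sees whatever the winner installed.
            receivers.compareAndSet(firstEmpty, null, receiver);
        }
    }

    long countFor(Class<?> receiver) {
        for (int i = 0; i < receivers.length(); i++) {
            if (receivers.get(i) == receiver) {
                return counts[i];
            }
        }
        return 0;
    }

    long polymorphicCount() {
        return polymorphicCount;
    }

    public static void main(String[] args) {
        ReceiverProfileSketch profile = new ReceiverProfileSketch(2); // like TypeProfileWidth=2
        profile.record(String.class);
        profile.record(Integer.class);
        profile.record(Long.class); // table is full, takes the second fast case
        System.out.println(profile.countFor(String.class) + " " + profile.polymorphicCount());
    }
}
```

With a width of 2, every receiver after the first two distinct types takes the second fast case, which is the situation the benchmark above exercises.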
------------- PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3596428656 From jbhateja at openjdk.org Mon Dec 1 13:06:12 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 1 Dec 2025 13:06:12 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v6] In-Reply-To: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: > This bug patch fixes a crash seen while querying the bottom type of MachTempNode corresponding to [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of blend pattern during late scheduling. Here, MaxVectorSize is constrained to 8 bytes, so during C2 type system initialization, [TypeVect::VECTX](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719), which is guarded by the target-supported vector size, remains uninitialized. > > It's better to reject matching of VectorBlend in such a scenario. > > All existing VectorAPI jtreg tests are passing with -XX:UseAVX=0 and -XX:MaxVectorSize=8 > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
The pull request contains one new commit since the last revision:

  Review suggestion incorporated

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28533/files
  - new: https://git.openjdk.org/jdk/pull/28533/files/a0e008de..c84f473e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28533&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28533&range=04-05

  Stats: 4 lines in 1 file changed: 0 ins; 1 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/28533.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28533/head:pull/28533

PR: https://git.openjdk.org/jdk/pull/28533

From jbhateja at openjdk.org Mon Dec 1 13:12:07 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Mon, 1 Dec 2025 13:12:07 GMT
Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v4]
In-Reply-To:
References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com>
Message-ID:

On Mon, 1 Dec 2025 12:10:15 GMT, Emanuel Peter wrote:

>> Thanks, fixed
>>
>> I intend to pass UseAVX=0 as a run flag to reproduce the exact bug scenario; our framework is not sensitive to IgnoreUnrecognizedVMOptions.
>
> You could also just limit the rules to `sse4.1` platforms. Then you can run the tests everywhere, but limit IR rules to what is easy to test for you ;)
>
> Platform features get tested before flags, so that helps with platform specific flags ;)

Thanks!! That is much better!
https://github.com/openjdk/jdk/tree/master/test/hotspot/jtreg/compiler/lib/ir_framework#disableenable-ir-rules-based-on-platform

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28533#discussion_r2577010701

From jbhateja at openjdk.org Mon Dec 1 13:32:25 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Mon, 1 Dec 2025 13:32:25 GMT
Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v7]
In-Reply-To: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com>
References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com>
Message-ID:

> This bug patch fixes a crash seen while querying the bottom type of MachTempNode corresponding to [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of blend pattern during late scheduling. Here, MaxVectorSize is constrained to 8 bytes, thus during C2 type system initialization, [TypeVect::VECTX](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719), guarded by the target-supported vector size, remains uninitialized.
>
> It's better to reject matching of VectorBlend in such a scenario.
>
> All existing VectorAPI jtreg tests are passing with -XX:UseAVX=0 and -XX:MaxVectorSize=8
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin

Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
The pull request contains one new commit since the last revision:

  Review comments resolutions

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28533/files
  - new: https://git.openjdk.org/jdk/pull/28533/files/c84f473e..2f773133

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28533&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28533&range=05-06

  Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/28533.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28533/head:pull/28533

PR: https://git.openjdk.org/jdk/pull/28533

From jbhateja at openjdk.org Mon Dec 1 13:39:09 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Mon, 1 Dec 2025 13:39:09 GMT
Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v8]
In-Reply-To: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com>
References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com>
Message-ID:

> This bug patch fixes a crash seen while querying the bottom type of MachTempNode corresponding to [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of blend pattern during late scheduling. Here, MaxVectorSize is constrained to 8 bytes, thus during C2 type system initialization, [TypeVect::VECTX](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719), guarded by the target-supported vector size, remains uninitialized.
>
> It's better to reject matching of VectorBlend in such a scenario.
>
> All existing VectorAPI jtreg tests are passing with -XX:UseAVX=0 and -XX:MaxVectorSize=8
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin

Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
The pull request contains one new commit since the last revision:

  Review comments resolutions

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28533/files
  - new: https://git.openjdk.org/jdk/pull/28533/files/2f773133..ef84ffa7

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28533&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28533&range=06-07

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/28533.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28533/head:pull/28533

PR: https://git.openjdk.org/jdk/pull/28533

From mli at openjdk.org Mon Dec 1 15:13:13 2025
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 1 Dec 2025 15:13:13 GMT
Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v7]
In-Reply-To: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com>
References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com>
Message-ID: <4-PqNRUxM-80k4mQdYNzc0HrirtkTCjfVAzgRewW08M=.d2fe4512-16cd-4abf-8a7f-e91341c37110@github.com>

> Hi,
>
> This PR adds CMoveF/D on RISC-V, which enables vectorization of statements like `op_1 bop op_2 ? res_f_d_1 : res_f_d_2` in a loop.
>
> This PR is also a preparation for further vectorization in https://github.com/openjdk/jdk/pull/28231.
>
> Previously it was https://github.com/openjdk/jdk/pull/25341, but at that time C2 SLP had an issue with unsigned comparison, which is now fixed, so it's good to continue the work.
>
> # Test
> ## Jtreg
>
> in progress...
>
> ## Performance
>
> Column name meanings:
> * p: with patch
> * p+v: with patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on
> * m: without patch
> * m+v: without patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on
>
> #### Average improvement
>
> NOTE: On its own, this PR brings a performance benefit in the cases of `CMoveF+CmpF`, `CMoveD+CmpD`, `CMoveF+CmpI`, `CMoveD+CmpL`.
The data below is based on fully implementing the vectorization of `CMoveI/L/F/D+CmpI/L/F/D`, which will be achieved by https://github.com/openjdk/jdk/pull/28231.
>
> For details, check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv.
>
> Opt (m/p) | Opt (m+v/p+v) | Opt (p/p+v) | Opt (m/p+v)
> -- | -- | -- | --
> 1.022782609 | 2.198717391 | 2.162673913 | 2.199

Hamlin Li has updated the pull request incrementally with two additional commits since the last revision:

 - remove log_warning
 - add test cases: BoolTest::ge/gt in enc_cmove_fp_cmp_fp

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28309/files
  - new: https://git.openjdk.org/jdk/pull/28309/files/46b32186..077dc35c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28309&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28309&range=05-06

  Stats: 226 lines in 2 files changed: 214 ins; 2 del; 10 mod
  Patch: https://git.openjdk.org/jdk/pull/28309.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28309/head:pull/28309

PR: https://git.openjdk.org/jdk/pull/28309

From mli at openjdk.org Mon Dec 1 15:13:15 2025
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 1 Dec 2025 15:13:15 GMT
Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v6]
In-Reply-To:
References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> <7kh5C9nj7bf6432cG35kDDvV6zhnKEspe8AcYetJ1do=.e1d9ebd3-d80d-4621-8c1e-c77dc721d0df@github.com>
Message-ID:

On Tue, 25 Nov 2025 09:39:26 GMT, Hamlin Li wrote:

>> src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2141:
>>
>>> 2139: case BoolTest::gt:
>>> 2140: cmov_fp_cmp_fp_gt(op1, op2, dst, src, cmp_single, cmov_single);
>>> 2141: log_warning(jit)("Float/Double BoolTest::gt path is not tested well, please report the test case!");
>>
>> My local tests show this does happen.
Try this: >> `$ make test TEST="./test/jdk/javax/sound/midi/Gervill/SoftFilter/TestProcessAudio.java" TEST_VM_OPTS="-XX:-TieredCompilation"` >> >> I think this could be a good reference if you want to add some extra tests for the two cases here. > > Thanks, I'll check it later. Sorry for the delayed response. I've added the test case to cover all the code paths. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2577477492 From liach at openjdk.org Mon Dec 1 15:40:40 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 1 Dec 2025 15:40:40 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v3] In-Reply-To: References: Message-ID: > Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. > > They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. > > We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. > > Paging @minborg who requested Optional folding for review. > > I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. 
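
To make the fragility mentioned in the description concrete, here is a minimal sketch of an Optional-like class whose shared empty instance legitimately stores `null` in a final field. `Box` and `BoxDemo` are hypothetical names chosen for illustration; they are not JDK classes, and the sketch does not use the annotation proposed in this PR. It only shows the shape of code for which a default-valued final field is a meaningful, permanent state rather than "not yet initialized".

```java
import java.util.Objects;

// Hypothetical stand-in for an Optional-like class (not java.util.Optional).
final class Box<T> {
    // Shared empty instance: its final field permanently holds null.
    static final Box<?> EMPTY = new Box<>(null);

    // A default value (null) is a valid, final state for this field.
    private final T value;

    private Box(T value) {
        this.value = value;
    }

    static <T> Box<T> of(T value) {
        return new Box<>(Objects.requireNonNull(value));
    }

    @SuppressWarnings("unchecked")
    static <T> Box<T> empty() {
        return (Box<T>) EMPTY;
    }

    boolean isPresent() {
        return value != null;
    }

    T orElse(T other) {
        return value != null ? value : other;
    }
}

final class BoxDemo {
    // Reads EMPTY.value, which is null. Whether a JIT may treat that read
    // as a constant depends on whether it trusts the final field.
    static String emptyOrElse() {
        return Box.<String>empty().orElse("fallback");
    }

    public static void main(String[] args) {
        System.out.println(emptyOrElse());
        System.out.println(Box.of("x").orElse("fallback"));
    }
}
```

The behaviour of the program is the same either way; the question discussed in this thread is only whether the compiler is allowed to fold the read of the `null`-valued field.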
Chen Liang has updated the pull request incrementally with one additional commit since the last revision:

  Doc tweaks

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28540/files
  - new: https://git.openjdk.org/jdk/pull/28540/files/712dbf1c..7a1cfa4a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28540&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28540&range=01-02

  Stats: 25 lines in 1 file changed: 0 ins; 24 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/28540.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28540/head:pull/28540

PR: https://git.openjdk.org/jdk/pull/28540

From roland at openjdk.org Mon Dec 1 15:50:41 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Mon, 1 Dec 2025 15:50:41 GMT
Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations
Message-ID:

For this failure, the memory stats are:

Total Usage: 1095525816
--- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 ---
Phase                     Total        ra     node      comp    type states reglive regsplit regmask superword   cienv ha other
none                    5976032    331560  5402064    197512   33712  10200       0        0     984         0       0  0     0
parse                   2716464     65456  1145480    196408 1112752      0       0        0       0         0  196368  0     0
optimizer                 98184         0    32728         0   65456      0       0        0       0         0       0  0     0
connectionGraph           32728         0        0     32728       0      0       0        0       0         0       0  0     0
iterGVN                   32728         0    32728         0       0      0       0        0       0         0       0  0     0
idealLoop             918189632         0 38687056 872824784  392776      0       0        0       0         0 6285016  0     0
idealLoopVerify         2228144         0        0   2228144       0      0       0        0       0         0       0  0     0
macroExpand               32728         0    32728         0       0      0       0        0       0         0       0  0     0
graphReshape              32728         0    32728         0       0      0       0        0       0         0       0  0     0
matcher                20135944   3369848  9033208   7536400   65456 131032       0        0       0         0       0  0     0
postselect_cleanup       294872    294872        0         0       0      0       0        0       0         0       0  0     0
scheduler                752944    196488   556456         0       0      0       0        0       0         0       0  0     0
regalloc                 388736    388736        0         0       0      0       0        0       0         0       0  0     0
ctorChaitin              160032    160032        0         0       0      0       0        0       0         0       0  0     0
regAllocSplit           4189544     32728  4156816         0       0      0       0        0       0         0       0  0     0
postAllocCopyRemoval      65456         0    65456         0       0      0       0        0       0         0       0  0     0
fixupSpills               32728         0    32728         0       0      0       0        0       0         0       0  0     0
chaitinCoalesce1        1505808    262144  1243664         0       0      0       0        0       0         0       0  0     0
output                138300376 138300376        0         0       0      0       0        0       0         0       0  0     0
shorten branches         360008    196368   163640         0       0      0       0        0       0         0       0  0     0

The noticeable line is:

idealLoop             918189632         0 38687056 872824784  392776      0       0        0       0         0 6285016  0     0

A lot of memory (almost 1 GB) gets allocated in the `comp` arena during `idealLoop`. So even though the compilation goes over the limit in `Compile::Code_Gen()`, the root cause is what happens earlier, during `idealLoop`.

`_loop_or_ctrl` and `_body` are both allocated in the `comp` arena. Accumulated over several loop opts passes, they should not use that much memory, but the test is run with `+VerifyLoopOptimizations`: calls to `PhaseIdealLoop::verify()` cause new `PhaseIdealLoop` objects to be allocated and more memory to be used in the `comp` arena.

The fix I propose is to allocate `_loop_or_ctrl` and `_body` in a dedicated `ResourceArea` so memory can be reclaimed when a pass of loop opts is over. With that change:

Total Usage: 227682272
--- Arena Usage by Arena Type and compilation phase, at arena usage peak of 227682272 ---
idealLoop 52278416 0 38687056 6913568 0 392776 0 0 0 0 0 6285016 0 0

That is ~50MB total for `idealLoop` instead of almost 1GB. Total usage peaks around 200MB.
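
The effect of the proposed change can be modelled outside HotSpot. The following is a toy sketch in Java, not the C2 implementation: `ToyArena` and `ArenaLifetimeDemo` are hypothetical names, and the byte counts are made up. It only illustrates why per-pass data placed in a compile-lifetime arena accumulates across passes, while the same data placed in an arena that is reset after each pass keeps the peak at the cost of a single pass.

```java
// Toy arena: memory is only given back when the whole arena is reset.
final class ToyArena {
    private long used;
    private long peak;

    void allocate(long bytes) {
        used += bytes;
        peak = Math.max(peak, used);
    }

    void reset() {
        used = 0;
    }

    long peak() {
        return peak;
    }
}

final class ArenaLifetimeDemo {
    // Runs 'passes' passes that each allocate 'bytesPerPass' of per-pass data.
    // resetAfterPass == false models a compile-lifetime arena (nothing freed
    // until the end); true models a dedicated arena reclaimed after each pass.
    static long peakUsage(int passes, long bytesPerPass, boolean resetAfterPass) {
        ToyArena arena = new ToyArena();
        for (int i = 0; i < passes; i++) {
            arena.allocate(bytesPerPass);
            if (resetAfterPass) {
                arena.reset();
            }
        }
        return arena.peak();
    }

    public static void main(String[] args) {
        System.out.println("compile-lifetime arena peak: " + peakUsage(10, 100, false));
        System.out.println("per-pass arena peak:         " + peakUsage(10, 100, true));
    }
}
```

In this model, verification passes simply count as additional passes, which is why enabling them inflates the compile-lifetime peak but not the per-pass one.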
------------- Commit messages: - whitespaces - more - test case - more - clean up - fix Changes: https://git.openjdk.org/jdk/pull/28581/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28581&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370519 Stats: 345 lines in 6 files changed: 341 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28581.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28581/head:pull/28581 PR: https://git.openjdk.org/jdk/pull/28581 From coleenp at openjdk.org Mon Dec 1 15:57:10 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 1 Dec 2025 15:57:10 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v3] In-Reply-To: References: Message-ID: <379iBIu0uk_Af-5_RZUQBFNkGyFM7iYpe4B_hg93tn8=.95e6e771-31f5-4b89-8172-aa3d0837de25@github.com> On Mon, 1 Dec 2025 15:40:40 GMT, Chen Liang wrote: >> Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. >> >> They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. >> >> We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. >> >> Paging @minborg who requested Optional folding for review. >> >> I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. 
> > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Doc tweaks With one small change, the runtime part of this change looks good. src/hotspot/share/ci/ciField.cpp line 220: > 218: return false; > 219: // Explicit opt-in from system classes > 220: if (holder->trust_final_fields()) This is missing { } so not sure where it ends, especially that it encloses an if statement, and other code. ------------- Changes requested by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28540#pullrequestreview-3525748039 PR Review Comment: https://git.openjdk.org/jdk/pull/28540#discussion_r2577662841 From mhaessig at openjdk.org Mon Dec 1 16:12:36 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 1 Dec 2025 16:12:36 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 15:40:00 GMT, Roland Westrelin wrote: > For this failure memory stats are: > > > Total Usage: 1095525816 > --- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 --- > Phase Total ra node comp type states reglive regsplit regmask superword cienv ha other > none 5976032 331560 5402064 197512 33712 10200 0 0 984 0 0 0 0 > parse 2716464 65456 1145480 196408 1112752 0 0 0 0 0 196368 0 0 > optimizer 98184 0 32728 0 65456 0 0 0 0 0 0 0 0 > connectionGraph 32728 0 0 32728 0 0 0 0 0 0 0 0 0 > iterGVN 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > idealLoop 918189632 0 38687056 872824784 392776 0 0 0 0 0 6285016 0 0 > idealLoopVerify 2228144 0 0 2228144 0 0 0 0 0 0 0 0 0 > macroExpand 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > graphReshape 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > matcher 20135944 3369848 9033208 7536400 65456 131032 0 0 0 0 0 0 0 > postselect_cleanup 294872 294872 0 0 0 0 0 0 0 0 0 0 0 > scheduler 752944 196488 556456 0 0 0 0 0 0 0 0 0 0 > regalloc 388736 388736 0 0 0 0 0 0 0 0 0 0 0 > ctorChaitin 160032 ... 
Thank you for fixing this, @rwestrel. Your fix looks good to me. I merely have two nitpicky suggestions. I will kick off a run of testing and report back with the results. src/hotspot/share/opto/compile.hpp line 374: > 372: // Compilation environment. > 373: Arena _comp_arena; // Arena with lifetime equivalent to Compile > 374: ResourceArea _idealloop_arena; // For data whose lifetime is a pass of loop optimizations Suggestion: ResourceArea _idealloop_arena; // For data whose lifetime is a single pass of loop optimizations ``` Nit: This makes it abundantly clear that the data is freed after one pass. test/hotspot/jtreg/compiler/c2/TestVerifyLoopOptimizationsHighMemUsage.java line 27: > 25: * @test > 26: * @bug 8370519 > 27: * @summary C2: Hit MemLimit when running with +VerifyLoopOptimizations Unsure, but would this test qualify for `@key stress`? ------------- PR Review: https://git.openjdk.org/jdk/pull/28581#pullrequestreview-3525793585 PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2577699466 PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2577714828 From qamai at openjdk.org Mon Dec 1 16:19:23 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 1 Dec 2025 16:19:23 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 15:40:00 GMT, Roland Westrelin wrote: > For this failure memory stats are: > > > Total Usage: 1095525816 > --- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 --- > Phase Total ra node comp type states reglive regsplit regmask superword cienv ha other > none 5976032 331560 5402064 197512 33712 10200 0 0 984 0 0 0 0 > parse 2716464 65456 1145480 196408 1112752 0 0 0 0 0 196368 0 0 > optimizer 98184 0 32728 0 65456 0 0 0 0 0 0 0 0 > connectionGraph 32728 0 0 32728 0 0 0 0 0 0 0 0 0 > iterGVN 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > idealLoop 918189632 0 38687056 872824784 392776 0 0 
0 0 0 6285016 0 0 > idealLoopVerify 2228144 0 0 2228144 0 0 0 0 0 0 0 0 0 > macroExpand 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > graphReshape 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > matcher 20135944 3369848 9033208 7536400 65456 131032 0 0 0 0 0 0 0 > postselect_cleanup 294872 294872 0 0 0 0 0 0 0 0 0 0 0 > scheduler 752944 196488 556456 0 0 0 0 0 0 0 0 0 0 > regalloc 388736 388736 0 0 0 0 0 0 0 0 0 0 0 > ctorChaitin 160032 ... src/hotspot/share/opto/compile.hpp line 810: > 808: // Compilation environment. > 809: Arena* comp_arena() { return &_comp_arena; } > 810: ResourceArea* idealloop_arena() { return &_idealloop_arena; } Should we make it more idiomatic C++ by having the `ResourceArea` allocated and deallocated together with the `PhaseIdealLoop` instead of attaching it to the `Compile` object? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2577746393 From sviswanathan at openjdk.org Mon Dec 1 16:33:43 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 1 Dec 2025 16:33:43 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v10] In-Reply-To: References: Message-ID: On Fri, 28 Nov 2025 06:01:26 GMT, Jiangli Zhou wrote: >> Please review the fix in StubGenerator::aesgcm_avx512 and StubGenerator::aesgcm_avx2 to handle some edge cases with input sizes that are not multiple of the block size. >> >> Thanks to Thomas Holenstein and Lukas Zobernig for analyzing the issue and providing the test case! > > Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: > > Change to break before operators. Marked as reviewed by sviswanathan (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/28363#pullrequestreview-3525930778 From bmaillard at openjdk.org Mon Dec 1 16:55:09 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 1 Dec 2025 16:55:09 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 15:40:00 GMT, Roland Westrelin wrote: > For this failure memory stats are: > > > Total Usage: 1095525816 > --- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 --- > Phase Total ra node comp type states reglive regsplit regmask superword cienv ha other > none 5976032 331560 5402064 197512 33712 10200 0 0 984 0 0 0 0 > parse 2716464 65456 1145480 196408 1112752 0 0 0 0 0 196368 0 0 > optimizer 98184 0 32728 0 65456 0 0 0 0 0 0 0 0 > connectionGraph 32728 0 0 32728 0 0 0 0 0 0 0 0 0 > iterGVN 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > idealLoop 918189632 0 38687056 872824784 392776 0 0 0 0 0 6285016 0 0 > idealLoopVerify 2228144 0 0 2228144 0 0 0 0 0 0 0 0 0 > macroExpand 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > graphReshape 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > matcher 20135944 3369848 9033208 7536400 65456 131032 0 0 0 0 0 0 0 > postselect_cleanup 294872 294872 0 0 0 0 0 0 0 0 0 0 0 > scheduler 752944 196488 556456 0 0 0 0 0 0 0 0 0 0 > regalloc 388736 388736 0 0 0 0 0 0 0 0 0 0 0 > ctorChaitin 160032 ... Thanks for fixing this @rwestrel, I agree with the fix. I noticed that this could be a problem while working on [JDK-8366990](https://bugs.openjdk.org/browse/JDK-8366990), but there was no reproducer at the time. src/hotspot/share/opto/compile.cpp line 656: > 654: _stress_seed(0), > 655: _comp_arena(mtCompiler, Arena::Tag::tag_comp), > 656: _idealloop_arena(mtCompiler, Arena::Tag::tag_idealloop), To keep the naming consistent with other mentions of `IdealLoop` in variable/field names (such as `_phase_verify_ideal_loop`), I would name this `_ideal_loop_arena`. 
This will make it easier to find in a code editor. Feel free to ignore if you disagree.

test/hotspot/jtreg/compiler/c2/TestVerifyLoopOptimizationsHighMemUsage.java line 28:

> 26: * @bug 8370519
> 27: * @summary C2: Hit MemLimit when running with +VerifyLoopOptimizations
> 28: * @run main/othervm -XX:CompileCommand=compileonly,*TestVerifyLoopOptimizationsHighMemUsage*::* -XX:-TieredCompilation -Xbatch

Out of curiosity, have you tried reducing the test with `creduce`? I fixed a similar issue in [JDK-8366990](https://bugs.openjdk.org/browse/JDK-8366990), and initially reviewers were concerned about the long compilation time. I was able to get decent results with `creduce` by using `-XX:CompileCommand=memlimit`. Not sure if it's worth doing here though.

-------------

PR Review: https://git.openjdk.org/jdk/pull/28581#pullrequestreview-3525878832
PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2577760668
PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2577804805

From jiangli at openjdk.org Mon Dec 1 17:32:40 2025
From: jiangli at openjdk.org (Jiangli Zhou)
Date: Mon, 1 Dec 2025 17:32:40 GMT
Subject: Integrated: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes
In-Reply-To:
References:
Message-ID:

On Mon, 17 Nov 2025 22:34:14 GMT, Jiangli Zhou wrote:

> Please review the fix in StubGenerator::aesgcm_avx512 and StubGenerator::aesgcm_avx2 to handle some edge cases with input sizes that are not a multiple of the block size.
>
> Thanks to Thomas Holenstein and Lukas Zobernig for analyzing the issue and providing the test case!

This pull request has now been integrated.
Changeset: 6cb1c8f9 Author: Jiangli Zhou URL: https://git.openjdk.org/jdk/commit/6cb1c8f9cfcb797af788ca8fb490f388cc68f525 Stats: 151 lines in 2 files changed: 149 ins; 1 del; 1 mod 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes Co-authored-by: Thomas Holenstein Co-authored-by: Lukas Zobernig Reviewed-by: shade, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/28363 From liach at openjdk.org Mon Dec 1 18:27:34 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 1 Dec 2025 18:27:34 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v4] In-Reply-To: References: Message-ID: > Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. > > They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. > > We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. > > Paging @minborg who requested Optional folding for review. > > I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. 
Chen Liang has updated the pull request incrementally with one additional commit since the last revision: bracket styles ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28540/files - new: https://git.openjdk.org/jdk/pull/28540/files/7a1cfa4a..d353bdbe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28540&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28540&range=02-03 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28540.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28540/head:pull/28540 PR: https://git.openjdk.org/jdk/pull/28540 From rrich at openjdk.org Mon Dec 1 18:32:57 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 1 Dec 2025 18:32:57 GMT Subject: RFR: 8370473: C2: Better Aligment of Vector Spill Slots [v4] In-Reply-To: <2dAfr3bnYwrmrMwlhDNniaYVQYOrR2ARztDEB4qqBzY=.aaa1b90d-0aa7-4d42-a3eb-c52a6b04cbaf@github.com> References: <2dAfr3bnYwrmrMwlhDNniaYVQYOrR2ARztDEB4qqBzY=.aaa1b90d-0aa7-4d42-a3eb-c52a6b04cbaf@github.com> Message-ID: On Thu, 20 Nov 2025 10:21:34 GMT, Richard Reingruber wrote: >> With this change c2 will allocate spill slots for vectors with sp offsets aligned to the size of the vectors. Maximum alignment is StackAlignmentInBytes. >> >> It also updates comments that have never been changed to describe how register allocation works for sizes larger than 64 bit. >> >> The change helps to produce better spill code on AARCH64 and PPC64 where an additional add instruction is emitted if the offset of a vector un-/spill is not aligned. >> >> The change is rather a cleanup than an optimization. In most cases the sp offsets will already be properly aligned. >> Only with incoming stack arguments unaligned offsets can be generated. But also then alignment padding is only added if vector registers larger than 64 bit are used. >> >> So the costs are effectively zero. 
Especially because extra padding won't enlarge the frame since only virtual registers are allocated which are mapped to the caller frame (see `pad0` in the [diagram](https://github.com/openjdk/jdk/blob/92e380c59c2498b1bc94e26658b07b383deae59a/src/hotspot/cpu/aarch64/aarch64.ad#L3829))
>>
>> There's a risk though that with the extra virtual registers allocated for `pad0` the limit of registers a `RegMask` can represent is reached (occurs with excessive spilling). If this happens the compilation would fail. It could be retried with smaller alignment for vector spilling though. I haven't implemented it as I thought the risk is negligible.
>>
>> Note that the sp offset of the accesses should be aligned rather than the effective address. So it could even be argued that the maximum alignment could be higher than StackAlignmentInBytes.
>>
>> ##### Testing with fastdebug builds on AARCH64 and PPC64:
>>
>> hotspot_vector_1
>> hotspot_vector_2
>> jdk_vector
>> jdk_vector_sanity
>>
>> ##### The change passed our CI testing:
>> Tier 1-4 of hotspot and jdk. All of langtools and jaxp. Renaissance Suite and SAP specific tests.
>> Testing was done on the main platforms and also on Linux/PPC64le and AIX.
>>
>> C2 compilation of `jdk.internal.vm.vector.VectorSupport::rearrangeOp` has unaligned spill offsets. It is covered by the following tests:
>>
>> compiler/vectorapi/VectorRearrangeTest.java
>> jdk/incubator/vector/Byte128VectorLoadStoreTests.java
>> jdk/incubator/vector/Double256VectorLoadStoreTests.java
>> jdk/incubator/vector/Float128VectorTests.java
>> jdk/incubator/vector/Long256VectorLoadStoreTests.java
>> jdk/incubator/vector/Short128VectorLoadStoreTests.java
>> jdk/incubator/vector/Vector64ConversionTests.java
>
> Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 10 additional commits since the last revision:
>
> - Merge branch 'master'
> - Exclude IR check on riscv with rvv
> - Enhance comment
> - Fix OptoAssembly for Power 8
> - PPC: OptoAssembly for vector spilling
> - Assert aligned sp offsets in vector spilling
> - Delete TMP and !UseNewCode
> - Align Matcher::_new_SP for better vector spilling
> - TMP: trace unaligned vector spilling
> - Add test

Thanks for the review, Götz!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27969#issuecomment-3598210608

From rrich at openjdk.org Mon Dec 1 18:32:59 2025
From: rrich at openjdk.org (Richard Reingruber)
Date: Mon, 1 Dec 2025 18:32:59 GMT
Subject: RFR: 8370473: C2: Better Aligment of Vector Spill Slots [v4]
In-Reply-To: <7h2_vOOkP-YCjBQ0dIRbNWg3o4gCjy4zwaAE62K0TkE=.c5a07e88-b55b-4be3-9e9d-7d484a663e98@github.com>
References: <2dAfr3bnYwrmrMwlhDNniaYVQYOrR2ARztDEB4qqBzY=.aaa1b90d-0aa7-4d42-a3eb-c52a6b04cbaf@github.com> <7h2_vOOkP-YCjBQ0dIRbNWg3o4gCjy4zwaAE62K0TkE=.c5a07e88-b55b-4be3-9e9d-7d484a663e98@github.com>
Message-ID:

On Thu, 20 Nov 2025 11:16:52 GMT, Goetz Lindenmaier wrote:

> Is this true for oops, too?

I think so (see [here](https://github.com/openjdk/jdk/blob/45c0600d3abfa4bcd0338840523c0df69283afe2/src/hotspot/share/opto/chaitin.cpp#L945-L950)).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27969#discussion_r2578191613

From heidinga at openjdk.org Mon Dec 1 18:55:09 2025
From: heidinga at openjdk.org (Dan Heidinga)
Date: Mon, 1 Dec 2025 18:55:09 GMT
Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v4]
In-Reply-To:
References:
Message-ID: <-9yglNAoD81NuGyLSS0ehpkPZmqK66Qyd7h4UFcztGA=.56a84f5e-a29a-4fc0-b0d7-ce20cac37851@github.com>

On Mon, 1 Dec 2025 18:27:34 GMT, Chen Liang wrote:

>> Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages.
Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. >> >> They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. >> >> We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. >> >> Paging @minborg who requested Optional folding for review. >> >> I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > bracket styles A bit of meta-question about this PR and JEP 500: does this trust need to be rescinded if the user explicitly adds `--enable-final-field-mutation=` for the modules that contain these classes marked with the annotation? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28540#issuecomment-3598347264 From vlivanov at openjdk.org Mon Dec 1 19:30:22 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 1 Dec 2025 19:30:22 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v2] In-Reply-To: References: Message-ID: <5eysoU9a44W7_cWds1pgbO9cpxQpBbtd54cUglfEW8c=.d0307e92-d9b3-405c-b488-872243af83b1@github.com> > Even though `instanceof` check (and reflective `Class.isInstance` call) narrows operand's type, sharpened type information is not explicitly materialized in the IR. > > There's a `SubTypeCheck` node present, but it is not a substitute for a `CheckCastPP` node with a proper type. 
>
> The difference can be illustrated with the following simple cases:
>
>     class A { void m() {} }
>     class B extends A { void m() {} }
>
>     void testInstanceOf(A obj) {
>       if (obj instanceof B) {
>         obj.m();
>       }
>     }
>
>     InstanceOf::testInstanceOf (12 bytes)
>       @ 8   InstanceOf$A::m (0 bytes)   failed to inline: virtual call
>
> vs
>
>     void testInstanceOfCast(A obj) {
>       if (obj instanceof B) {
>         B b = (B)obj;
>         b.m();
>       }
>     }
>
>     InstanceOf::testInstanceOfCast (17 bytes)
>       @ 13   InstanceOf$B::m (1 bytes)   inline (hot)
>
>
> Proposed fix annotates operands of subtype checks with proper type information which reflects the effects of subtype check. Not-yet-canonicalized IR shape poses some challenges, but I decided to match it early so information is available right away, rather than waiting for IGVN pass and delay inlining to post-parse phase.
>
> FTR it is not a complete fix. It works for trivial cases, but for more complex conditions the IR shape becomes too complex during parsing (as illustrated by some test cases). I experimented with annotating subtype checks after initial parsing pass is over, but the crucial simplification step happens as part of split-if transformation which happens when no more inlining is possible. So, the only possible benefit (without forcing split-if optimization earlier) is virtual-to-direct call strength reduction. I plan to explore it separately.
> > Testing: hs-tier1 - hs-tier5 Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: Test fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28517/files - new: https://git.openjdk.org/jdk/pull/28517/files/1cf6238f..0a5e78c6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28517&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28517&range=00-01 Stats: 15 lines in 1 file changed: 5 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/28517.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28517/head:pull/28517 PR: https://git.openjdk.org/jdk/pull/28517 From vlivanov at openjdk.org Mon Dec 1 19:44:48 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 1 Dec 2025 19:44:48 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v2] In-Reply-To: <5eysoU9a44W7_cWds1pgbO9cpxQpBbtd54cUglfEW8c=.d0307e92-d9b3-405c-b488-872243af83b1@github.com> References: <5eysoU9a44W7_cWds1pgbO9cpxQpBbtd54cUglfEW8c=.d0307e92-d9b3-405c-b488-872243af83b1@github.com> Message-ID: On Mon, 1 Dec 2025 19:30:22 GMT, Vladimir Ivanov wrote: >> Even though `instanceof` check (and reflective `Class.isInstance` call) narrows operand's type, sharpened type information is not explicitly materialized in the IR. >> >> There's a `SubTypeCheck` node present, but it is not a substitute for a `CheckCastPP` node with a proper type. 
>>
>> The difference can be illustrated with the following simple cases:
>>
>>     class A { void m() {} }
>>     class B extends A { void m() {} }
>>
>>     void testInstanceOf(A obj) {
>>       if (obj instanceof B) {
>>         obj.m();
>>       }
>>     }
>>
>>     InstanceOf::testInstanceOf (12 bytes)
>>       @ 8   InstanceOf$A::m (0 bytes)   failed to inline: virtual call
>>
>> vs
>>
>>     void testInstanceOfCast(A obj) {
>>       if (obj instanceof B) {
>>         B b = (B)obj;
>>         b.m();
>>       }
>>     }
>>
>>     InstanceOf::testInstanceOfCast (17 bytes)
>>       @ 13   InstanceOf$B::m (1 bytes)   inline (hot)
>>
>>
>> Proposed fix annotates operands of subtype checks with proper type information which reflects the effects of subtype check. Not-yet-canonicalized IR shape poses some challenges, but I decided to match it early so information is available right away, rather than waiting for IGVN pass and delay inlining to post-parse phase.
>>
>> FTR it is not a complete fix. It works for trivial cases, but for more complex conditions the IR shape becomes too complex during parsing (as illustrated by some test cases). I experimented with annotating subtype checks after initial parsing pass is over, but the crucial simplification step happens as part of split-if transformation which happens when no more inlining is possible. So, the only possible benefit (without forcing split-if optimization earlier) is virtual-to-direct call strength reduction. I plan to explore it separately.
>>
>> Testing: hs-tier1 - hs-tier5
>
> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:
>
>   Test fix

Thanks, Roland! I slightly reworked the test to make it more robust.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28517#issuecomment-3598543397 From vlivanov at openjdk.org Mon Dec 1 19:51:48 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 1 Dec 2025 19:51:48 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v2] In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 14:08:03 GMT, Quan Anh Mai wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> Test fix > > src/hotspot/share/opto/parse2.cpp line 1739: > >> 1737: } >> 1738: >> 1739: // Match an instanceof check. > > We seem to require that the input of `SubTypeCheck` is not `null`. What do you think about allowing `SubTypeCheck` to accept `null` and return `false`? Yes, it's a good idea and the right direction to move. While experimenting with a different enhancement, I noticed that a subtype check leaves a null check behind irrespective of whether the check goes away or not. Unfortunately, there are some engineering considerations which complicates the change. `SubTypeCheck` is shared across all the places where subtype checks are performed, but `checkcast` and `instanceof` differ in the way `null` is handled. So, the proper way to fix it is to introduce a higher-level representation which implicitly handles nulls and then eventually lower it to `SubTypeCheck` and materialize null check if needed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28517#discussion_r2578397399 From liach at openjdk.org Mon Dec 1 20:23:49 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 1 Dec 2025 20:23:49 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v4] In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 18:27:34 GMT, Chen Liang wrote: >> Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. 
Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. >> >> They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. >> >> We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. >> >> Paging @minborg who requested Optional folding for review. >> >> I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > bracket styles This PR currently does not interact with JEP 500. However, as specified in `Field.set`, the result of setting a final field may be ignored, as Alan [commented](https://github.com/openjdk/jdk/pull/28540#discussion_r2573494589). So I don't think we need to rescind the current trusting even if users enable mutations. In addition, @DanHeidinga I made the same fault as you when I first saw `--enable-final-field-mutation=` - this actually represents the callers, instead of the target, of `Field.set`. The target of mutation is specified via `--add-opens`, if the target field is not public. 
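To make the `Field.set` point concrete, here is a minimal sketch that is not taken from the PR. `FinalFieldSetSketch`, `Box`, and `mutateAndReadBack` are made-up names, and the sketch assumes a JDK where reflective mutation of final instance fields is still permitted (newer JDKs may print a warning). It only shows that the reflective write is accepted and visible through reflection; it says nothing about what JIT-compiled code that trusts the field will observe, which is exactly the latitude the `Field.set` specification leaves open.

```java
import java.lang.reflect.Field;

public class FinalFieldSetSketch {
    // Hypothetical holder class. The field is assigned in the constructor so
    // that javac does not treat it as a compile-time constant.
    static final class Box {
        private final int value;
        Box(int value) { this.value = value; }
    }

    // Overwrites Box.value reflectively and returns what reflection reads back.
    static int mutateAndReadBack(int initial, int updated) {
        try {
            Box box = new Box(initial);
            Field f = Box.class.getDeclaredField("value");
            f.setAccessible(true);   // needed before a final instance field can be written
            f.setInt(box, updated);  // the caller of this method is the "mutating" code
            return f.getInt(box);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(mutateAndReadBack(1, 2));
    }
}
```

Here the class that calls `setInt` and the class that declares the field live in the same module, so no `--add-opens` is involved; across modules the caller and the target are configured separately, as described above.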
-------------

PR Comment: https://git.openjdk.org/jdk/pull/28540#issuecomment-3598676839

From liach at openjdk.org  Mon Dec  1 20:34:31 2025
From: liach at openjdk.org (Chen Liang)
Date: Mon, 1 Dec 2025 20:34:31 GMT
Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v2]
In-Reply-To:
References:
Message-ID:

> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`)

Chen Liang has updated the pull request incrementally with one additional commit since the last revision:

  Logical fallacy

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28585/files
  - new: https://git.openjdk.org/jdk/pull/28585/files/522cbe9d..886d3918

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=00-01

  Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/28585.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28585/head:pull/28585

PR: https://git.openjdk.org/jdk/pull/28585

From vpaprotski at openjdk.org  Mon Dec  1 21:23:14 2025
From: vpaprotski at openjdk.org (Volodymyr Paprotski)
Date: Mon, 1 Dec 2025 21:23:14 GMT
Subject: RFR: 8372703: Test compiler/arguments/TestCodeEntryAlignment.java failed: assert(allocates2(pc)) failed: not in CodeBuffer memory
Message-ID:

Requires a Broadwell machine, but was able to reproduce with an emulator:

    ~/sde-external-9.58.0-2025-06-16-lin/sde64 -follow-subprocess -bdw -- ./build/linux-x86_64-server-fastdebug/images/jdk/bin/java -XX:-UseMulAddIntrinsic -XX:+UseDilithiumIntrinsics -XX:+UnlockExperimentalVMOptions -XX:CodeCacheSegmentSize=1024 -XX:CodeEntryAlignment=1024 -cp
build/linux-x86_64-server-fastdebug/support/test/lib/test-lib.jar test/hotspot/jtreg/compiler/arguments/TestCodeEntryAlignment.java run ------------- Commit messages: - increase compiler code cache size Changes: https://git.openjdk.org/jdk/pull/28588/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28588&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372703 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28588.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28588/head:pull/28588 PR: https://git.openjdk.org/jdk/pull/28588 From dlong at openjdk.org Mon Dec 1 22:26:10 2025 From: dlong at openjdk.org (Dean Long) Date: Mon, 1 Dec 2025 22:26:10 GMT Subject: RFR: 8370766: JVM crashes when running compiler/exceptions/TestAccessErrorInCatch.java fails with -XX:+VerifyStack [v3] In-Reply-To: <5JAu6StX5-r2itXPGiDBgGHjGo0S2mOfGxOpPoMSkIQ=.000500da-a003-403b-9d3b-6df3a53c2b22@github.com> References: <5JAu6StX5-r2itXPGiDBgGHjGo0S2mOfGxOpPoMSkIQ=.000500da-a003-403b-9d3b-6df3a53c2b22@github.com> Message-ID: > The problem is C2 is throwing an exception and then deoptimizing, and the -XX:+VerifyStack logic expects the stack to be empty, match the "before" state if the reexecute flag is set, or match the "after" state. C2 is using the "before" state, so for correctness it also needs to set the reexecute flag. > > I played around with other approaches, like: > 1. setting the stack to empty > 2. adding all the bytecodes that can throw to the list in AbstractInterpreter::bytecode_should_reexecute() > 3. always setting the reexecute flag in add_safepoint_edges() if must_throw is set > but in the end I decided to go with the minimal localized low-risk change. 
Dean Long has updated the pull request incrementally with one additional commit since the last revision: add bugid ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28486/files - new: https://git.openjdk.org/jdk/pull/28486/files/8f89b007..5d577099 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28486&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28486&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28486.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28486/head:pull/28486 PR: https://git.openjdk.org/jdk/pull/28486 From dlong at openjdk.org Mon Dec 1 22:26:11 2025 From: dlong at openjdk.org (Dean Long) Date: Mon, 1 Dec 2025 22:26:11 GMT Subject: RFR: 8370766: JVM crashes when running compiler/exceptions/TestAccessErrorInCatch.java fails with -XX:+VerifyStack [v2] In-Reply-To: References: <5JAu6StX5-r2itXPGiDBgGHjGo0S2mOfGxOpPoMSkIQ=.000500da-a003-403b-9d3b-6df3a53c2b22@github.com> Message-ID: <6JooQy0BdhEorBSCfW_R-v_YmFRnQ4N1hmwRxb0ALdU=.2d079520-51c1-447a-ac47-500daef45a68@github.com> On Mon, 1 Dec 2025 07:11:11 GMT, Christian Hagedorn wrote: > Looks good to me, too. > > > always setting the reexecute flag in add_safepoint_edges() if must_throw is set > > but in the end I decided to go with the minimal localized low-risk change. > > Is this something we should follow up on? Yes, I have several enhancements in this area on my list. I'll file a separate RFE. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/28486#issuecomment-3599217605

From dlong at openjdk.org  Mon Dec  1 22:26:12 2025
From: dlong at openjdk.org (Dean Long)
Date: Mon, 1 Dec 2025 22:26:12 GMT
Subject: RFR: 8370766: JVM crashes when running compiler/exceptions/TestAccessErrorInCatch.java fails with -XX:+VerifyStack [v3]
In-Reply-To:
References: <5JAu6StX5-r2itXPGiDBgGHjGo0S2mOfGxOpPoMSkIQ=.000500da-a003-403b-9d3b-6df3a53c2b22@github.com>
Message-ID: <5gpqqUssgER1MM5K4nbgGHl5e2Uu5TUjPwZAx9Nsdkc=.fc776158-d1b5-451e-aa88-db41611e4f21@github.com>

On Fri, 28 Nov 2025 12:13:39 GMT, Manuel Hässig wrote:

>> Dean Long has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   add bugid
>
> test/hotspot/jtreg/compiler/exceptions/TestAccessErrorInCatch.java line 26:
>
>> 24: /*
>> 25:  * @test
>> 26:  * @bug 8367002
>
> Suggestion:
>
>  * @bug 8367002 8370766
>
> Perhaps we should add this bug to the test, since you modified it.

Good idea. Fixed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28486#discussion_r2578925529

From dlong at openjdk.org  Mon Dec  1 22:30:50 2025
From: dlong at openjdk.org (Dean Long)
Date: Mon, 1 Dec 2025 22:30:50 GMT
Subject: RFR: 8370766: JVM crashes when running compiler/exceptions/TestAccessErrorInCatch.java fails with -XX:+VerifyStack [v3]
In-Reply-To:
References: <5JAu6StX5-r2itXPGiDBgGHjGo0S2mOfGxOpPoMSkIQ=.000500da-a003-403b-9d3b-6df3a53c2b22@github.com>
Message-ID:

On Mon, 1 Dec 2025 22:26:10 GMT, Dean Long wrote:

>> The problem is C2 is throwing an exception and then deoptimizing, and the -XX:+VerifyStack logic expects the stack to be empty, match the "before" state if the reexecute flag is set, or match the "after" state. C2 is using the "before" state, so for correctness it also needs to set the reexecute flag.
>>
>> I played around with other approaches, like:
>> 1. setting the stack to empty
>> 2.
adding all the bytecodes that can throw to the list in AbstractInterpreter::bytecode_should_reexecute()
>> 3. always setting the reexecute flag in add_safepoint_edges() if must_throw is set
>> but in the end I decided to go with the minimal localized low-risk change.
>
> Dean Long has updated the pull request incrementally with one additional commit since the last revision:
>
>   add bugid

I added the bugid to the test, so I'll need a quick re-review.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28486#issuecomment-3599232577

From dlong at openjdk.org  Mon Dec  1 22:30:53 2025
From: dlong at openjdk.org (Dean Long)
Date: Mon, 1 Dec 2025 22:30:53 GMT
Subject: RFR: 8370766: JVM crashes when running compiler/exceptions/TestAccessErrorInCatch.java fails with -XX:+VerifyStack [v2]
In-Reply-To:
References: <5JAu6StX5-r2itXPGiDBgGHjGo0S2mOfGxOpPoMSkIQ=.000500da-a003-403b-9d3b-6df3a53c2b22@github.com>
Message-ID:

On Fri, 28 Nov 2025 12:14:39 GMT, Manuel Hässig wrote:

>> Dean Long has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   remove extra spaces
>
> Thank you for fixing this, @dean-long. It looks good to me.

@mhaessig and @chhagedorn , thanks for the reviews.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28486#issuecomment-3599233961

From vlivanov at openjdk.org  Mon Dec  1 22:49:45 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Mon, 1 Dec 2025 22:49:45 GMT
Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v2]
In-Reply-To: <82Ddhg3yXemMeyKmZUCWZIPUVOTkdCbXiOcl8LO_Su0=.47680bc7-526d-4c15-9b84-dd9c7d27728d@github.com>
References: <82Ddhg3yXemMeyKmZUCWZIPUVOTkdCbXiOcl8LO_Su0=.47680bc7-526d-4c15-9b84-dd9c7d27728d@github.com>
Message-ID:

On Thu, 27 Nov 2025 14:56:51 GMT, ExE Boss wrote:

>> There are corresponding test cases (`testInstanceOfCondPre` et al) where conditions are embedded.
>>
>> The idea of `testInstanceOfCondLate` and similar test cases is to check how inlining works when condition improves receiver type during incremental inlining phase.
>
> What I meant was where the `instanceof` is in the called method, the `testInstanceOfCondPre` all have the `instanceof` checks as part of the `if` statement.
>
> --------------------------------------------------------------------------------
>
> Something like:
>
>     static void testInstanceOfCondDefaultInlinePre(A a, boolean cond) {
>         if (defaultInlineInstanceOfCondPre(a, cond)) {
>             a.m();
>         }
>     }
>     static void testInstanceOfCondDefaultInlinePost(A a, boolean cond) {
>         if (defaultInlineInstanceOfCondPost(a, cond)) {
>             a.m();
>         }
>     }
>
>     static void testIsInstanceCondDefaultInlinePre(A a, boolean cond) {
>         if (defaultInlineIsInstanceCondPre(a, cond)) {
>             a.m();
>         }
>     }
>     static void testIsInstanceCondDefaultInlinePost(A a, boolean cond) {
>         if (defaultInlineIsInstanceCondPost(a, cond)) {
>             a.m();
>         }
>     }
>
>
> --------------------------------------------------------------------------------
>
> I suggest adding such a test because of real-world code which use different internal implementation classes but expose their public API as only a single common supertype, like `java.lang.constant.ClassDesc` and its `isPrimitive()`/`isArray()`/`isClassOrInterface()` methods (which currently don't do the `instanceof` check, but they probably should so that they can be reliably inlined).

The test is intended as a white-box test. It focuses on bytecode shapes which result in different IR representations and exercise different optimizations. From compiler perspective, there's no difference between `if (defaultInlineInstanceOfCond(a)) { ... }` and `if (a instanceof B) {...}` when inlining happens during parsing. Both test cases produce the very same IR after parsing is over.
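For readers following along, the three source shapes discussed in this thread can be written side by side as below. This is only an illustrative sketch: `InstanceOfShapes`, `isB`, and the method names are made up here and are not the PR's test code. All three methods behave identically at the Java level; any difference is in how C2 sees the receiver type, which one could try to observe with the diagnostic flags `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` (the actual inlining decisions depend on the JDK build and on profiling).

```java
public class InstanceOfShapes {
    static class A { String m() { return "A"; } }
    static class B extends A { @Override String m() { return "B"; } }

    // Hypothetical helper, standing in for the "instanceof in the called method" case.
    static boolean isB(A a) {
        return a instanceof B;
    }

    // Shape 1: the check is written directly in the if condition,
    // and the call still goes through the static type A.
    static String direct(A a) {
        if (a instanceof B) {
            return a.m();
        }
        return "none";
    }

    // Shape 2: the same check hidden in a small callee.
    static String viaHelper(A a) {
        if (isB(a)) {
            return a.m();
        }
        return "none";
    }

    // Shape 3: an explicit cast after the check, so the receiver is typed B.
    static String withCast(A a) {
        if (a instanceof B) {
            B b = (B) a;
            return b.m();
        }
        return "none";
    }

    public static void main(String[] args) {
        System.out.println(direct(new B()) + " " + viaHelper(new B()) + " " + withCast(new A()));
    }
}
```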
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28517#discussion_r2578972141 From vlivanov at openjdk.org Mon Dec 1 23:29:03 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 1 Dec 2025 23:29:03 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v6] In-Reply-To: <2ifEaoGuZU4duyckWchgOnnqfH6AgAcrqsiqBZH1Nx4=.1df7af8d-41ac-43a1-90ab-964eb80f155b@github.com> References: <2ifEaoGuZU4duyckWchgOnnqfH6AgAcrqsiqBZH1Nx4=.1df7af8d-41ac-43a1-90ab-964eb80f155b@github.com> Message-ID: On Mon, 1 Dec 2025 13:04:08 GMT, Aleksey Shipilev wrote: >> See the bug for discussion what issues current machinery has. >> >> This PR executes the plan outlined in the bug: >> 1. Common the receiver type profiling code in interpreter and C1 >> 2. Rewrite receiver type profiling code to only do atomic receiver slot installations >> 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed >> >> This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision: > > - Simplify third case: no need to loop, just restart the search > - Actually have a second "fast" case: receiver is not found in the table, and the table is full > - Pushing/popping for rare CAS path is counter-productive src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4826: > 4824: // and never duplicate the receivers in the list. > 4825: // > 4826: // It is tempting to combine these cases into a single loop, and claim the first Can you elaborate, please, why it is the case? Is it a result of class unloading or something else? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25305#discussion_r2579069740 From liach at openjdk.org Mon Dec 1 23:41:04 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 1 Dec 2025 23:41:04 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: Message-ID: > Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) Chen Liang has updated the pull request incrementally with one additional commit since the last revision: Tweak VH usage in some classes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28585/files - new: https://git.openjdk.org/jdk/pull/28585/files/886d3918..7bcdcbf3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=01-02 Stats: 9 lines in 1 file changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/28585.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28585/head:pull/28585 PR: https://git.openjdk.org/jdk/pull/28585 From liach at openjdk.org Mon Dec 1 23:53:46 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 1 Dec 2025 23:53:46 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 23:41:04 GMT, Chen Liang wrote: >> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. 
Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Tweak VH usage in some classes Since I removed the return type dropping VarHandle bypass, TestGetAndAdd became affected because it can no longer access the x86 assembly. Updated the Java calling convention to fix it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28585#issuecomment-3599477724 From liach at openjdk.org Tue Dec 2 00:16:14 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 2 Dec 2025 00:16:14 GMT Subject: RFR: 8372845: Fold identity hash code if object is constant Message-ID: Folding identity hash as constant if the incoming argument is constant would be useful for quick map lookups, such as for the [Classifier proposal](https://openjdk.org/jeps/8357674). Currently, identity hash is not constant because it loads the object header/mark word. We can add an explicit bypass to load an existing hash eagerly instead. ------------- Commit messages: - Move around - Constant fold identity hash Changes: https://git.openjdk.org/jdk/pull/28589/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28589&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372845 Stats: 88 lines in 4 files changed: 87 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28589/head:pull/28589 PR: https://git.openjdk.org/jdk/pull/28589 From dholmes at openjdk.org Tue Dec 2 00:16:48 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 2 Dec 2025 00:16:48 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v3] In-Reply-To: References: Message-ID: On Mon, 24 Nov 2025 18:10:50 GMT, Quan Anh Mai wrote: >> Hi, >> >> This fixes the crash in `Load/StoreVectorMaskedNode::Ideal`. 
The issue here is that the graph is not canonical during idealization, which leads to us processing a dead node. The fix I propose is to bail-out when that happens. >> >> To be more specific, for this issue, we have the graph that looks like: >> >> ConI -> ConvI2L -> CastLL(0..32) -> VectorMaskGen >> >> with `ConI` being 45 and `MaxVectorSize` being 32. In this instance, `CastLL` is processed before `ConvI2L`, and when it is processed, it sees that the type of `ConvI2L` being its bottom type. As a result, it does not know that it is top, and since we are after macro expansion, which is after loop opts, the `CastLL` goes away, leaving us with: >> >> ConI -> ConvI2L -> VectorMaskGen >> >> After `ConvI2L` is processed, we know that the input of `VectorMaskGen` is a constant 45, which is larger than `MaxVectorSize`, leading to the assert failure. >> >> Please take a look and leave your thoughts, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > reviews We either need this fix or a backout of whatever caused the problem. The fork is this week and this causes a lot of failures in testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28410#issuecomment-3599537313 From vlivanov at openjdk.org Tue Dec 2 00:25:49 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 2 Dec 2025 00:25:49 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: Message-ID: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> On Mon, 1 Dec 2025 23:41:04 GMT, Chen Liang wrote: >> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. 
Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Tweak VH usage in some classes src/java.base/share/classes/java/lang/invoke/VarHandle.java line 2033: > 2031: > 2032: @ForceInline > 2033: MethodHandle adaptedMethodHandle(VarHandle vh) { Can you elaborate, please, how this method is intended to behave? test/hotspot/jtreg/compiler/c2/irTests/TestGetAndAdd.java line 78: > 76: @IR(counts = {IRNode.X86_LOCK_XADDB, "3"}, phase = CompilePhase.FINAL_CODE) > 77: public static void addB() { > 78: var _ = (byte) B.getAndAdd(b2); > Since I removed the return type dropping VarHandle bypass, TestGetAndAdd became affected because it can no longer access the x86 assembly. It has performance implications for user code, doesn't it? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2579149358 PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2579150006 From liach at openjdk.org Tue Dec 2 01:12:48 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 2 Dec 2025 01:12:48 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> Message-ID: <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> On Tue, 2 Dec 2025 00:20:21 GMT, Vladimir Ivanov wrote: >> Chen Liang has updated the pull request incrementally with one additional commit since the last revision: >> >> Tweak VH usage in some classes > > src/java.base/share/classes/java/lang/invoke/VarHandle.java line 2033: > >> 2031: >> 2032: @ForceInline >> 2033: MethodHandle 
adaptedMethodHandle(VarHandle vh) { > > Can you elaborate, please, how this method is intended to behave? When this is compiled, `constant` will become either `1` for constant VH and `2` for non-constant VH. So for constant VH, this becomes a stable read. For a non-constant VH, this becomes `getMethodHandle(mode).asType(...)`, equivalent to before. > test/hotspot/jtreg/compiler/c2/irTests/TestGetAndAdd.java line 78: > >> 76: @IR(counts = {IRNode.X86_LOCK_XADDB, "3"}, phase = CompilePhase.FINAL_CODE) >> 77: public static void addB() { >> 78: var _ = (byte) B.getAndAdd(b2); > >> Since I removed the return type dropping VarHandle bypass, TestGetAndAdd became affected because it can no longer access the x86 assembly. > > It has performance implications for user code, doesn't it? The performance is measured by the existing `org.openjdk.bench.java.lang.invoke.VarHandleExact` benchmark, which originally expects `generic_genericInvocation` to be much slower. Now it instead has a performance on par with the exact invocations. The constant folding ability is verified with the new `VarHandleMismatchedTypeFold` IR test. 
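As background for the exact versus generic distinction in this thread, the following sketch shows the two kinds of call site at the Java level. It is not code from the PR: `VarHandleInvocationSketch` and `Counter` are hypothetical names, and the sketch assumes JDK 16 or later for `withInvokeExactBehavior()`. In the exact case the call-site descriptor matches the access mode type `(Counter,int)int`; in the generic case the descriptor is `(Counter,Integer)long`, so the VarHandle has to adapt it through `asType`-style conversions, which is the path the PR aims to make foldable.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.lang.invoke.WrongMethodTypeException;

public class VarHandleInvocationSketch {
    // Hypothetical holder class with a plain int field.
    static class Counter { int count; }

    static final VarHandle COUNT;
    static {
        try {
            COUNT = MethodHandles.lookup().findVarHandle(Counter.class, "count", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Exact invocation: the call-site descriptor is (Counter,int)int.
    static int exactAdd(Counter c, int delta) {
        return (int) COUNT.getAndAdd(c, delta);
    }

    // Generic invocation: the call-site descriptor is (Counter,Integer)long,
    // so the argument is unboxed and the result is widened.
    static long genericAdd(Counter c, Integer delta) {
        return (long) COUNT.getAndAdd(c, delta);
    }

    // With invoke-exact behavior the same generic call site is rejected
    // before any update happens.
    static boolean exactBehaviorRejectsGeneric(Counter c) {
        VarHandle strict = COUNT.withInvokeExactBehavior();
        try {
            long ignored = (long) strict.getAndAdd(c, Integer.valueOf(1));
            return false;
        } catch (WrongMethodTypeException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        Counter c = new Counter();
        System.out.println(exactAdd(c, 2) + " " + genericAdd(c, 3) + " " + exactBehaviorRejectsGeneric(c));
    }
}
```

Both `exactAdd` and `genericAdd` return the previous value of the field; whether they compile to the same machine code is what the benchmark and IR test mentioned above are meant to check.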
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2579218324 PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2579221253 From vlivanov at openjdk.org Tue Dec 2 01:45:49 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 2 Dec 2025 01:45:49 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> Message-ID: On Tue, 2 Dec 2025 01:08:19 GMT, Chen Liang wrote: >> src/java.base/share/classes/java/lang/invoke/VarHandle.java line 2033: >> >>> 2031: >>> 2032: @ForceInline >>> 2033: MethodHandle adaptedMethodHandle(VarHandle vh) { >> >> Can you elaborate, please, how this method is intended to behave? > > When this is compiled, `constant` will become either `1` for constant VH and `2` for non-constant VH. So for constant VH, this becomes a stable read. For a non-constant VH, this becomes `getMethodHandle(mode).asType(...)`, equivalent to before. What's the purpose of `constant == MethodHandleImpl.CONSTANT_YES ` and `constant != MethodHandleImpl.CONSTANT_NO` checks then? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2579287707 From liach at openjdk.org Tue Dec 2 01:51:47 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 2 Dec 2025 01:51:47 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> Message-ID: <3OE37qXGHhLAhnRQM188hhygrLYBtI3FLBMK0tGVH30=.5d1b4406-3bb3-4788-8059-e78260b79ec1@github.com> On Tue, 2 Dec 2025 01:42:50 GMT, Vladimir Ivanov wrote: >> When this is compiled, `constant` will become either `1` for constant VH and `2` for non-constant VH. So for constant VH, this becomes a stable read. For a non-constant VH, this becomes `getMethodHandle(mode).asType(...)`, equivalent to before. > > What's the purpose of `constant == MethodHandleImpl.CONSTANT_YES ` and `constant != MethodHandleImpl.CONSTANT_NO` checks then? Indeed, I should move the adaptedMh read into `constant == MethodHandleImpl.CONSTANT_YES` block. `constant != MethodHandleImpl.CONSTANT_NO` prevents capturing any further if the VH is known non-constant; we keep this branch in constant case in case the adapted MH is not ready when we know the VH is constant. 
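The two checks described above can be modelled in a few lines of plain Java. This is only a rough sketch, not the code in the PR: the class `AdaptedCacheSketch`, the nested `Site` type, the `String` standing in for the adapted method handle, and the value `0` for the "not yet known" state are all assumptions made for illustration. Only the values `1` (constant VH) and `2` (non-constant VH) come from the discussion.

```java
// Hypothetical, simplified model of the cache checks discussed in this thread.
// None of these names exist in the JDK sources.
public class AdaptedCacheSketch {
    static final int CONSTANT_PENDING = 0; // assumed: constness not known yet (e.g. interpreted)
    static final int CONSTANT_YES = 1;     // compiled code, VH is a compile-time constant
    static final int CONSTANT_NO = 2;      // compiled code, VH is known to be non-constant

    /** One instance per call site, standing in for an access descriptor. */
    static final class Site {
        private String cached; // stands in for the cached adapted method handle
        int conversions;       // how many times the slow conversion path ran

        String adapted(String vh, int constant) {
            // The cache is only read when the VH is known to be constant.
            if (constant == CONSTANT_YES) {
                String cache = cached;
                if (cache != null) {
                    return cache;
                }
            }
            // Slow path: stands in for getMethodHandle(mode).asType(...).
            conversions++;
            String result = "adapted(" + vh + ")";
            // The cache is not written when the VH is known to be non-constant.
            if (constant != CONSTANT_NO) {
                cached = result;
            }
            return result;
        }
    }
}
```

In this model a site that only ever sees `CONSTANT_NO` never touches the cache, while a site that reaches `CONSTANT_YES` before the cache is filled falls through to the slow path once and stores the result. Whether a value stored in the pending state can always be trusted later is the question being debated in this thread; the sketch does not settle it.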
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2579302480 From vlivanov at openjdk.org Tue Dec 2 01:51:49 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 2 Dec 2025 01:51:49 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> Message-ID: On Tue, 2 Dec 2025 01:09:59 GMT, Chen Liang wrote: >> test/hotspot/jtreg/compiler/c2/irTests/TestGetAndAdd.java line 78: >> >>> 76: @IR(counts = {IRNode.X86_LOCK_XADDB, "3"}, phase = CompilePhase.FINAL_CODE) >>> 77: public static void addB() { >>> 78: var _ = (byte) B.getAndAdd(b2); >> >>> Since I removed the return type dropping VarHandle bypass, TestGetAndAdd became affected because it can no longer access the x86 assembly. >> >> It has performance implications for user code, doesn't it? > > The performance is measured by the existing `org.openjdk.bench.java.lang.invoke.VarHandleExact` benchmark, which originally expects `generic_genericInvocation` to be much slower. Now it instead has a performance on par with the exact invocations. > > The constant folding ability is verified with the new `VarHandleMismatchedTypeFold` IR test. If I understand the IR test logic correctly, C2 was able to compile `(void) B.getAndAdd(b2)` call down to the desired instruction sequence. Is it still the case after the fix? What happens if you keep `TestGetAndAdd.java ` intact? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2579300293 From liach at openjdk.org Tue Dec 2 01:54:46 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 2 Dec 2025 01:54:46 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> Message-ID: On Tue, 2 Dec 2025 01:48:13 GMT, Vladimir Ivanov wrote: >> The performance is measured by the existing `org.openjdk.bench.java.lang.invoke.VarHandleExact` benchmark, which originally expects `generic_genericInvocation` to be much slower. Now it instead has a performance on par with the exact invocations. >> >> The constant folding ability is verified with the new `VarHandleMismatchedTypeFold` IR test. > > If I understand the IR test logic correctly, C2 was able to compile `(void) B.getAndAdd(b2)` call down to the desired instruction sequence. Is it still the case after the fix? What happens if you keep `TestGetAndAdd.java > ` intact? No. The old code worked because it implicitly depended on the backdoor path present in the now removed `GUARD_METHOD_TEMPLATE_V` in `VarHandleGuardMethodGenerator`. If this test is intact, now its IR compiles to doing something in adaptedMethodHandle and calling a MethodHandle. Not sure why it doesn't inline through that MethodHandle. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2579310315 From vlivanov at openjdk.org Tue Dec 2 02:02:46 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 2 Dec 2025 02:02:46 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: <3OE37qXGHhLAhnRQM188hhygrLYBtI3FLBMK0tGVH30=.5d1b4406-3bb3-4788-8059-e78260b79ec1@github.com> References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> <3OE37qXGHhLAhnRQM188hhygrLYBtI3FLBMK0tGVH30=.5d1b4406-3bb3-4788-8059-e78260b79ec1@github.com> Message-ID: On Tue, 2 Dec 2025 01:49:04 GMT, Chen Liang wrote: >> What's the purpose of `constant == MethodHandleImpl.CONSTANT_YES ` and `constant != MethodHandleImpl.CONSTANT_NO` checks then? > > Indeed, I should move the adaptedMh read into `constant == MethodHandleImpl.CONSTANT_YES` block. > > `constant != MethodHandleImpl.CONSTANT_NO` prevents capturing any further if the VH is known non-constant; we keep this branch in constant case in case the adapted MH is not ready when we know the VH is constant. I still have a hard time reasoning about state transitions of the cache. 1) Why do you limit successful cache read (`cache != null`) to constant `vh` case (`constant == MethodHandleImpl.CONSTANT_YES`)? 2) Why do you avoid cache update in non-constant case (`constant != MethodHandleImpl.CONSTANT_NO`)? What happens if it runs compiled `adaptedMethodHandle` method? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2579329673 From vlivanov at openjdk.org Tue Dec 2 02:08:44 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 2 Dec 2025 02:08:44 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> Message-ID: <53uGo7JI87pm-cZmvxBiHniURB_bKryyfrWpewgZLP8=.bb97af75-8a6d-4aa8-8a90-e8c4cbc77ec8@github.com> On Tue, 2 Dec 2025 01:52:04 GMT, Chen Liang wrote: >> If I understand the IR test logic correctly, C2 was able to compile `(void) B.getAndAdd(b2)` call down to the desired instruction sequence. Is it still the case after the fix? What happens if you keep `TestGetAndAdd.java >> ` intact? > > No. The old code worked because it implicitly depended on the backdoor path present in the now removed `GUARD_METHOD_TEMPLATE_V` in `VarHandleGuardMethodGenerator`. If this test is intact, now its IR compiles to doing something in adaptedMethodHandle and calling a MethodHandle. Not sure why it doesn't inline through that MethodHandle. Ok, so you eliminated a fast-path check for void-return case and now JIT can't fully optimize it anymore. Do I get it right? Since this particular bytecode shape is exposed through public API, I don't see why user code can't step on it. 
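For readers following along, the bytecode shape in question is easy to produce from ordinary user code. The sketch below is illustrative only: the class name `GetAndAddShapes` and its field are made up, and it demonstrates the two call-site shapes, not the JIT behaviour discussed in this thread. When the result of a signature-polymorphic call is discarded, javac records a `void` return type at the call site, so the first method is an inexact `(byte)void` invocation of a handle whose `getAndAdd` type is `(byte)byte`.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical example class; only the java.lang.invoke API calls are real.
public class GetAndAddShapes {
    static byte b;
    static final VarHandle B;

    static {
        try {
            B = MethodHandles.lookup().findStaticVarHandle(GetAndAddShapes.class, "b", byte.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Result discarded: the call site is compiled as (byte)void,
    // which does not match the handle's (byte)byte type exactly.
    static void addDropResult(byte delta) {
        B.getAndAdd(delta);
    }

    // Result cast to byte: the call site is (byte)byte, an exact match.
    static byte addKeepResult(byte delta) {
        return (byte) B.getAndAdd(delta);
    }
}
```

Both methods update the field identically; they differ only in the symbolic type descriptor recorded at the call site, which is what decides whether the access needs a type conversion.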
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2579341027 From liach at openjdk.org Tue Dec 2 02:16:52 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 2 Dec 2025 02:16:52 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> <3OE37qXGHhLAhnRQM188hhygrLYBtI3FLBMK0tGVH30=.5d1b4406-3bb3-4788-8059-e78260b79ec1@github.com> Message-ID: <7WF8DlorrU_B2__G2wr43w1PZwJh8mEhD5dY10YDIOo=.ec416c38-1aff-4dd6-8792-d6a0e01f91ce@github.com> On Tue, 2 Dec 2025 02:00:08 GMT, Vladimir Ivanov wrote: >> Indeed, I should move the adaptedMh read into `constant == MethodHandleImpl.CONSTANT_YES` block. >> >> `constant != MethodHandleImpl.CONSTANT_NO` prevents capturing any further if the VH is known non-constant; we keep this branch in constant case in case the adapted MH is not ready when we know the VH is constant. > > I still have a hard time reasoning about state transitions of the cache. > > 1) Why do you limit successful cache read (`cache != null`) to constant `vh` case (`constant == MethodHandleImpl.CONSTANT_YES`)? > > 2) Why do you avoid cache update in non-constant case (`constant != MethodHandleImpl.CONSTANT_NO`)? What happens if it runs compiled `adaptedMethodHandle` method? So an `AccessDescriptor` is created for each sigpoly VH site in the source code. Usually it is `VH.operation()`, but it is legal to use a non-constant VarHandle variable and call an operation on that. If `constant == MethodHandleImpl.CONSTANT_NO`, we are sure that we have the non-constant case, so we cannot trust that cached method handle, and there is no point further caching. 
We can only read that previous MH conversion cache if `constant == MethodHandleImpl.CONSTANT_YES` because this means our cache is always correct. >> No. The old code worked because it implicitly depended on the backdoor path present in the now removed `GUARD_METHOD_TEMPLATE_V` in `VarHandleGuardMethodGenerator`. If this test is intact, now its IR compiles to doing something in adaptedMethodHandle and calling a MethodHandle. Not sure why it doesn't inline through that MethodHandle. > > Ok, so you eliminated a fast-path check for void-return case and now JIT can't fully optimize it anymore. Do I get it right? Since this particular bytecode shape is exposed through public API, I don't see why user code can't step on it. JIT can fully optimize it in JMH benchmarks. I don't know why the IR in this test can't optimize it - I couldn't reproduce this CI failure locally on my linux-x64-debug profile, but this modified test passes on CI. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2579352226 PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2579353722 From vlivanov at openjdk.org Tue Dec 2 02:26:46 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 2 Dec 2025 02:26:46 GMT Subject: RFR: 8372845: Fold identity hash code if object is constant In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 23:01:08 GMT, Chen Liang wrote: > Folding identity hash as constant if the incoming argument is constant would be useful for quick map lookups, such as for the [Classifier proposal](https://openjdk.org/jeps/8357674). Currently, identity hash is not constant because it loads the object header/mark word. We can add an explicit bypass to load an existing hash eagerly instead. @liach Thanks for taking care of the fix. 
Here's a more polished version: https://github.com/openjdk/jdk/commit/c6c4e9f23a1bdf801d0cc8e36f343543b8bfccda ------------- PR Comment: https://git.openjdk.org/jdk/pull/28589#issuecomment-3599884931 From vlivanov at openjdk.org Tue Dec 2 02:32:47 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 2 Dec 2025 02:32:47 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: <7WF8DlorrU_B2__G2wr43w1PZwJh8mEhD5dY10YDIOo=.ec416c38-1aff-4dd6-8792-d6a0e01f91ce@github.com> References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> <3OE37qXGHhLAhnRQM188hhygrLYBtI3FLBMK0tGVH30=.5d1b4406-3bb3-4788-8059-e78260b79ec1@github.com> <7WF8DlorrU_B2__G2wr43w1PZwJh8mEhD5dY10YDIOo=.ec416c38-1aff-4dd6-8792-d6a0e01f91ce@github.com> Message-ID: On Tue, 2 Dec 2025 02:13:15 GMT, Chen Liang wrote: >> I still have a hard time reasoning about state transitions of the cache. >> >> 1) Why do you limit successful cache read (`cache != null`) to constant `vh` case (`constant == MethodHandleImpl.CONSTANT_YES`)? >> >> 2) Why do you avoid cache update in non-constant case (`constant != MethodHandleImpl.CONSTANT_NO`)? What happens if it runs compiled `adaptedMethodHandle` method? > > So an `AccessDescriptor` is created for each sigpoly VH site in the source code. Usually it is `VH.operation()`, but it is legal to use a non-constant VarHandle variable and call an operation on that. If `constant == MethodHandleImpl.CONSTANT_NO`, we are sure that we have the non-constant case, so we cannot trust that cached method handle, and there is no point further caching. We can only read that previous MH conversion cache if `constant == MethodHandleImpl.CONSTANT_YES` because this means our cache is always correct. 
So, it seems like what you are trying to achieve is a 1-1 mapping from `AccessDescriptor` to `vh` through `adaptedMh`. So, once `cache != null` you can trust that it corresponds to the `vh` instance passed as a constant. But cache pollution can easily break the invariant, so you try to eliminate the pollution by avoiding cache updates when vh is not constant. Do I get it right? >> Ok, so you eliminated a fast-path check for void-return case and now JIT can't fully optimize it anymore. Do I get it right? Since this particular bytecode shape is exposed through public API, I don't see why user code can't step on it. > > JIT can fully optimize it in JMH benchmarks. I don't know why the IR in this test can't optimize it - I couldn't reproduce this CI failure locally on my linux-x64-debug profile, but this modified test passes on CI. I'd say it's a bad sign. Intermittent bugs manifest exactly in such a way. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2579374286 PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2579375565 From liach at openjdk.org Tue Dec 2 02:52:50 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 2 Dec 2025 02:52:50 GMT Subject: RFR: 8372845: Fold identity hash code if object is constant In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 23:01:08 GMT, Chen Liang wrote: > Folding identity hash as constant if the incoming argument is constant would be useful for quick map lookups, such as for the [Classifier proposal](https://openjdk.org/jeps/8357674). Currently, identity hash is not constant because it loads the object header/mark word. We can add an explicit bypass to load an existing hash eagerly instead. I have one question: would it be safer for us to move the constant detection after generate_virtual_guard in the `is_virtual` if block? 
I think it may be possible for users to create a `Object::hashCode` site with a constant receiver that is of a specialized class that overrides `hashCode`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28589#issuecomment-3599934103 From liach at openjdk.org Tue Dec 2 02:54:46 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 2 Dec 2025 02:54:46 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> <3OE37qXGHhLAhnRQM188hhygrLYBtI3FLBMK0tGVH30=.5d1b4406-3bb3-4788-8059-e78260b79ec1@github.com> <7WF8DlorrU_B2__G2wr43w1PZwJh8mEhD5dY10YDIOo=.ec416c38-1aff-4dd6-8792-d6a0e01f91ce@github.com> Message-ID: On Tue, 2 Dec 2025 02:29:28 GMT, Vladimir Ivanov wrote: >> So an `AccessDescriptor` is created for each sigpoly VH site in the source code. Usually it is `VH.operation()`, but it is legal to use a non-constant VarHandle variable and call an operation on that. If `constant == MethodHandleImpl.CONSTANT_NO`, we are sure that we have the non-constant case, so we cannot trust that cached method handle, and there is no point further caching. We can only read that previous MH conversion cache if `constant == MethodHandleImpl.CONSTANT_YES` because this means our cache is always correct. > > So, it seems like what you are trying to achieve is a 1-1 mapping from `AccessDescriptor` to `vh` through `adaptedMh`. So, once `cache != null` you can trust that it corresponds to the `vh` instance passed as a constant. But cache pollution can easily break the invariant, so you try to eliminate the pollution by avoiding cache updates when vh is not constant. Do I get it right? No. The avoidance of cache update simply trims down the generated code by throwing away the meaningless cache update. 
The access to cache is already safeguarded by `constant == MethodHandleImpl.CONSTANT_YES`. I should have moved `var cache = adaptedMh;` into the if block of `constant == CONSTANT_YES`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2579405388 From wenanjian at openjdk.org Tue Dec 2 06:36:53 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Tue, 2 Dec 2025 06:36:53 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v26] In-Reply-To: References: <5HbBb-mjtZWqWTu-HQe7KrRyHG5z-UK4rbVhMzLv4bw=.b1b7e986-dbcf-4ab0-86b4-513f3f1f91ae@github.com> Message-ID: <7J3oLdDF73T7tFgpg2yZvAZGVxcmxskCXw7ugnA5gMs=.e1f55b91-0825-4895-8009-c880c668d4c6@github.com> On Wed, 19 Nov 2025 09:55:47 GMT, Hamlin Li wrote: >>> Some more comments and questions. >> >> Thanks for the careful reviews! I will check the comments and reply one by one later > >> > Some more comments and questions. >> >> Thanks for the careful reviews! I will check the comments and reply one by one later > > > Thanks! Overall looks good, I'll have another by this weekend. Thanks for your patience! @Hamlin-Li Thanks, I have modified some code according to your suggestions and replied to all the comments. Could you please help review it again when you have time? : )
------------- PR Comment: https://git.openjdk.org/jdk/pull/25281#issuecomment-3600415249 From thartmann at openjdk.org Tue Dec 2 06:44:49 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 2 Dec 2025 06:44:49 GMT Subject: RFR: 8372845: Fold identity hash code if object is constant In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 02:49:52 GMT, Chen Liang wrote:

> I think it may be possible for users to create a Object::hashCode site with a constant receiver that is of a specialized class that overrides hashCode.

Yes, I think so too. We need a test for this scenario.

Just an observation: This patch will only allow folding during parsing. I would expect that often, opportunities only arise after other optimizations already took place. For example, something like this would not be optimized if we run with `-XX:+AlwaysIncrementalInline`, right?

    static final Object a = new Object();

    @ForceInline
    public Object getter(Object obj) {
        return obj;
    }

    public long test() {
        return getter(a).hashCode();
    }

Another example:

    Object val = new Object();
    int limit = 2;
    for (; limit < 4; limit *= 2);
    for (int i = 2; i < limit; i++) {
        val = a;
    }
    return val.hashCode(); // After loop opts, C2 knows that val == a

So ideally, we would move this optimization to IGVN. This would also help Valhalla, where we need to (re-)compute the hashcode for a scalarized value object and would therefore like to fold the computation as aggressively as possible.
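The overriding-receiver scenario mentioned at the top of this message could be exercised along the following lines. This is a minimal sketch with made-up names (`ConstantReceiverHash`, `Fixed`), not a proposed jtreg test. It only checks observable behaviour: a constant receiver whose class overrides `hashCode` must keep returning the overridden value, while a plain `Object` receiver returns its identity hash.

```java
// Hypothetical scenario class; names are illustrative and not part of the PR.
public class ConstantReceiverHash {
    /** A class that overrides hashCode, so its identity hash must not be used. */
    static final class Fixed {
        @Override
        public int hashCode() {
            return 42;
        }
    }

    // Both receivers are static finals, so a JIT may treat them as constants.
    static final Object PLAIN = new Object();
    static final Object OVERRIDING = new Fixed();

    static int plainHash() {
        return PLAIN.hashCode();
    }

    static int overridingHash() {
        return OVERRIDING.hashCode();
    }
}
```

Calling both methods in a warm-up loop and comparing against `System.identityHashCode(PLAIN)` and `42` respectively gives a simple correctness check that holds regardless of where the folding happens.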
------------- PR Comment: https://git.openjdk.org/jdk/pull/28589#issuecomment-3600438904 From amitkumar at openjdk.org Tue Dec 2 07:10:47 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 2 Dec 2025 07:10:47 GMT Subject: RFR: 8372641: [s390x] Test failure TestMergeStores.java [v3] In-Reply-To: <6iaWuz5X4ol8NmIvbWoQBxmceux35b3529t1sONwCZA=.08c49f3a-87dc-4030-a5a7-1a83f4209fe0@github.com> References: <6iaWuz5X4ol8NmIvbWoQBxmceux35b3529t1sONwCZA=.08c49f3a-87dc-4030-a5a7-1a83f4209fe0@github.com> Message-ID: On Thu, 27 Nov 2025 08:59:09 GMT, Harshit470250 wrote: >> [JDK-8347405](https://bugs.openjdk.org/browse/JDK-8347405) introduced a mergeStores optimisation which requires ReverseBytesS opcode and as it was not implemented for s390 the test case is failing. >> I also implemented ReverseBytesUS. > > Harshit470250 has updated the pull request incrementally with one additional commit since the last revision: > > Added whitespace LGTM, I ran tier1 test and it fixed the testcase without new regression. @RealLucy you want to take a look at this one ? ------------- Marked as reviewed by amitkumar (Committer). PR Review: https://git.openjdk.org/jdk/pull/28523#pullrequestreview-3528539709 From chagedorn at openjdk.org Tue Dec 2 07:39:57 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 2 Dec 2025 07:39:57 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v3] In-Reply-To: References: Message-ID: On Mon, 24 Nov 2025 18:10:50 GMT, Quan Anh Mai wrote: >> Hi, >> >> This fixes the crash in `Load/StoreVectorMaskedNode::Ideal`. The issue here is that the graph is not canonical during idealization, which leads to us processing a dead node. The fix I propose is to bail-out when that happens. >> >> To be more specific, for this issue, we have the graph that looks like: >> >> ConI -> ConvI2L -> CastLL(0..32) -> VectorMaskGen >> >> with `ConI` being 45 and `MaxVectorSize` being 32. 
In this instance, `CastLL` is processed before `ConvI2L`, and when it is processed, it sees that the type of `ConvI2L` being its bottom type. As a result, it does not know that it is top, and since we are after macro expansion, which is after loop opts, the `CastLL` goes away, leaving us with: >> >> ConI -> ConvI2L -> VectorMaskGen >> >> After `ConvI2L` is processed, we know that the input of `VectorMaskGen` is a constant 45, which is larger than `MaxVectorSize`, leading to the assert failure. >> >> Please take a look and leave your thoughts, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > reviews Otherwise, looks good to me. test/hotspot/jtreg/compiler/arraycopy/TestArrayCopyDisjoint.java line 29: > 27: /** > 28: * @test > 29: * @bug 8251871 8285301 You can add the bug number here: Suggestion: * @bug 8251871 8285301 8371964 ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28410#pullrequestreview-3528660084 PR Review Comment: https://git.openjdk.org/jdk/pull/28410#discussion_r2580001963 From epeter at openjdk.org Tue Dec 2 07:43:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 07:43:48 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v3] In-Reply-To: References: Message-ID: On Mon, 24 Nov 2025 18:10:50 GMT, Quan Anh Mai wrote: >> Hi, >> >> This fixes the crash in `Load/StoreVectorMaskedNode::Ideal`. The issue here is that the graph is not canonical during idealization, which leads to us processing a dead node. The fix I propose is to bail-out when that happens. >> >> To be more specific, for this issue, we have the graph that looks like: >> >> ConI -> ConvI2L -> CastLL(0..32) -> VectorMaskGen >> >> with `ConI` being 45 and `MaxVectorSize` being 32. 
In this instance, `CastLL` is processed before `ConvI2L`, and when it is processed, it sees that the type of `ConvI2L` being its bottom type. As a result, it does not know that it is top, and since we are after macro expansion, which is after loop opts, the `CastLL` goes away, leaving us with: >> >> ConI -> ConvI2L -> VectorMaskGen >> >> After `ConvI2L` is processed, we know that the input of `VectorMaskGen` is a constant 45, which is larger than `MaxVectorSize`, leading to the assert failure. >> >> Please take a look and leave your thoughts, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > reviews Looks good to me, thanks for fixing this @merykitty ! src/hotspot/share/opto/vectornode.cpp line 1177: > 1175: int load_sz = type2aelembytes(mask_bt) * ty->get_con(); > 1176: if (load_sz > MaxVectorSize) { > 1177: // See LoadVectorMaskedNode::Ideal Suggestion: // After loop opts, cast nodes are aggressively removed, if the input is then transformed // into a constant that is outside the range of the removed cast, we may encounter it here. // This should be a dead node then. Optional: Might as well just repeat the explanation. If the code in `LoadVectorMaskedNode::Ideal` changes it is unlikely that we would notice here, and then we'd have a dead link. ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28410#pullrequestreview-3528671396 PR Review Comment: https://git.openjdk.org/jdk/pull/28410#discussion_r2580010355 From chagedorn at openjdk.org Tue Dec 2 07:47:50 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 2 Dec 2025 07:47:50 GMT Subject: RFR: 8370766: JVM crashes when running compiler/exceptions/TestAccessErrorInCatch.java fails with -XX:+VerifyStack [v3] In-Reply-To: References: <5JAu6StX5-r2itXPGiDBgGHjGo0S2mOfGxOpPoMSkIQ=.000500da-a003-403b-9d3b-6df3a53c2b22@github.com> Message-ID: On Mon, 1 Dec 2025 22:26:10 GMT, Dean Long wrote: >> The problem is C2 is throwing an exception and then deoptimizing, and the -XX:+VerifyStack logic expects the stack to be empty, match the "before" state if the reexecute flag is set, or match the "after" state. C2 is using the "before" state, so for correctness it also needs to set the reexecute flag. >> >> I played around with other approaches, like: >> 1. setting the stack to empty >> 2. adding all the bytecodes that can throw to the list in AbstractInterpreter::bytecode_should_reexecute() >> 3. always setting the reexecute flag in add_safepoint_edges() if must_throw is set >> but in the end I decided to go with the minimal localized low-risk change. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > add bugid Still good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28486#pullrequestreview-3528688158 From epeter at openjdk.org Tue Dec 2 07:53:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 07:53:57 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v3] In-Reply-To: References: <0df3H15uO96P1n3zLpKl5y_RKrAgc1h_V91bGB5mCr8=.06942d05-f66d-442f-a754-8135ac0eec30@github.com> Message-ID: On Tue, 25 Nov 2025 17:46:28 GMT, Quan Anh Mai wrote:

>> Is this issue at all related to https://github.com/openjdk/jdk/pull/24575?
>>
>> It seems we remove a `CastLL` from the graph, because the input type is wider than the Cast's type, right?
>>
>> If I remember correctly from https://github.com/openjdk/jdk/pull/24575, if a CastLL is narrowing, we don't want to remove it, see `ConstraintCastNode::Identity`.
>>
>> Can you elaborate a bit more on where the `CastLL` came from, and what it is supposed to do?
>
> @eme64 Yes, it is indeed similar. The issue here is that after loop opts, we try to remove almost all `CastNode`s so that the graph can be GVN-ed better (think of `x = a + b` and `y = cast(a) + b`).
>
>> Can you elaborate a bit more on where the `CastLL` came from, and what it is supposed to do?
>
> Macro expansion tries to be smart for an array copy and does this:
>
>     byte[] dst;
>     byte[] src;
>     int len;
>     if (len <= 32) {
>         int casted_len = cast(len, 0, 32);
>         vectormask mask = VectorMaskGen(casted_len);
>         vector v = LoadVectorMasked(src, 0, mask);
>         StoreVectorMasked(dst, 0, v, mask);
>     } else {
>         // do the copy normally;
>     }
>
> As you can see, the masked accesses are only meaningful if `len <= 32`. But after loop opts, the cast is gone, leaving us with a len which happens to be larger than `32`. The path should be dead, but IGVN reaches the `LoadVectorMaskedNode` first, which triggers the assert.

@merykitty Hold off with integration for a few hours, @chhagedorn just launched some internal testing.
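For reference, the guarded small-copy shape quoted above can be modelled in plain Java. This is a loose sketch under stated assumptions, not what C2 emits: the class `MaskedCopySketch` is made up, a `boolean[]` stands in for the vector mask, a scalar loop stands in for the masked load and store, and `MAX_VECTOR_SIZE` is fixed at the 32 used in this thread. The range check in `maskGen` plays the role of the removed cast: it documents that a length above 32 is only reachable on a path that should be dead.

```java
// Hypothetical model of the guarded masked copy; not HotSpot code.
public class MaskedCopySketch {
    // Assumed value, matching the MaxVectorSize of 32 mentioned in this thread.
    static final int MAX_VECTOR_SIZE = 32;

    // Stand-in for VectorMaskGen: lane i is active when i < len.
    static boolean[] maskGen(int len) {
        if (len < 0 || len > MAX_VECTOR_SIZE) {
            // Corresponds to the range the cast asserted: 0..32.
            throw new IllegalArgumentException("mask length out of range: " + len);
        }
        boolean[] mask = new boolean[MAX_VECTOR_SIZE];
        for (int i = 0; i < len; i++) {
            mask[i] = true;
        }
        return mask;
    }

    static void copy(byte[] src, byte[] dst, int len) {
        if (len <= MAX_VECTOR_SIZE) {
            // Small copy: one masked "vector" load and store.
            boolean[] mask = maskGen(len);
            for (int i = 0; i < MAX_VECTOR_SIZE; i++) {
                if (mask[i]) {
                    dst[i] = src[i];
                }
            }
        } else {
            // Large copy: do the copy normally.
            System.arraycopy(src, 0, dst, 0, len);
        }
    }
}
```

With a constant length of 45, as in the failing case, only the `else` branch is reachable, so anything that inspects the masked branch with that constant is looking at dead code.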
------------- PR Comment: https://git.openjdk.org/jdk/pull/28410#issuecomment-3600692116 From hgreule at openjdk.org Tue Dec 2 07:54:01 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Tue, 2 Dec 2025 07:54:01 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> <3OE37qXGHhLAhnRQM188hhygrLYBtI3FLBMK0tGVH30=.5d1b4406-3bb3-4788-8059-e78260b79ec1@github.com> <7WF8DlorrU_B2__G2wr43w1PZwJh8mEhD5dY10YDIOo=.ec416c38-1aff-4dd6-8792-d6a0e01f91ce@github.com> Message-ID: On Tue, 2 Dec 2025 02:30:28 GMT, Vladimir Ivanov wrote: >> JIT can fully optimize it in JMH benchmarks. I don't know why the IR in this test can't optimize it - I couldn't reproduce this CI failure locally on my linux-x64-debug profile, but this modified test passes on CI. > > I'd say it's a bad sign. Intermittent bugs manifest exactly in such a way. > The performance is measured by the existing `org.openjdk.bench.java.lang.invoke.VarHandleExact` benchmark, which originally expects `generic_genericInvocation` to be much slower. Now it instead has a performance on par with the exact invocations. > > The constant folding ability is verified with the new `VarHandleMismatchedTypeFold` IR test. The benchmark doesn't consider such inexact getAndAdd calls (with a void return type), I think it should cover that too. This is a very common pattern that really must not regress. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2580041626 From chagedorn at openjdk.org Tue Dec 2 08:01:50 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 2 Dec 2025 08:01:50 GMT Subject: RFR: 8372451: C2 SuperWord: "endless loop" assert. 
Need to implement proper worklist mechanism [v2] In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 15:42:25 GMT, Emanuel Peter wrote: >> **Context**: `VTransform::optimize`. Works a bit like IGVN, it allows each node to perform optimizations. Recently introduced during JDK26. >> >> **Problem**: I made the assumption that we don't need a worklist mechanism, we can just do multiple passes over all nodes. The assumption was that there would not be any "trickling" of updates over the graph. But that is wrong: for example we can have a long chain of dead nodes, and we need to progressively remove the last node and mark it as dead. >> >> **Solution**: Implement proper worklist mechanism, so that updates can trickle over the graph. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > limit steps of optimize, for Manuel Looks good to me, too! src/hotspot/share/opto/vtransform.cpp line 45: > 43: void VTransformOptimize::worklist_push(VTransformNode* vtn) { > 44: if (_worklist_set.test_set(vtn->_idx)) { return; } > 45: _worklist.push(vtn); I would flip this since it's only one line: Suggestion: if (!_worklist_set.test_set(vtn->_idx)) { _worklist.push(vtn); } test/hotspot/jtreg/compiler/loopopts/superword/TestLongReductionChain.java line 38: > 36: * -XX:CompileCommand=compileonly,${test.main.class}::test > 37: * ${test.main.class} > 38: * @run driver ${test.main.class} Suggestion: * @run main ${test.main.class} ------------- PR Review: https://git.openjdk.org/jdk/pull/28512#pullrequestreview-3528694733 PR Review Comment: https://git.openjdk.org/jdk/pull/28512#discussion_r2580027890 PR Review Comment: https://git.openjdk.org/jdk/pull/28512#discussion_r2580040060 From qamai at openjdk.org Tue Dec 2 08:09:30 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 2 Dec 2025 08:09:30 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v4] In-Reply-To: References: Message-ID: 
<-SdGoKVu9WpxzbLyqrLt7duH-qK_Bbm6ErrWdDfxJUg=.95c14f7f-b940-4dc6-a63d-055419625a36@github.com>

> Hi,
>
> This fixes the crash in `Load/StoreVectorMaskedNode::Ideal`. The issue here is that the graph is not canonical during idealization, which leads to us processing a dead node. The fix I propose is to bail out when that happens.
>
> To be more specific, for this issue, we have a graph that looks like:
>
>     ConI -> ConvI2L -> CastLL(0..32) -> VectorMaskGen
>
> with `ConI` being 45 and `MaxVectorSize` being 32. In this instance, `CastLL` is processed before `ConvI2L`, and when it is processed, it sees the type of `ConvI2L` as its bottom type. As a result, it does not know that it is top, and since we are after macro expansion, which is after loop opts, the `CastLL` goes away, leaving us with:
>
>     ConI -> ConvI2L -> VectorMaskGen
>
> After `ConvI2L` is processed, we know that the input of `VectorMaskGen` is a constant 45, which is larger than `MaxVectorSize`, leading to the assert failure.
>
> Please take a look and leave your thoughts, thanks a lot.

Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:

  bug number in test, comment

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28410/files
  - new: https://git.openjdk.org/jdk/pull/28410/files/ec7298ef..c462f0ba

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28410&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28410&range=02-03

  Stats: 4 lines in 2 files changed: 2 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/28410.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28410/head:pull/28410

PR: https://git.openjdk.org/jdk/pull/28410

From qamai at openjdk.org  Tue Dec  2 08:09:31 2025
From: qamai at openjdk.org (Quan Anh Mai)
Date: Tue, 2 Dec 2025 08:09:31 GMT
Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v3]
In-Reply-To: 
References: 
Message-ID: 

On Mon, 24 Nov 2025 18:10:50 GMT, Quan Anh Mai wrote:

>> Hi,
>>
>> This fixes the crash in `Load/StoreVectorMaskedNode::Ideal`. The issue here is that the graph is not canonical during idealization, which leads to us processing a dead node. The fix I propose is to bail out when that happens.
>>
>> To be more specific, for this issue, we have a graph that looks like:
>>
>>     ConI -> ConvI2L -> CastLL(0..32) -> VectorMaskGen
>>
>> with `ConI` being 45 and `MaxVectorSize` being 32. In this instance, `CastLL` is processed before `ConvI2L`, and when it is processed, it sees the type of `ConvI2L` as its bottom type. As a result, it does not know that it is top, and since we are after macro expansion, which is after loop opts, the `CastLL` goes away, leaving us with:
>>
>>     ConI -> ConvI2L -> VectorMaskGen
>>
>> After `ConvI2L` is processed, we know that the input of `VectorMaskGen` is a constant 45, which is larger than `MaxVectorSize`, leading to the assert failure.
>>
>> Please take a look and leave your thoughts, thanks a lot.
> > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > reviews Thanks a lot for your reviews, please reapprove when the tests pass. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28410#issuecomment-3600742616 From epeter at openjdk.org Tue Dec 2 08:13:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 08:13:47 GMT Subject: RFR: 8372451: C2 SuperWord: "endless loop" assert. Need to implement proper worklist mechanism [v3] In-Reply-To: References: Message-ID: <0ZncRodHJbWfRFLrUCqQn5JPHilDPQA8e7dcrwARsOI=.7fe44906-eeed-49bb-8472-5a264391468b@github.com> > **Context**: `VTransform::optimize`. Works a bit like IGVN, it allows each node to perform optimizations. Recently introduced during JDK26. > > **Problem**: I made the assumption that we don't need a worklist mechanism, we can just do multiple passes over all nodes. The assumption was that there would not be any "trickling" of updates over the graph. But that is wrong: for example we can have a long chain of dead nodes, and we need to progressively remove the last node and mark it as dead. > > **Solution**: Implement proper worklist mechanism, so that updates can trickle over the graph. Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: - Merge branch 'master' into JDK-8372451-too-many-dead-vector-reduction-vtnodes - Apply suggestions from code review Co-authored-by: Christian Hagedorn - limit steps of optimize, for Manuel - Merge branch 'master' into JDK-8372451-too-many-dead-vector-reduction-vtnodes - rm old documentation - git move to new test - streamline - refactor and verify - unique worklist - wip solution - ... 
and 1 more: https://git.openjdk.org/jdk/compare/a4a5fdb0...54881ff4 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28512/files - new: https://git.openjdk.org/jdk/pull/28512/files/9f5bf837..54881ff4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28512&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28512&range=01-02 Stats: 12411 lines in 291 files changed: 6379 ins; 5081 del; 951 mod Patch: https://git.openjdk.org/jdk/pull/28512.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28512/head:pull/28512 PR: https://git.openjdk.org/jdk/pull/28512 From epeter at openjdk.org Tue Dec 2 08:13:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 08:13:49 GMT Subject: RFR: 8372451: C2 SuperWord: "endless loop" assert. Need to implement proper worklist mechanism [v2] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 07:59:25 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> limit steps of optimize, for Manuel > > Looks good to me, too! @chhagedorn Thanks for having a look. I applied the changes! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28512#issuecomment-3600744825 From mhaessig at openjdk.org Tue Dec 2 08:46:54 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 2 Dec 2025 08:46:54 GMT Subject: RFR: 8372451: C2 SuperWord: "endless loop" assert. Need to implement proper worklist mechanism [v3] In-Reply-To: <0ZncRodHJbWfRFLrUCqQn5JPHilDPQA8e7dcrwARsOI=.7fe44906-eeed-49bb-8472-5a264391468b@github.com> References: <0ZncRodHJbWfRFLrUCqQn5JPHilDPQA8e7dcrwARsOI=.7fe44906-eeed-49bb-8472-5a264391468b@github.com> Message-ID: On Tue, 2 Dec 2025 08:13:47 GMT, Emanuel Peter wrote: >> **Context**: `VTransform::optimize`. Works a bit like IGVN, it allows each node to perform optimizations. Recently introduced during JDK26. 
>> >> **Problem**: I made the assumption that we don't need a worklist mechanism, we can just do multiple passes over all nodes. The assumption was that there would not be any "trickling" of updates over the graph. But that is wrong: for example we can have a long chain of dead nodes, and we need to progressively remove the last node and mark it as dead. >> >> **Solution**: Implement proper worklist mechanism, so that updates can trickle over the graph. > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Merge branch 'master' into JDK-8372451-too-many-dead-vector-reduction-vtnodes > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn > - limit steps of optimize, for Manuel > - Merge branch 'master' into JDK-8372451-too-many-dead-vector-reduction-vtnodes > - rm old documentation > - git move to new test > - streamline > - refactor and verify > - unique worklist > - wip solution > - ... and 1 more: https://git.openjdk.org/jdk/compare/69927660...54881ff4 Marked as reviewed by mhaessig (Committer). 
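
The worklist idea in the quoted PR description (a chain of dead nodes where each removal exposes the next dead node) can be illustrated with a small standalone sketch. This is not the HotSpot code: `Node`, `Worklist`, `make_chain` and `remove_dead_nodes` are hypothetical names invented for the illustration, and the graph is reduced to use counts. The `push` method mirrors the "unique worklist" pattern discussed in the review, where a membership set prevents a node from being queued twice.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a graph node: inputs by index, a use count,
// and a flag for nodes that must never be removed (e.g. a store).
struct Node {
  std::vector<int> inputs;
  int uses = 0;
  bool has_side_effect = false;
  bool dead = false;
};

// Worklist that holds each node index at most once at any time.
class Worklist {
public:
  explicit Worklist(std::size_t n) : _in_list(n, false) {}
  void push(int idx) {
    if (!_in_list[idx]) {          // membership test, then push
      _in_list[idx] = true;
      _stack.push_back(idx);
    }
  }
  bool empty() const { return _stack.empty(); }
  int pop() {
    int idx = _stack.back();
    _stack.pop_back();
    _in_list[idx] = false;
    return idx;
  }
private:
  std::vector<int> _stack;
  std::vector<bool> _in_list;
};

// Builds a chain 0 <- 1 <- ... <- len-1, where node i uses node i-1.
std::vector<Node> make_chain(int len, bool last_has_side_effect) {
  std::vector<Node> g(len);
  for (int i = 1; i < len; i++) {
    g[i].inputs.push_back(i - 1);
    g[i - 1].uses++;
  }
  if (len > 0) { g[len - 1].has_side_effect = last_has_side_effect; }
  return g;
}

// Removes unused nodes; each removal re-queues the inputs so that the
// update trickles up the chain. Returns the number of removed nodes.
int remove_dead_nodes(std::vector<Node>& g) {
  Worklist wl(g.size());
  for (std::size_t i = 0; i < g.size(); i++) { wl.push(static_cast<int>(i)); }
  int removed = 0;
  while (!wl.empty()) {
    Node& n = g[wl.pop()];
    if (n.dead || n.has_side_effect || n.uses > 0) { continue; }
    n.dead = true;
    removed++;
    for (int in : n.inputs) {
      g[in].uses--;                // the input may have just become dead
      wl.push(in);
    }
  }
  return removed;
}
```

With a fixed number of whole-graph passes, a chain longer than the pass count could leave dead nodes behind, depending on visiting order. With the worklist, each removal schedules exactly the nodes it may have affected, so the work is proportional to the number of changes.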
------------- PR Review: https://git.openjdk.org/jdk/pull/28512#pullrequestreview-3528909043 From shade at openjdk.org Tue Dec 2 08:47:02 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 2 Dec 2025 08:47:02 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v6] In-Reply-To: References: <2ifEaoGuZU4duyckWchgOnnqfH6AgAcrqsiqBZH1Nx4=.1df7af8d-41ac-43a1-90ab-964eb80f155b@github.com> Message-ID: On Mon, 1 Dec 2025 23:25:42 GMT, Vladimir Ivanov wrote: >> Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision: >> >> - Simplify third case: no need to loop, just restart the search >> - Actually have a second "fast" case: receiver is not found in the table, and the table is full >> - Pushing/popping for rare CAS path is counter-productive > > src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4826: > >> 4824: // and never duplicate the receivers in the list. >> 4825: // >> 4826: // It is tempting to combine these cases into a single loop, and claim the first > > Can you elaborate, please, why it is the case? Is it a result of class unloading or something else? Yes, we are clearing MDOs for unloaded classes. I initially thought this kind of cleanup happens only during `ciReceiverTypeData::translate_receiver_data_from` translation to `ciReceiverTypeData`. If that was the only path, we would probably not care about this; although I would, for defensive programming reasons. *But* it looks like the cleanup happens during "normal" GC class unloading, which also makes sense: you do not want to have unloaded classes referenced from any runtime datastructure, including MDO. 
The path I saw was:

    ReceiverTypeData::clear_row
    ReceiverTypeData::clean_weak_klass_links
    MethodData::clean_method_data
    InstanceKlass::clean_method_data
    InstanceKlass::clean_weak_instanceklass_links
    Klass::clean_weak_instanceklass_links
    KlassCleaningTask::work

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25305#discussion_r2580205471

From mhaessig at openjdk.org  Tue Dec  2 08:48:54 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Tue, 2 Dec 2025 08:48:54 GMT
Subject: RFR: 8370766: JVM crashes when running compiler/exceptions/TestAccessErrorInCatch.java fails with -XX:+VerifyStack [v3]
In-Reply-To: 
References: <5JAu6StX5-r2itXPGiDBgGHjGo0S2mOfGxOpPoMSkIQ=.000500da-a003-403b-9d3b-6df3a53c2b22@github.com>
Message-ID: 

On Mon, 1 Dec 2025 22:26:10 GMT, Dean Long wrote:

>> The problem is C2 is throwing an exception and then deoptimizing, and the -XX:+VerifyStack logic expects the stack to be empty, match the "before" state if the reexecute flag is set, or match the "after" state. C2 is using the "before" state, so for correctness it also needs to set the reexecute flag.
>>
>> I played around with other approaches, like:
>> 1. setting the stack to empty
>> 2. adding all the bytecodes that can throw to the list in AbstractInterpreter::bytecode_should_reexecute()
>> 3. always setting the reexecute flag in add_safepoint_edges() if must_throw is set
>> but in the end I decided to go with the minimal localized low-risk change.
>
> Dean Long has updated the pull request incrementally with one additional commit since the last revision:
>
>   add bugid

Thank you for addressing my comments and the credit. Looks good.

-------------

Marked as reviewed by mhaessig (Committer).

PR Review: https://git.openjdk.org/jdk/pull/28486#pullrequestreview-3528915361

From mhaessig at openjdk.org  Tue Dec  2 08:52:45 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Tue, 2 Dec 2025 08:52:45 GMT
Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations
In-Reply-To: 
References: 
Message-ID: 

On Mon, 1 Dec 2025 15:40:00 GMT, Roland Westrelin wrote:

> For this failure memory stats are:
>
> Total Usage: 1095525816
> --- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 ---
> Phase                    Total       ra      node       comp     type  states  reglive  regsplit  regmask  superword    cienv  ha  other
> none                   5976032   331560   5402064     197512    33712   10200        0         0      984          0        0   0      0
> parse                  2716464    65456   1145480     196408  1112752       0        0         0        0          0   196368   0      0
> optimizer                98184        0     32728          0    65456       0        0         0        0          0        0   0      0
> connectionGraph          32728        0         0      32728        0       0        0         0        0          0        0   0      0
> iterGVN                  32728        0     32728          0        0       0        0         0        0          0        0   0      0
> idealLoop            918189632        0  38687056  872824784   392776       0        0         0        0          0  6285016   0      0
> idealLoopVerify        2228144        0         0    2228144        0       0        0         0        0          0        0   0      0
> macroExpand              32728        0     32728          0        0       0        0         0        0          0        0   0      0
> graphReshape             32728        0     32728          0        0       0        0         0        0          0        0   0      0
> matcher               20135944  3369848   9033208    7536400    65456  131032        0         0        0          0        0   0      0
> postselect_cleanup      294872   294872         0          0        0       0        0         0        0          0        0   0      0
> scheduler               752944   196488    556456          0        0       0        0         0        0          0        0   0      0
> regalloc                388736   388736         0          0        0       0        0         0        0          0        0   0      0
> ctorChaitin             160032 ...

Fwiw, testing passed up to tier3 on linux-x64, linux-aarch64, macosx-aarch64, mac-x64, windows-x64.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28581#issuecomment-3600902694 From mhaessig at openjdk.org Tue Dec 2 08:58:48 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 2 Dec 2025 08:58:48 GMT Subject: RFR: 8372703: Test compiler/arguments/TestCodeEntryAlignment.java failed: assert(allocates2(pc)) failed: not in CodeBuffer memory In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 21:16:18 GMT, Volodymyr Paprotski wrote: > Requires a Broadwell machine, but was able to reproduce with an emulator: > > > ~/sde-external-9.58.0-2025-06-16-lin/sde64 -follow-subprocess -bdw -- ./build/linux-x86_64-server-fastdebug/images/jdk/bin/java -XX:-UseMulAddIntrinsic -XX:+UseDilithiumIntrinsics -XX:+UnlockExperimentalVMOptions -XX:CodeCacheSegmentSi > ze=1024 -XX:CodeEntryAlignment=1024 -cp build/linux-x86_64-server-fastdebug/support/test/lib/test-lib.jar test/hotspot/jtreg/compiler/arguments/TestCodeEntryAlignment.java run Thank you for fixing this, @vpaprotsk. Please also remove the problem listing of the `compiler/arguments/TestCodeEntryAlignment.java`: https://github.com/openjdk/jdk/blob/84ffe87260753973835ea6b88443e28bcaf0122f/test/hotspot/jtreg/ProblemList.txt#L82 Meanwhile, I will run testing on our side and report back with the results. ------------- Changes requested by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/28588#pullrequestreview-3528955992 From pminborg at openjdk.org Tue Dec 2 09:04:53 2025 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 2 Dec 2025 09:04:53 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 23:41:04 GMT, Chen Liang wrote: >> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. 
Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Tweak VH usage in some classes src/hotspot/share/opto/library_call.cpp line 8926: > 8924: bool LibraryCallKit::inline_isCompileConstant() { > 8925: Node* n = argument(0); > 8926: set_result(n->is_Con() ? intcon(1) : intcon(2)); Can we get constants for these magic numbers on the C side as well? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2580266341 From roland at openjdk.org Tue Dec 2 09:09:16 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 09:09:16 GMT Subject: RFR: 8371464: C2: assert(no_dead_loop) failed: dead loop detected [v2] In-Reply-To: References: Message-ID: <2fn6Bj9HrMiWa4K01CuW-vnL7XKWwkLcTapeVkHqWUo=.9d0e24a5-dc6c-4abe-ae45-10b93479071e@github.com> > Crash occurs because a `MergeMem` node references itself: > > > 608 MergeMem === _ 1 608 1 1 1 1 1 1 1 1 1 1 878 [[ 877 878 608 420 597 ]] { - - - - - - - - - - N878:java/lang/Throwable (java/io/Serializable)+20 * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !orig=[524] !jvms: TestDeadLoopAtMergeMem::test @ bci:94 (line 62) > ``` > > Before IGVN, that part of the stream is: > > > 522 Region === 522 604 521 [[ 522 538 523 524 525 526 527 528 529 530 531 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:75 (line 59) > 524 Phi === 522 608 464 [[ 588 581 564 546 564 559 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:75 (line 59) > > 538 If === 522 535 [[ 539 540 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 539 IfTrue === 538 [[ 553 547 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 540 IfFalse === 538 [[ 548 546 ]] #0 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 553 If === 539 
535 [[ 554 555 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) > 554 IfTrue === 553 [[ 562 560 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) > 555 IfFalse === 553 [[ 548 559 ]] #0 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) > > 548 Region === 548 _ 540 555 [[ 548 562 561 563 564 565 566 567 568 569 570 571 572 573 574 575 576 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:88 (line 60) > 564 Phi === 548 _ 524 524 [[ 581 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:85 (line 61) > > 562 Region === 562 548 554 [[ 562 600 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 596 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:90 (line 62) > 581 Phi === 562 564 524 [[ 420 597 610 608 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:90 (line 62) > > 608 MergeMem === _ 1 581 1 1 1 1 1 1 1 1 1 1 588 [[ 524 ]] { - - - - - - - - - - N588:java/lang/Throwable (java/io/Serializable)+20 * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:94 (line 62) > > > 522 is a loop head, 604 is the backedge. The loop becomes unreachable > during IGVN. The loop body above is transformed to: > > > 538 If === 604 535 [[ 539 540 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 539 IfTrue === 538 [[ 562 547 560 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (l... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - review - Merge branch 'master' into JDK-8371464 - test - fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28554/files - new: https://git.openjdk.org/jdk/pull/28554/files/052b7a46..825c9dd5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28554&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28554&range=00-01 Stats: 19717 lines in 576 files changed: 12800 ins; 3715 del; 3202 mod Patch: https://git.openjdk.org/jdk/pull/28554.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28554/head:pull/28554 PR: https://git.openjdk.org/jdk/pull/28554 From roland at openjdk.org Tue Dec 2 09:09:18 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 09:09:18 GMT Subject: RFR: 8371464: C2: assert(no_dead_loop) failed: dead loop detected In-Reply-To: <5g0wstbmWC5gv_OG3sTv5Lb0eYCR4Cq3zQb1PJiWA6w=.efbee9c2-a9db-428b-8aa7-1c3d198d05e9@github.com> References: <5g0wstbmWC5gv_OG3sTv5Lb0eYCR4Cq3zQb1PJiWA6w=.efbee9c2-a9db-428b-8aa7-1c3d198d05e9@github.com> Message-ID: On Mon, 1 Dec 2025 07:57:06 GMT, Aleksey Shipilev wrote: > GHA failures in [com/sun/crypto/provider/Cipher/HPKE/KAT9180](https://github.com/rwestrel/jdk/actions/runs/19761317022#user-content-com_sun_crypto_provider_cipher_hpke_kat9180) would disappear if you merge from master. Actually, this might mean the PR base is quite old, and there might be other bugs on the intersection with this one. Merge from master and pass the GHA, maybe? 
I merged with latest ------------- PR Comment: https://git.openjdk.org/jdk/pull/28554#issuecomment-3600960580 From roland at openjdk.org Tue Dec 2 09:09:19 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 09:09:19 GMT Subject: RFR: 8371464: C2: assert(no_dead_loop) failed: dead loop detected [v2] In-Reply-To: References: Message-ID: <3t5HBU3tkgGioH6r3THy2oBLYGZ1JzOOWBKM-8lEeuc=.749bada1-f8c3-4872-8f64-21abfc4b5707@github.com> On Mon, 1 Dec 2025 08:02:47 GMT, Damon Fenacci wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - review >> - Merge branch 'master' into JDK-8371464 >> - test >> - fix > > src/hotspot/share/opto/cfgnode.cpp line 1404: > >> 1402: Node* other_phi_input = in(j); >> 1403: if (other_phi_input != nullptr && other_phi_input == merge_mem->base_memory() && !is_data_loop(region, phi_input, igvn)) { >> 1404: // merge_mem is a successor memory to other_phi_input, and is not pinned inside the diamond, so push it out. > > Do you think it might be worth adding an additional reason for `!is_data_loop` in the comment? I added a comment in the new commit. Can you have a look? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28554#discussion_r2580274083 From roland at openjdk.org Tue Dec 2 09:13:38 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 09:13:38 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v7] In-Reply-To: References: Message-ID: > In test cases, `mh` is initially not constant so the method handle > invoke can't be inlined. It is later found to be constant, so it can > be turned into a direct call by > `Compile::process_late_inline_calls_no_inline()`. 
In the meantime, the > `CallNode` for the mh invoke is cloned (by loop switching). In the > process, only a shallow copy of the `JVMState` for the call is > made. The initial `CallNode` is the first to be processed by > `Compile::process_late_inline_calls_no_inline()` and that causes that > `CallNode` to become dead. The cloned `CallNode` is then > processed. The `JVMState` for that one references the initial > `CallNode` in its caller's `JVMState`. Because that node is dead, that > causes a crash. The fix I propose is to make a deep copy of the > `JVMState` when a `CallNode` is cloned, if a `CallGenerator` is > assigned to the node. > > The other failure I see with these tests is: > > > # Internal Error (/home/roland/jdk-jdk/src/hotspot/share/opto/compile.hpp:1091), pid=3319164, tid=3319186 > # assert(_number_of_mh_late_inlines > 0) failed: _number_of_mh_late_inlines < 0 ! > > > because even though the `CallNode` is cloned, there's still only one > late inline recorded. The fix here is to increment > `_number_of_mh_late_inlines` when the node is cloned. > > This was reported by the netty developers. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: - Merge branch 'master' into JDK-8370939 - Merge branch 'master' into JDK-8370939 - review - Merge branch 'master' into JDK-8370939 - review - more - more - more - more - test - ... 
and 1 more: https://git.openjdk.org/jdk/compare/1b191400...64b11e6e ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28088/files - new: https://git.openjdk.org/jdk/pull/28088/files/bf46ba3e..64b11e6e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28088&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28088&range=05-06 Stats: 19716 lines in 575 files changed: 12799 ins; 3715 del; 3202 mod Patch: https://git.openjdk.org/jdk/pull/28088.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28088/head:pull/28088 PR: https://git.openjdk.org/jdk/pull/28088 From roland at openjdk.org Tue Dec 2 09:13:41 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 09:13:41 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v6] In-Reply-To: <7nY7QRkkFjOtOuBXID1I4GluA0vnFRLy_UnRICfVkR4=.99ec7fe1-af27-4ab7-ac63-27aa12bec4ef@github.com> References: <7nY7QRkkFjOtOuBXID1I4GluA0vnFRLy_UnRICfVkR4=.99ec7fe1-af27-4ab7-ac63-27aa12bec4ef@github.com> Message-ID: On Fri, 21 Nov 2025 11:33:42 GMT, Roland Westrelin wrote: >> In test cases, `mh` is initially not constant so the method handle >> invoke can't be inlined. It is later found to be constant, so it can >> be turned into a direct call by >> `Compile::process_late_inline_calls_no_inline()`. In the meantime, the >> `CallNode` for the mh invoke is cloned (by loop switching). In the >> process, only a shallow copy of the `JVMState` for the call is >> made. The initial `CallNode` is the first to be processed by >> `Compile::process_late_inline_calls_no_inline()` and that causes that >> `CallNode` to become dead. The cloned `CallNode` is then >> processed. The `JVMState` for that one references the initial >> `CallNode` in its caller's `JVMState`. Because that node is dead, that >> causes a crash. 
The fix I propose is to make a deep copy of the >> `JVMState` when a `CallNode` is cloned, if a `CallGenerator` is >> assigned to the node. >> >> The other failure I see with these tests is: >> >> >> # Internal Error (/home/roland/jdk-jdk/src/hotspot/share/opto/compile.hpp:1091), pid=3319164, tid=3319186 >> # assert(_number_of_mh_late_inlines > 0) failed: _number_of_mh_late_inlines < 0 ! >> >> >> because even though the `CallNode` is cloned, there's still only one >> late inline recorded. The fix here is to increment >> `_number_of_mh_late_inlines` when the node is cloned. >> >> This was reported by the netty developers. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Merge branch 'master' into JDK-8370939 > - review > - Merge branch 'master' into JDK-8370939 > - review > - more > - more > - more > - more > - test > - fix Anyone for another review? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28088#issuecomment-3600983968 From roland at openjdk.org Tue Dec 2 09:20:41 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 09:20:41 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: Message-ID: > This is a variant of 8332827. In 8332827, an array access becomes > dependent on a range check `CastII` for another array access. When, > after loop opts are over, that RC `CastII` was removed, the array > access could float and an out of bound access happened. With the fix > for 8332827, RC `CastII`s are no longer removed. > > With this one what happens is that some transformations applied after > loop opts are over widen the type of the RC `CastII`. 
As a result, the
> type of the RC `CastII` is no longer narrower than that of its input,
> the `CastII` is removed and the dependency is lost.
>
> There are 2 transformations that cause this to happen:
>
> - After loop opts are over, the types of the `CastII` nodes are widened
>   so nodes that have the same inputs but a slightly different type can
>   common.
>
> - When pushing a `CastII` through an `Add`, if the types of both inputs
>   of the `Add` are non-constant, then we end up widening the type
>   (the resulting `Add` has a type that's wider than that of the
>   initial `CastII`).
>
> There are already 3 types of `Cast` nodes depending on the
> optimizations that are allowed. Either the `Cast` is floating
> (`depends_only_test()` returns `true`) or pinned. Either the `Cast`
> can be removed if it no longer narrows the type of its input or
> not. We already have variants of the `CastII`:
>
> - if the Cast can float and can be removed when it doesn't narrow the type
>   of its input.
>
> - if the Cast is pinned and can be removed when it doesn't narrow the type
>   of its input.
>
> - if the Cast is pinned and can't be removed when it doesn't narrow
>   the type of its input.
>
> What we need here, I think, is the 4th combination:
>
> - if the Cast can float and can't be removed when it doesn't narrow
>   the type of its input.
>
> Anyway, things are becoming confusing with all these different
> variants named in ways that don't always help figure out what
> constraints each of them operates under. So I refactored this and that's
> the biggest part of this change. The fix consists in marking `Cast`
> nodes when their type is widened in a way that prevents them from being
> optimized out.
>
> Tobias ran performance testing with a slightly different version of
> this change and there was no regression.

Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 17 commits: - Merge branch 'master' into JDK-8354282 - whitespace - review - review - Update src/hotspot/share/opto/castnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/castnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/castnode.cpp Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java Co-authored-by: Christian Hagedorn - review - review - ... and 7 more: https://git.openjdk.org/jdk/compare/ef5e744a...93b8b0c5 ------------- Changes: https://git.openjdk.org/jdk/pull/24575/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24575&range=07 Stats: 365 lines in 13 files changed: 264 ins; 27 del; 74 mod Patch: https://git.openjdk.org/jdk/pull/24575.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24575/head:pull/24575 PR: https://git.openjdk.org/jdk/pull/24575 From pminborg at openjdk.org Tue Dec 2 09:31:03 2025 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 2 Dec 2025 09:31:03 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 23:41:04 GMT, Chen Liang wrote: >> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. 
Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Tweak VH usage in some classes src/java.base/share/classes/java/lang/invoke/VarHandle.java line 2036: > 2034: var constant = MethodHandleImpl.isCompileConstant(vh); > 2035: var cache = adaptedMh; > 2036: if (constant == MethodHandleImpl.CONSTANT_YES && cache != null) { Rookie question: Is there multi-thread considerations here? How about visibility across threads? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2580353068 From shade at openjdk.org Tue Dec 2 09:43:17 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 2 Dec 2025 09:43:17 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v7] In-Reply-To: References: Message-ID: > See the bug for discussion what issues current machinery has. > > This PR executes the plan outlined in the bug: > 1. Common the receiver type profiling code in interpreter and C1 > 2. Rewrite receiver type profiling code to only do atomic receiver slot installations > 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed > > This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral. 
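
Point 2 of the plan above, installing receiver slots atomically, can be illustrated with a small standalone sketch. It is not the x86 macro-assembler code from the PR: `ReceiverTable`, `ReceiverRow`, `kRows`, `overflow` and `profile_receiver` are hypothetical names, and the sketch assumes rows are never cleared while a profiling thread is running. It follows the shape discussed in the review thread: scan all rows for a match first (empty rows may precede occupied ones after class unloading), bail out quickly when the table is full, and otherwise claim the empty row with a compare-and-swap and restart the search. The counters use atomic increments only to keep the C++ well defined; the PR explicitly leaves counter updates non-atomic.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr int kRows = 2;  // hypothetical table width

struct ReceiverRow {
  std::atomic<const void*> receiver{nullptr};  // nullptr means "empty row"
  std::atomic<std::uint64_t> count{0};
};

struct ReceiverTable {
  ReceiverRow rows[kRows];
  std::atomic<std::uint64_t> overflow{0};  // receivers that did not fit
};

void profile_receiver(ReceiverTable& t, const void* klass) {
  for (;;) {
    // Case 1: receiver already recorded. Scan every row, because an
    // empty row can appear before the row holding this receiver.
    int first_empty = -1;
    for (int i = 0; i < kRows; i++) {
      const void* r = t.rows[i].receiver.load(std::memory_order_acquire);
      if (r == klass) {
        t.rows[i].count.fetch_add(1, std::memory_order_relaxed);
        return;
      }
      if (r == nullptr && first_empty < 0) { first_empty = i; }
    }
    // Case 2: not found and the table is full.
    if (first_empty < 0) {
      t.overflow.fetch_add(1, std::memory_order_relaxed);
      return;
    }
    // Case 3: try to claim the empty row seen during the scan. Whether
    // the CAS wins or loses, restart the search: if another thread took
    // the row, it may have installed this very receiver.
    const void* expected = nullptr;
    t.rows[first_empty].receiver.compare_exchange_strong(
        expected, klass, std::memory_order_acq_rel);
  }
}
```

Restarting after the CAS, instead of moving on to the next empty row, is what keeps a receiver from being installed twice under the stated assumption.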
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/` > - [x] Linux x86_64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: - More comments - Tighten up the comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25305/files - new: https://git.openjdk.org/jdk/pull/25305/files/f3e0fa4d..39cc4dfe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25305&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25305&range=05-06 Stats: 13 lines in 1 file changed: 2 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/25305.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25305/head:pull/25305 PR: https://git.openjdk.org/jdk/pull/25305 From shade at openjdk.org Tue Dec 2 09:43:19 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 2 Dec 2025 09:43:19 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v6] In-Reply-To: References: <2ifEaoGuZU4duyckWchgOnnqfH6AgAcrqsiqBZH1Nx4=.1df7af8d-41ac-43a1-90ab-964eb80f155b@github.com> Message-ID: On Tue, 2 Dec 2025 08:44:21 GMT, Aleksey Shipilev wrote: >> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4826: >> >>> 4824: // and never duplicate the receivers in the list. >>> 4825: // >>> 4826: // It is tempting to combine these cases into a single loop, and claim the first >> >> Can you elaborate, please, why it is the case? Is it a result of class unloading or something else? > > Yes, we are clearing MDOs for unloaded classes. > > I initially thought this kind of cleanup happens only during `ciReceiverTypeData::translate_receiver_data_from` translation to `ciReceiverTypeData`. If that was the only path, we would probably not care about this; although I would, for defensive programming reasons. 
*But* it looks like the cleanup happens during "normal" GC class unloading, which also makes sense: you do not want to have unloaded classes referenced from any runtime datastructure, including MDO. So this forces our hand to deal with empty slots. Old code also did this, AFAICS: it scanned everything at least once. > > The path to receiver cleanup I saw in the code was: > > > ReceiverTypeData::clear_row > ReceiverTypeData::clean_weak_klass_links > MethodData::clean_method_data > InstanceKlass::clean_method_data > InstanceKlass::clean_weak_instanceklass_links > Klass::clean_weak_instanceklass_links > KlassCleaningTask::work > I tightened up the comments a bit to mention that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25305#discussion_r2580384110 From roland at openjdk.org Tue Dec 2 09:49:29 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 09:49:29 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v4] In-Reply-To: References: Message-ID: > The test case has an out of loop `Store` with an `AddP` address > expression that has other uses and is in the loop body. Schematically, > only showing the address subgraph and the bases for the `AddP`s: > > > Store#195 -> AddP#133 -> AddP#134 -> CastPP#110 > -> CastPP#110 > > > Both `AddP`s have the same base, a `CastPP` that's also in the loop > body. > > That loop is a counted loop and only has 3 iterations so is fully > unrolled. First, one iteration is peeled: > > > /-> CastPP#110 > Store#195 -> Phi#360 -> AddP#133 -> AddP#134 -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > The `AddP`s and `CastPP` are cloned (because in the loop body). As > part of peeling, `PhaseIdealLoop::peeled_dom_test_elim()` is > called. 
It finds the test that guards `CastPP#283` in the peeled > iteration dominates and replaces the test that guards `CastPP#110` > (the test in the peeled iteration is the clone of the test in the > loop). That causes `CastPP#110`'s control to be updated to that of the > test in the peeled iteration and to be yanked from the loop. So now > `CastPP#283` and `CastPP#110` have the same inputs. > > Next unrolling happens: > > > /-> CastPP#110 > /-> AddP#400 -> AddP#401 -> CastPP#110 > Store#195 -> Phi#360 -> Phi#477 -> AddP#133 -> AddP#134 -> CastPP#110 > \ -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > `AddP`s are cloned once more but not the `CastPP`s because they are > both in the peeled iteration now. A new `Phi` is added. > > Next igvn runs. It's going to push the `AddP`s through the `Phi`s. > > Through `Phi#477`: > > > > /-> CastPP#110 > Store#195 -> Phi#360 -> AddP#510 -> Phi#509 -> AddP#401 -> CastPP#110 > \ -> AddP#134 -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > Through `Phi#360`: > > > /-> AddP#134 -> CastPP#110 > /-> Phi#509 -> AddP#401 -> CastPP#110 > Store#195 -> AddP#516 -> Phi#515 -> AddP#278 -> CastPP#283 > -> Phi#514 -> CastPP#283 > -> CastP#110 > > > Then `Phi#514` which has 2 `CastPP`s as input with identical inputs is > transformed into anot... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: - more - review - Merge branch 'master' into JDK-8351889 - exp - Merge branch 'master' into JDK-8351889 - verif - Merge branch 'master' into JDK-8351889 - test seed - more - Merge branch 'master' into JDK-8351889 - ... 
and 4 more: https://git.openjdk.org/jdk/compare/0419511c...15c17bb1 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25386/files - new: https://git.openjdk.org/jdk/pull/25386/files/d52f2ded..15c17bb1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25386&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25386&range=02-03 Stats: 19726 lines in 577 files changed: 12800 ins; 3715 del; 3211 mod Patch: https://git.openjdk.org/jdk/pull/25386.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25386/head:pull/25386 PR: https://git.openjdk.org/jdk/pull/25386 From roland at openjdk.org Tue Dec 2 09:49:37 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 09:49:37 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v3] In-Reply-To: References: Message-ID: On Mon, 24 Nov 2025 11:53:16 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8351889 >> - verif >> - Merge branch 'master' into JDK-8351889 >> - test seed >> - more >> - Merge branch 'master' into JDK-8351889 >> - Merge branch 'master' into JDK-8351889 >> - more >> - test >> - fix > > src/hotspot/share/opto/phaseX.cpp line 2085: > >> 2083: } >> 2084: return false; >> 2085: } > > Why not call it `verify_node_invariants_for`? > > You should also assert immediately. @benoitmaillard Is about to make that change for everything: https://github.com/openjdk/jdk/pull/28295 That one is not integrated. Shouldn't I do that change only if it/when integrates? 
> src/hotspot/share/opto/phaseX.hpp line 623: > >> 621: // '-XX:VerifyIterativeGVN=10000' >> 622: return ((VerifyIterativeGVN % 100000) / 10000) == 1; >> 623: } > > You will need to add extra documentation to the flag. And also there is a test that uses the flag. You should adjust it to enable this bit as well. Done in new commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25386#discussion_r2580416861 PR Review Comment: https://git.openjdk.org/jdk/pull/25386#discussion_r2580411292 From roland at openjdk.org Tue Dec 2 09:58:18 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 09:58:18 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v2] In-Reply-To: References: Message-ID: On Mon, 24 Nov 2025 09:46:31 GMT, Emanuel Peter wrote: > > The duplication comes from loop body cloning so I'm not sure how we could prevent the duplication. We could try to common the CastPP nodes once PhaseIdealLoop::peeled_dom_test_elim() is called. > > Right, that could be an option. Do you think that is worth it? `IfNode::Ideal` looks for a dominating `If` that can replace the current `If`. It's not clear to me that that transformation can't trigger a similar failure which is why I think a fix during igvn is more robust. 
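As an aside on the `VerifyIterativeGVN` check quoted earlier in this thread: the expression `(VerifyIterativeGVN % 100000) / 10000` reads a single decimal digit out of the flag value, so each digit position acts as an independent switch. A minimal sketch of that arithmetic follows, written in Java purely for illustration. The class and method names are hypothetical and this is not the HotSpot code.

```java
public class DecimalDigitFlag {
    // Returns the decimal digit of `flag` at position `pos`
    // (0 = ones, 1 = tens, 2 = hundreds, ...). Assumes flag >= 0.
    static long digitAt(long flag, int pos) {
        long scale = 1;
        for (int i = 0; i < pos; i++) {
            scale *= 10;
        }
        // Drop everything above the digit, then everything below it.
        return (flag % (scale * 10)) / scale;
    }

    // A check is treated as enabled when its digit is exactly 1,
    // mirroring the `== 1` comparison in the quoted snippet.
    static boolean isEnabled(long flag, int pos) {
        return digitAt(flag, pos) == 1;
    }

    public static void main(String[] args) {
        // Position 4 corresponds to the 10000s digit from the quoted comment.
        System.out.println(isEnabled(10000, 4));
        System.out.println(isEnabled(1111, 4));
    }
}
```

With this encoding, a value such as 10000 turns on only the check at position 4, while 1111 leaves it off and enables the four lower positions instead.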
------------- PR Comment: https://git.openjdk.org/jdk/pull/25386#issuecomment-3601177439 From roland at openjdk.org Tue Dec 2 10:06:57 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 10:06:57 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v2] In-Reply-To: References: Message-ID: > For this failure memory stats are: > > > Total Usage: 1095525816 > --- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 --- > Phase Total ra node comp type states reglive regsplit regmask superword cienv ha other > none 5976032 331560 5402064 197512 33712 10200 0 0 984 0 0 0 0 > parse 2716464 65456 1145480 196408 1112752 0 0 0 0 0 196368 0 0 > optimizer 98184 0 32728 0 65456 0 0 0 0 0 0 0 0 > connectionGraph 32728 0 0 32728 0 0 0 0 0 0 0 0 0 > iterGVN 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > idealLoop 918189632 0 38687056 872824784 392776 0 0 0 0 0 6285016 0 0 > idealLoopVerify 2228144 0 0 2228144 0 0 0 0 0 0 0 0 0 > macroExpand 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > graphReshape 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > matcher 20135944 3369848 9033208 7536400 65456 131032 0 0 0 0 0 0 0 > postselect_cleanup 294872 294872 0 0 0 0 0 0 0 0 0 0 0 > scheduler 752944 196488 556456 0 0 0 0 0 0 0 0 0 0 > regalloc 388736 388736 0 0 0 0 0 0 0 0 0 0 0 > ctorChaitin 160032 ... 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/compile.hpp Co-authored-by: Manuel Hässig ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28581/files - new: https://git.openjdk.org/jdk/pull/28581/files/27524015..eb7bd9ac Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28581&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28581&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28581.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28581/head:pull/28581 PR: https://git.openjdk.org/jdk/pull/28581 From roland at openjdk.org Tue Dec 2 10:06:59 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 10:06:59 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v2] In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 16:07:47 GMT, Manuel Hässig wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/opto/compile.hpp >> >> Co-authored-by: Manuel Hässig > > test/hotspot/jtreg/compiler/c2/TestVerifyLoopOptimizationsHighMemUsage.java line 27: > >> 25: * @test >> 26: * @bug 8370519 >> 27: * @summary C2: Hit MemLimit when running with +VerifyLoopOptimizations > > Unsure, but would this test qualify for `@key stress`? I'm not sure either what does.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2580480945 From vklang at openjdk.org Tue Dec 2 10:08:39 2025 From: vklang at openjdk.org (Viktor Klang) Date: Tue, 2 Dec 2025 10:08:39 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: Message-ID: <8tnePBnWu5w86zsXUOVMd7R_oHsTVl_Gjug0QP7N_vw=.5ce0106a-a7bf-40a2-b6a4-76e5d816150e@github.com> On Mon, 1 Dec 2025 23:41:04 GMT, Chen Liang wrote: >> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Tweak VH usage in some classes src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 632: > 630: @Hidden > 631: @jdk.internal.vm.annotation.IntrinsicCandidate > 632: static int isCompileConstant(Object obj) { nit: an "is"-question tends to indicate a yes/no answer, but in this case it is more of a compileConstantStatus. 
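For readers following this thread, here is a small sketch of what an "inexact" VarHandle invocation looks like at the Java level, that is, a call site whose argument types need conversion before they match the access mode type. It uses only the public `java.lang.invoke` API. The `Holder` class and the method names are hypothetical, and the sketch illustrates the kind of call site the PR description refers to, not the caching change itself.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.lang.invoke.WrongMethodTypeException;

public class InexactVarHandleDemo {
    // Hypothetical holder type with a long field.
    static class Holder {
        long value;
    }

    static final VarHandle VALUE;
    static {
        try {
            VALUE = MethodHandles.lookup()
                    .findVarHandle(Holder.class, "value", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // The call site passes an int, so its type is (Holder,int)void while the
    // access mode type is (Holder,long)void. The default invoke behavior
    // widens the int to a long.
    static long storeInexact(int v) {
        Holder h = new Holder();
        VALUE.set(h, v);
        return (long) VALUE.get(h); // exact: (Holder)long
    }

    // With invoke-exact behavior the same mismatched call site is rejected.
    static boolean exactRejects(int v) {
        Holder h = new Holder();
        try {
            VALUE.withInvokeExactBehavior().set(h, v);
            return false;
        } catch (WrongMethodTypeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(storeInexact(42));
        System.out.println(exactRejects(42));
    }
}
```

The first method is the conversion-requiring case the subject line describes. The second shows how `withInvokeExactBehavior()` (available since Java 16) turns the same mismatch into an exception instead of a conversion.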
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2580489368 From dfenacci at openjdk.org Tue Dec 2 10:18:50 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 2 Dec 2025 10:18:50 GMT Subject: RFR: 8371464: C2: assert(no_dead_loop) failed: dead loop detected [v2] In-Reply-To: <3t5HBU3tkgGioH6r3THy2oBLYGZ1JzOOWBKM-8lEeuc=.749bada1-f8c3-4872-8f64-21abfc4b5707@github.com> References: <3t5HBU3tkgGioH6r3THy2oBLYGZ1JzOOWBKM-8lEeuc=.749bada1-f8c3-4872-8f64-21abfc4b5707@github.com> Message-ID: On Tue, 2 Dec 2025 09:04:52 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/cfgnode.cpp line 1404: >> >>> 1402: Node* other_phi_input = in(j); >>> 1403: if (other_phi_input != nullptr && other_phi_input == merge_mem->base_memory() && !is_data_loop(region, phi_input, igvn)) { >>> 1404: // merge_mem is a successor memory to other_phi_input, and is not pinned inside the diamond, so push it out. >> >> Do you think it might be worth adding an additional reason for `!is_data_loop` in the comment? > > I added a comment in the new commit. Can you have a look? Thank you Roland. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28554#discussion_r2580530978 From krk at openjdk.org Tue Dec 2 10:22:48 2025 From: krk at openjdk.org (Kerem Kat) Date: Tue, 2 Dec 2025 10:22:48 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v4] In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 19:14:05 GMT, Kerem Kat wrote: >> Do not try to replace `fallthrough_memproj` when it is null, fixes crash. >> >> Test case is simplified from the ticket. Verified that the case crashes without the fix.
> > Kerem Kat has updated the pull request incrementally with one additional commit since the last revision: > > fix rename `gc/TestAllocHumongousFragment_generational` failed, seems unrelated: # # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x000055fe3a2ce64e, pid=7345, tid=7349 # # JRE version: OpenJDK Runtime Environment (26.0) (build 26-internal-krk-a0f0ecb951a83c5069995130cfd803ad9165295f) # Java VM: OpenJDK 64-Bit Server VM (26-internal-krk-a0f0ecb951a83c5069995130cfd803ad9165295f, mixed mode, static, sharing, tiered, compressed oops, compressed class ptrs, shenandoah gc, linux-amd64) # Problematic frame: # V [java+0x149b64e] void ShenandoahMark::mark_loop_work, (ShenandoahGenerationType)1, false, (StringDedupMode)0>(ShenandoahMarkRefsClosure<(ShenandoahGenerationType)1>*, unsigned short*, unsigned int, TaskTerminator*, StringDedup::Requests*) [clone .isra.0]+0x25e ------------- PR Comment: https://git.openjdk.org/jdk/pull/28432#issuecomment-3601288584 From mhaessig at openjdk.org Tue Dec 2 10:29:29 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Tue, 2 Dec 2025 10:29:29 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v2] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 10:03:52 GMT, Roland Westrelin wrote: >> test/hotspot/jtreg/compiler/c2/TestVerifyLoopOptimizationsHighMemUsage.java line 27: >> >>> 25: * @test >>> 26: * @bug 8370519 >>> 27: * @summary C2: Hit MemLimit when running with +VerifyLoopOptimizations >> >> Unsure, but would this test qualify for `@key stress`? > > I'm not sure either what does. It is a marker to filter resource intensive tests.
https://github.com/openjdk/jdk/blob/7278d2e8e5835f090672f7625d391a1b4c1a6626/test/hotspot/jtreg/TEST.ROOT#L29-L30 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2580570634 From shade at openjdk.org Tue Dec 2 10:31:22 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 2 Dec 2025 10:31:22 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v8] In-Reply-To: References: Message-ID: > See the bug for discussion what issues current machinery has. > > This PR executes the plan outlined in the bug: > 1. Common the receiver type profiling code in interpreter and C1 > 2. Rewrite receiver type profiling code to only do atomic receiver slot installations > 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed > > This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/` > - [x] Linux x86_64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls - More comments - Tighten up the comments - Simplify third case: no need to loop, just restart the search - Actually have a second "fast" case: receiver is not found in the table, and the table is full - Pushing/popping for rare CAS path is counter-productive - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls - Tighten up some more - Offset is always rscratch1, no need to save it - Grossly simplify register shuffling - ... 
and 11 more: https://git.openjdk.org/jdk/compare/7278d2e8...3c5019d9 ------------- Changes: https://git.openjdk.org/jdk/pull/25305/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25305&range=07 Stats: 418 lines in 8 files changed: 202 ins; 197 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/25305.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25305/head:pull/25305 PR: https://git.openjdk.org/jdk/pull/25305 From qamai at openjdk.org Tue Dec 2 10:36:42 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 2 Dec 2025 10:36:42 GMT Subject: RFR: 8350208: CTW: GraphKit::add_safepoint_edges asserts "not enough operands for reexecution" Message-ID: Hi, This PR fixes the issue of the compiler crashing with "not enough operands for reexecution". The issue here is that during `Parse::catch_inline_exceptions`, the old stack is gone, and we cannot reexecute the current bytecode anymore. However, there are some places where we try to insert safepoints into the graph, such as if the handler is a backward jump, or if one of the exceptions in the handlers is not loaded. Since the `_reexecute` state of the current jvms is "undefined", it is inferred automatically that it should reexecute for some bytecodes such as `putfield`. The solution then is to explicitly set `_reexecute` to false. I can manage to write a unit test for the case of a backward handler, for the other cases, since the exceptions that can be thrown for a bytecode that is inferred to reexecute are `NullPointerException`, `ArrayIndexOutOfBoundsException`, and `ArrayStoreException`. I find it hard to construct such a test in which one of them is not loaded. Please kindly review, thanks a lot. 
------------- Commit messages: - Set jvms()->_reexecute to false during Parse::catch_inline_exceptions Changes: https://git.openjdk.org/jdk/pull/28597/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28597&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350208 Stats: 153 lines in 3 files changed: 152 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28597.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28597/head:pull/28597 PR: https://git.openjdk.org/jdk/pull/28597 From shade at openjdk.org Tue Dec 2 10:51:01 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 2 Dec 2025 10:51:01 GMT Subject: RFR: 8372862: AArch64: Fix GetAndSet-acquire costs after JDK-8372188 Message-ID: I just noticed (while looking at [JDK-8372800](https://bugs.openjdk.org/browse/JDK-8372800)) that I made a little error in [JDK-8372188](https://bugs.openjdk.org/browse/JDK-8372188) refactor, which made GetAndSet-acquire instruction cost twice as high. The usual cost for acquire versions are twice as low, likely to be selected instead of non-acquire versions. This bug happened as I "simplified" stencils at some point by dropping some arguments and renumbering the remaining ones. This is one place where I apparently forgot to renumber one usage. See other checks for `ifelse($3,Acq,...` in that stencil, all of them are `$3` (correct), not `$4` (incorrect). Seen no real bugs because of this mishap, but it would be good to fix it in case we see issues later. I also looked at stencils again, and I think there are no other argument-index problems like this anywhere else. The real change is in `aarch64_atomic_ad.m4`, `.ad` is re-generated from that stencil. 
Additional testing: - [ ] Linux AArch64 server fastdebug, `all` - [ ] Linux AArch64 server fastdebug, quick jcstress run ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/28598/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28598&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372862 Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/28598.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28598/head:pull/28598 PR: https://git.openjdk.org/jdk/pull/28598 From chagedorn at openjdk.org Tue Dec 2 11:02:49 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 2 Dec 2025 11:02:49 GMT Subject: RFR: 8372451: C2 SuperWord: "endless loop" assert. Need to implement proper worklist mechanism [v3] In-Reply-To: <0ZncRodHJbWfRFLrUCqQn5JPHilDPQA8e7dcrwARsOI=.7fe44906-eeed-49bb-8472-5a264391468b@github.com> References: <0ZncRodHJbWfRFLrUCqQn5JPHilDPQA8e7dcrwARsOI=.7fe44906-eeed-49bb-8472-5a264391468b@github.com> Message-ID: On Tue, 2 Dec 2025 08:13:47 GMT, Emanuel Peter wrote: >> **Context**: `VTransform::optimize`. Works a bit like IGVN, it allows each node to perform optimizations. Recently introduced during JDK26. >> >> **Problem**: I made the assumption that we don't need a worklist mechanism, we can just do multiple passes over all nodes. The assumption was that there would not be any "trickling" of updates over the graph. But that is wrong: for example we can have a long chain of dead nodes, and we need to progressively remove the last node and mark it as dead. >> >> **Solution**: Implement proper worklist mechanism, so that updates can trickle over the graph. > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 11 additional commits since the last revision: > > - Merge branch 'master' into JDK-8372451-too-many-dead-vector-reduction-vtnodes > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn > - limit steps of optimize, for Manuel > - Merge branch 'master' into JDK-8372451-too-many-dead-vector-reduction-vtnodes > - rm old documentation > - git move to new test > - streamline > - refactor and verify > - unique worklist > - wip solution > - ... and 1 more: https://git.openjdk.org/jdk/compare/01e88711...54881ff4 Looks good, thanks for the update! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28512#pullrequestreview-3529498827 From chagedorn at openjdk.org Tue Dec 2 11:05:35 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 2 Dec 2025 11:05:35 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v4] In-Reply-To: <-SdGoKVu9WpxzbLyqrLt7duH-qK_Bbm6ErrWdDfxJUg=.95c14f7f-b940-4dc6-a63d-055419625a36@github.com> References: <-SdGoKVu9WpxzbLyqrLt7duH-qK_Bbm6ErrWdDfxJUg=.95c14f7f-b940-4dc6-a63d-055419625a36@github.com> Message-ID: On Tue, 2 Dec 2025 08:09:30 GMT, Quan Anh Mai wrote: >> Hi, >> >> This fixes the crash in `Load/StoreVectorMaskedNode::Ideal`. The issue here is that the graph is not canonical during idealization, which leads to us processing a dead node. The fix I propose is to bail-out when that happens. >> >> To be more specific, for this issue, we have the graph that looks like: >> >> ConI -> ConvI2L -> CastLL(0..32) -> VectorMaskGen >> >> with `ConI` being 45 and `MaxVectorSize` being 32. In this instance, `CastLL` is processed before `ConvI2L`, and when it is processed, it sees that the type of `ConvI2L` being its bottom type. 
As a result, it does not know that it is top, and since we are after macro expansion, which is after loop opts, the `CastLL` goes away, leaving us with: >> >> ConI -> ConvI2L -> VectorMaskGen >> >> After `ConvI2L` is processed, we know that the input of `VectorMaskGen` is a constant 45, which is larger than `MaxVectorSize`, leading to the assert failure. >> >> Please take a look and leave your thoughts, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > bug number in test, comment Testing passed! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28410#pullrequestreview-3529507626 From jbhateja at openjdk.org Tue Dec 2 11:17:43 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 2 Dec 2025 11:17:43 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v3] In-Reply-To: References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> <5bV8t0Bo16-WVON8_AJLfcPDDqWVHDxIjmdGPPNazE8=.51d5a17d-1b87-44d4-ad41-e9d346e6b9f7@github.com> Message-ID: <3fvWzoSiBb5iYddxX90qvM7Vzhf9Nb218fc_dHWgS-E=.31e63450-14e2-4674-b966-93cb8bcbfb20@github.com> On Thu, 27 Nov 2025 16:12:59 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Fine tune matcher check > > Ok, that's fine with me too. > > It would be nice if you could also attach a regression test, or maybe add an additional run to the existing test, with the required flags for reproducing this issue. Hi @eme64 , kindly verify latest changes. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28533#issuecomment-3601503186 From jbhateja at openjdk.org Tue Dec 2 11:18:18 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 2 Dec 2025 11:18:18 GMT Subject: RFR: 8351844: C2 x64 AVX2 vpminmax assertion failure with equivalent inputs Message-ID: Bug fix PR fixes an incorrect register equivalence in macro assembler. MaxV/MinV IR with equivalent inputs should ideally be removed from ideal graph before reaching to macro assembler. [JDK-8372797](https://bugs.openjdk.org/browse/JDK-8372797) is filed to add relevant identity transformations. Best Regards, Jatin ------------- Commit messages: - 8351844: C2 x64 AVX2 vpminmax assertion failure with equivalent inputs Changes: https://git.openjdk.org/jdk/pull/28600/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28600&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351844 Stats: 70 lines in 2 files changed: 68 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28600.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28600/head:pull/28600 PR: https://git.openjdk.org/jdk/pull/28600 From roland at openjdk.org Tue Dec 2 11:21:05 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 11:21:05 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v3] In-Reply-To: References: Message-ID: > For this failure memory stats are: > > > Total Usage: 1095525816 > --- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 --- > Phase Total ra node comp type states reglive regsplit regmask superword cienv ha other > none 5976032 331560 5402064 197512 33712 10200 0 0 984 0 0 0 0 > parse 2716464 65456 1145480 196408 1112752 0 0 0 0 0 196368 0 0 > optimizer 98184 0 32728 0 65456 0 0 0 0 0 0 0 0 > connectionGraph 32728 0 0 32728 0 0 0 0 0 0 0 0 0 > iterGVN 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > idealLoop 918189632 0 38687056 872824784 392776 0 0 0 0 0 6285016 0 0 > 
idealLoopVerify 2228144 0 0 2228144 0 0 0 0 0 0 0 0 0 > macroExpand 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > graphReshape 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > matcher 20135944 3369848 9033208 7536400 65456 131032 0 0 0 0 0 0 0 > postselect_cleanup 294872 294872 0 0 0 0 0 0 0 0 0 0 0 > scheduler 752944 196488 556456 0 0 0 0 0 0 0 0 0 0 > regalloc 388736 388736 0 0 0 0 0 0 0 0 0 0 0 > ctorChaitin 160032 ... Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: - review - review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28581/files - new: https://git.openjdk.org/jdk/pull/28581/files/eb7bd9ac..36fb3a6f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28581&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28581&range=01-02 Stats: 49 lines in 5 files changed: 22 ins; 24 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28581.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28581/head:pull/28581 PR: https://git.openjdk.org/jdk/pull/28581 From roland at openjdk.org Tue Dec 2 11:21:05 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 11:21:05 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v3] In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 16:17:03 GMT, Quan Anh Mai wrote: >> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: >> >> - review >> - review > > src/hotspot/share/opto/compile.hpp line 810: > >> 808: // Compilation environment. >> 809: Arena* comp_arena() { return &_comp_arena; } >> 810: ResourceArea* idealloop_arena() { return &_idealloop_arena; } > > Should we make it more idiomatic C++ by having the `ResourceArea` allocated and deallocated together with the `PhaseIdealLoop` instead of attaching it to the `Compile` object? Right, that makes sense. Done in new commits. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2580735137 From roland at openjdk.org Tue Dec 2 11:21:06 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 11:21:06 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v3] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 10:26:22 GMT, Manuel Hässig wrote: >> I'm not sure either what does. > > It is a marker to filter resource intensive tests. > > https://github.com/openjdk/jdk/blob/7278d2e8e5835f090672f7625d391a1b4c1a6626/test/hotspot/jtreg/TEST.ROOT#L29-L30 I added it in the new commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2580735997 From roland at openjdk.org Tue Dec 2 11:21:08 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 11:21:08 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v3] In-Reply-To: References: Message-ID: <_HueHgU8Ha0yoG9cckWMGfms8D0WC6zGWKykIQkCeZM=.3f929996-99ee-4535-8973-b23ccf6b291e@github.com> On Mon, 1 Dec 2025 16:33:20 GMT, Benoît Maillard wrote: >> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: >> >> - review >> - review > > test/hotspot/jtreg/compiler/c2/TestVerifyLoopOptimizationsHighMemUsage.java line 28: > >> 26: * @bug 8370519 >> 27: * @summary C2: Hit MemLimit when running with +VerifyLoopOptimizations >> 28: * @run main/othervm -XX:CompileCommand=compileonly,*TestVerifyLoopOptimizationsHighMemUsage*::* -XX:-TieredCompilation -Xbatch > > Out of curiosity, have you tried reducing the test with `creduce`? I fixed a similar issue in [JDK-8366990](https://bugs.openjdk.org/browse/JDK-8366990), and initially reviewers were concerned about the long compilation time. I was able to get decent results with `creduce` by using `-XX:CompileCommand=memlimit`. Not sure if it's worth doing here though.
I don't have `creduce` set up. I tried minimizing the test case by hand but it was fairly time consuming. It currently runs in 30s on a fairly fast machine. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2580733874 From thartmann at openjdk.org Tue Dec 2 12:51:05 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 2 Dec 2025 12:51:05 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v7] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 09:13:38 GMT, Roland Westrelin wrote: >> In test cases, `mh` is initially not constant so the method handle >> invoke can't be inlined. It is later found to be constant, so it can >> be turned into a direct call by >> `Compile::process_late_inline_calls_no_inline()`. In the meantime, the >> `CallNode` for the mh invoke is cloned (by loop switching). In the >> process, only a shallow copy of the `JVMState` for the call is >> made. The initial `CallNode` is the first to be processed by >> `Compile::process_late_inline_calls_no_inline()` and that causes that >> `CallNode` to become dead. The cloned `CallNode` is then >> processed. The `JVMState` for that one references the initial >> `CallNode` in its caller's `JVMState`. Because that node is dead, that >> causes a crash. The fix I propose is to make a deep copy of the >> `JVMState` when a `CallNode` is cloned, if a `CallGenerator` is >> assigned to the node. >> >> The other failure I see with these tests is: >> >> >> # Internal Error (/home/roland/jdk-jdk/src/hotspot/share/opto/compile.hpp:1091), pid=3319164, tid=3319186 >> # assert(_number_of_mh_late_inlines > 0) failed: _number_of_mh_late_inlines < 0 ! >> >> >> because even though the `CallNode` is cloned, there's still only one >> late inline recorded. The fix here is to increment >> `_number_of_mh_late_inlines` when the node is cloned. 
>> >> This was reported by the netty developers. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Merge branch 'master' into JDK-8370939 > - Merge branch 'master' into JDK-8370939 > - review > - Merge branch 'master' into JDK-8370939 > - review > - more > - more > - more > - more > - test > - ... and 1 more: https://git.openjdk.org/jdk/compare/8558ffcd...64b11e6e Looks good to me. I submitted some testing and will report back once it passed. src/hotspot/share/opto/compile.hpp line 1102: > 1100: > 1101: void mark_has_mh_late_inlines() { _has_mh_late_inlines = true; } > 1102: bool has_mh_late_inlines() const { return _has_mh_late_inlines; } Suggestion: bool has_mh_late_inlines() const { return _has_mh_late_inlines; } ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28088#pullrequestreview-3529910175 PR Review Comment: https://git.openjdk.org/jdk/pull/28088#discussion_r2581026773 From epeter at openjdk.org Tue Dec 2 13:13:40 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 13:13:40 GMT Subject: RFR: 8372451: C2 SuperWord: "endless loop" assert. Need to implement proper worklist mechanism [v3] In-Reply-To: References: <0ZncRodHJbWfRFLrUCqQn5JPHilDPQA8e7dcrwARsOI=.7fe44906-eeed-49bb-8472-5a264391468b@github.com> Message-ID: On Tue, 2 Dec 2025 08:44:25 GMT, Manuel H?ssig wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 11 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8372451-too-many-dead-vector-reduction-vtnodes >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn >> - limit steps of optimize, for Manuel >> - Merge branch 'master' into JDK-8372451-too-many-dead-vector-reduction-vtnodes >> - rm old documentation >> - git move to new test >> - streamline >> - refactor and verify >> - unique worklist >> - wip solution >> - ... and 1 more: https://git.openjdk.org/jdk/compare/195c2f9b...54881ff4 > > Marked as reviewed by mhaessig (Committer). @mhaessig @chhagedorn Thanks for the reviews and suggestions! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28512#issuecomment-3601959763 From epeter at openjdk.org Tue Dec 2 13:13:43 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 13:13:43 GMT Subject: Integrated: 8372451: C2 SuperWord: "endless loop" assert. Need to implement proper worklist mechanism In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 16:02:20 GMT, Emanuel Peter wrote: > **Context**: `VTransform::optimize`. Works a bit like IGVN, it allows each node to perform optimizations. Recently introduced during JDK26. > > **Problem**: I made the assumption that we don't need a worklist mechanism, we can just do multiple passes over all nodes. The assumption was that there would not be any "trickling" of updates over the graph. But that is wrong: for example we can have a long chain of dead nodes, and we need to progressively remove the last node and mark it as dead. > > **Solution**: Implement proper worklist mechanism, so that updates can trickle over the graph. This pull request has now been integrated. Changeset: 6c01d3b0 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/6c01d3b08862447983b96daaf34a4c62daf54101 Stats: 208 lines in 3 files changed: 164 ins; 1 del; 43 mod 8372451: C2 SuperWord: "endless loop" assert. 
Need to implement proper worklist mechanism Reviewed-by: mhaessig, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/28512 From krk at openjdk.org Tue Dec 2 13:32:22 2025 From: krk at openjdk.org (Kerem Kat) Date: Tue, 2 Dec 2025 13:32:22 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v5] In-Reply-To: References: Message-ID: > Do not try to replace `fallthrough_memproj` when it is null, fixes crash. > > Test case is simplified from the ticket. Verified that the case crashes without the fix. Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: - Merge branch 'master' into fix-c2-segfault-unlocknode - address comments - fix rename - rename test file - Merge branch 'master' into fix-c2-segfault-unlocknode - fix test spacing - Update src/hotspot/share/opto/macro.cpp Co-authored-by: Manuel H?ssig - Update src/hotspot/share/opto/macro.cpp Co-authored-by: Manuel H?ssig - copyright format fix? 
- 8370502: C2: segfault while adding node to IGVN worklist ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28432/files - new: https://git.openjdk.org/jdk/pull/28432/files/a0f0ecb9..21018290 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28432&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28432&range=03-04 Stats: 13228 lines in 281 files changed: 6943 ins; 5064 del; 1221 mod Patch: https://git.openjdk.org/jdk/pull/28432.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28432/head:pull/28432 PR: https://git.openjdk.org/jdk/pull/28432 From krk at openjdk.org Tue Dec 2 13:32:26 2025 From: krk at openjdk.org (Kerem Kat) Date: Tue, 2 Dec 2025 13:32:26 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v4] In-Reply-To: <1Agol3OtcCV7ilUBseuyB3DMWXfinb4bTBnRafLtfS0=.d4081ee2-4495-471e-85e2-ffcc2f825d21@github.com> References: <1Agol3OtcCV7ilUBseuyB3DMWXfinb4bTBnRafLtfS0=.d4081ee2-4495-471e-85e2-ffcc2f825d21@github.com> Message-ID: On Fri, 28 Nov 2025 10:23:06 GMT, Emanuel Peter wrote: >> Kerem Kat has updated the pull request incrementally with one additional commit since the last revision: >> >> fix rename > > test/hotspot/jtreg/compiler/c2/TestUnlockNodeNullMemprof.java line 29: > >> 27: * @summary Do not segfault while adding node to IGVN worklist >> 28: * >> 29: * @run main/othervm -Xbatch compiler.c2.TestUnlockNodeNullMemprof > > Suggestion: > > * @run main/othervm -Xbatch ${test.main.class} > > > Possible since a recent JTREG update. Makes wrongly copying class name go away ;) > > Also: I wonder if we should also have a run without any flags? Removing `-Xbatch` makes the test non-deterministic in this case. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28432#discussion_r2581178352

From epeter at openjdk.org Tue Dec 2 13:35:29 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 2 Dec 2025 13:35:29 GMT
Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v8]
In-Reply-To:
References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com>
Message-ID:

On Mon, 1 Dec 2025 13:39:09 GMT, Jatin Bhateja wrote:

>> This patch fixes a crash seen while querying the bottom type of the MachTempNode corresponding to the [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of the blend pattern during late scheduling. Here, MaxVectorSize is constrained to 8 bytes, so during C2 type system initialization [TypeVect::VECTX](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719), which is guarded by the target-supported vector size, remains uninitialized.
>>
>> It's better to reject matching of VectorBlend in such a scenario.
>>
>> All existing VectorAPI jtreg tests are passing with -XX:UseAVX=0 and -XX:MaxVectorSize=8
>>
>> Kindly review and share your feedback.
>>
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>
> Review comments resolutions

Testing submitted!
Code looks good to me :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28533#issuecomment-3602086228 From chagedorn at openjdk.org Tue Dec 2 13:49:05 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 2 Dec 2025 13:49:05 GMT Subject: RFR: 8371464: C2: assert(no_dead_loop) failed: dead loop detected [v2] In-Reply-To: <2fn6Bj9HrMiWa4K01CuW-vnL7XKWwkLcTapeVkHqWUo=.9d0e24a5-dc6c-4abe-ae45-10b93479071e@github.com> References: <2fn6Bj9HrMiWa4K01CuW-vnL7XKWwkLcTapeVkHqWUo=.9d0e24a5-dc6c-4abe-ae45-10b93479071e@github.com> Message-ID: On Tue, 2 Dec 2025 09:09:16 GMT, Roland Westrelin wrote: >> Crash occurs because a `MergeMem` node references itself: >> >> >> 608 MergeMem === _ 1 608 1 1 1 1 1 1 1 1 1 1 878 [[ 877 878 608 420 597 ]] { - - - - - - - - - - N878:java/lang/Throwable (java/io/Serializable)+20 * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !orig=[524] !jvms: TestDeadLoopAtMergeMem::test @ bci:94 (line 62) >> ``` >> >> Before IGVN, that part of the stream is: >> >> >> 522 Region === 522 604 521 [[ 522 538 523 524 525 526 527 528 529 530 531 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:75 (line 59) >> 524 Phi === 522 608 464 [[ 588 581 564 546 564 559 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:75 (line 59) >> >> 538 If === 522 535 [[ 539 540 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) >> 539 IfTrue === 538 [[ 553 547 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) >> 540 IfFalse === 538 [[ 548 546 ]] #0 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) >> 553 If === 539 535 [[ 554 555 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) >> 554 IfTrue === 553 [[ 562 560 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) >> 555 IfFalse === 553 [[ 548 559 ]] #0 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) >> >> 548 Region === 548 _ 540 555 [[ 548 562 561 563 564 565 566 567 568 569 570 
571 572 573 574 575 576 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:88 (line 60) >> 564 Phi === 548 _ 524 524 [[ 581 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:85 (line 61) >> >> 562 Region === 562 548 554 [[ 562 600 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 596 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:90 (line 62) >> 581 Phi === 562 564 524 [[ 420 597 610 608 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:90 (line 62) >> >> 608 MergeMem === _ 1 581 1 1 1 1 1 1 1 1 1 1 588 [[ 524 ]] { - - - - - - - - - - N588:java/lang/Throwable (java/io/Serializable)+20 * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:94 (line 62) >> >> >> 522 is a loop head, 604 is the backedge. The loop becomes unreachable >> during IGVN. The loop body above is transformed to: >> >> >> 538 If === 604 535 [[ 539 540 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) >> 539 IfTrue === 538 ... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8371464 > - test > - fix Testing passed! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28554#pullrequestreview-3530234212

From chagedorn at openjdk.org Tue Dec 2 13:54:26 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Tue, 2 Dec 2025 13:54:26 GMT
Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8]
In-Reply-To:
References:
Message-ID: <2xxjKX6hMeKDfS9SGBEvll8yadDthCoUjCIRpaE8ObA=.b567ec00-7dad-4b57-82a4-db1149fc8942@github.com>

On Tue, 2 Dec 2025 09:20:41 GMT, Roland Westrelin wrote:

>> This is a variant of 8332827. In 8332827, an array access becomes
>> dependent on a range check `CastII` for another array access. When,
>> after loop opts are over, that RC `CastII` was removed, the array
>> access could float and an out-of-bounds access happened. With the fix
>> for 8332827, RC `CastII`s are no longer removed.
>>
>> With this one what happens is that some transformations applied after
>> loop opts are over widen the type of the RC `CastII`. As a result, the
>> type of the RC `CastII` is no longer narrower than that of its input,
>> the `CastII` is removed and the dependency is lost.
>>
>> There are 2 transformations that cause this to happen:
>>
>> - after loop opts are over, the types of the `CastII` nodes are widened
>> so nodes that have the same inputs but a slightly different type can
>> common.
>>
>> - When pushing a `CastII` through an `Add`, if the types of both inputs
>> of the `Add` are non-constant, then we end up widening the type
>> (the resulting `Add` has a type that's wider than that of the
>> initial `CastII`).
>>
>> There are already 3 types of `Cast` nodes depending on the
>> optimizations that are allowed. Either the `Cast` is floating
>> (`depends_only_test()` returns `true`) or pinned. Either the `Cast`
>> can be removed if it no longer narrows the type of its input or
>> not. We already have variants of the `CastII`:
>>
>> - if the Cast can float and be removed when it doesn't narrow the type
>> of its input.
>>
>> - if the Cast is pinned and can be removed when it doesn't narrow the type
>> of its input.
>>
>> - if the Cast is pinned and can't be removed when it doesn't narrow
>> the type of its input.
>>
>> What we need here, I think, is the 4th combination:
>>
>> - if the Cast can float and can't be removed when it doesn't narrow
>> the type of its input.
>>
>> Anyway, things are becoming confusing with all these different
>> variants named in ways that don't always help figure out what
>> constraints each of them operates under. So I refactored this and that's
>> the biggest part of this change. The fix consists in marking `Cast`
>> nodes when their type is widened in a way that prevents them from being
>> optimized out.
>>
>> Tobias ran performance testing with a slightly different version of
>> this change and there was no regression.
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits:
>
> - Merge branch 'master' into JDK-8354282
> - whitespace
> - review
> - review
> - Update src/hotspot/share/opto/castnode.cpp
>
> Co-authored-by: Christian Hagedorn
> - Update src/hotspot/share/opto/castnode.cpp
>
> Co-authored-by: Christian Hagedorn
> - Update src/hotspot/share/opto/castnode.cpp
>
> Co-authored-by: Christian Hagedorn
> - Update test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java
>
> Co-authored-by: Christian Hagedorn
> - review
> - review
> - ... and 7 more: https://git.openjdk.org/jdk/compare/ef5e744a...93b8b0c5

Thanks for the update, it looks good to me! If @eme64 also agrees with the latest patch, we can submit some testing and then hopefully get it in right before the fork.

-------------

Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24575#pullrequestreview-3530251375 From chagedorn at openjdk.org Tue Dec 2 13:54:29 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 2 Dec 2025 13:54:29 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v4] In-Reply-To: References: <6qShqR-Ohv7vamoJ_B4Ev-poU8SB96eTBo4HFJrylcI=.dac5a26f-c9f0-445b-8f1c-a7c719fa27ae@github.com> <4QQp7C7iIVfVs1MoUMC56KCgVGpXu5ziTHfZ-f2pk6o=.4ca7e1a8-3f31-44d3-aaec-30429ed7e2b0@github.com> Message-ID: On Thu, 27 Nov 2025 12:29:10 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/castnode.hpp line 101: >> >>> 99: } >>> 100: return NonFloatingNonNarrowing; >>> 101: } >> >> Just a side note: We seem to mix the terms "(non-)pinned" with "(non-)floating" freely. Should we stick to just one? But maybe it's justified to use both depending on the situation/code context. > > The patch as it is now adds some extra uses of "pinned" and "floating". What could make sense, I suppose, would be to try to use "floating"/"non floating" instead but there are so many uses of "pinned" in the code base already, and I don't see us getting rid of them, that I wonder if it would make a difference. So, I'm not too sure what to do. Yes, that's true. I was also unsure about whether we should stick with one or just allow both interchangeably. I guess since there are so many uses, we can just move forward with what you have now and still come back to clean it up if necessary - we can always do that. 
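One way to read the terminology question above is that "pinned" and "non-floating" name the same property, and that this property is independent of whether a Cast may be removed once it no longer narrows its input. The sketch below models the four combinations discussed in this thread as two booleans. It is a simplified illustration, not the code in `castnode.hpp`: `ToyDependency`, `ToyRange`, `is_pinned` and `can_remove` are hypothetical names, and a closed integer interval stands in for a C2 integer type.

```cpp
#include <cassert>

// Hypothetical model of a Cast dependency as two independent properties.
struct ToyDependency {
  bool floating;   // false means "pinned": the Cast must stay at its control
  bool narrowing;  // true means the Cast may be removed once it no longer
                   // narrows the type of its input
};

// The four combinations, named after the constants quoted in this thread.
constexpr ToyDependency FloatingNarrowing{true, true};
constexpr ToyDependency FloatingNonNarrowing{true, false};
constexpr ToyDependency NonFloatingNarrowing{false, true};
constexpr ToyDependency NonFloatingNonNarrowing{false, false};

// Closed interval [lo, hi] standing in for an integer type.
struct ToyRange {
  int lo;
  int hi;
};

// "Pinned" and "non-floating" are treated as synonyms in this model.
bool is_pinned(ToyDependency dep) { return !dep.floating; }

// A Cast is redundant when its input type already lies within the Cast's
// type. It may only be dropped if the dependency kind allows removal.
bool can_remove(ToyDependency dep, ToyRange cast_type, ToyRange input_type) {
  bool input_already_within_cast =
      input_type.lo >= cast_type.lo && input_type.hi <= cast_type.hi;
  return dep.narrowing && input_already_within_cast;
}
```

For example, a Cast whose type has been widened to `[0, 100]` over an input of type `[0, 10]` is removable under `FloatingNarrowing` but is kept under `FloatingNonNarrowing`, which is the behaviour the fourth combination is meant to provide.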
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2581285955 From chagedorn at openjdk.org Tue Dec 2 13:54:34 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 2 Dec 2025 13:54:34 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v4] In-Reply-To: <4QQp7C7iIVfVs1MoUMC56KCgVGpXu5ziTHfZ-f2pk6o=.4ca7e1a8-3f31-44d3-aaec-30429ed7e2b0@github.com> References: <6qShqR-Ohv7vamoJ_B4Ev-poU8SB96eTBo4HFJrylcI=.dac5a26f-c9f0-445b-8f1c-a7c719fa27ae@github.com> <4QQp7C7iIVfVs1MoUMC56KCgVGpXu5ziTHfZ-f2pk6o=.4ca7e1a8-3f31-44d3-aaec-30429ed7e2b0@github.com> Message-ID: On Wed, 26 Nov 2025 13:24:05 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: >> >> - review >> - review >> - Merge branch 'master' into JDK-8354282 >> - review >> - infinite loop in gvn fix >> - renaming >> - merge >> - Merge branch 'master' into JDK-8354282 >> - fix & test > > src/hotspot/share/opto/castnode.hpp line 120: > >> 118: // be removed in any case otherwise the sunk node floats back into the loop. >> 119: static const DependencyType NonFloatingNonNarrowing; >> 120: > > I needed a moment to completely understand all these combinations. I rewrote the definitions in this process a little bit. Feel free to take some of it over: > > > // All the possible combinations of floating/narrowing with example use cases: > > // Use case example: Range Check CastII > // Floating: The Cast is only dependent on the single range check. > // Narrowing: The Cast narrows the type to a positive index. If the input to the Cast is narrower, we can safely > // remove the cast because the array access will be safe. > static const DependencyType FloatingNarrowing; > > // Use case example: Widening Cast nodes' types after loop opts: We want to common Casts with slightly different types. 
> // Floating: These Casts only depend on the single control. > // NonNarrowing: Even when the input type is narrower, we are not removing the Cast. Otherwise, the dependency > // to the single control is lost, and an array access could float above its range check because we > // just removed the dependency to the range check by removing the Cast. This could lead to an > // out-of-bounds access. > static const DependencyType FloatingNonNarrowing; > > // Use case example: An array accesses that is no longer dependent on a single range check (e.g. range check smearing). > // NonFloating: The array access must be pinned below all the checks it depends on. If the check it directly depends > // on with a control input is hoisted, we do hoist the Cast as well. If we allowed the Cast to float, > // we risk that the array access ends up above another check it depends on (we cannot model two control > // dependencies for a node in the IR). This could lead to an out-of-bounds access. > // Narrowing: If the Cast does not narrow the input type, then it's safe to remove the cast because the array access > // will be safe. > static const DependencyType NonFloatingNarrowing; > > // Use case example: Sinking nodes out of a loop > // Non-Floating & Non-Narrowing: We don't want the Cast that forces the node to be out of loop to be removed in any > // case. Otherwise, the sunk node could float back into the l... 
Thanks for taking it over :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2581287358 From roland at openjdk.org Tue Dec 2 14:03:07 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 14:03:07 GMT Subject: RFR: 8371464: C2: assert(no_dead_loop) failed: dead loop detected [v2] In-Reply-To: References: <2fn6Bj9HrMiWa4K01CuW-vnL7XKWwkLcTapeVkHqWUo=.9d0e24a5-dc6c-4abe-ae45-10b93479071e@github.com> Message-ID: On Tue, 2 Dec 2025 13:46:48 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - review >> - Merge branch 'master' into JDK-8371464 >> - test >> - fix > > Testing passed! @chhagedorn @dafedafe thanks for the reviews and testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28554#issuecomment-3602214864 From roland at openjdk.org Tue Dec 2 14:03:09 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 14:03:09 GMT Subject: Integrated: 8371464: C2: assert(no_dead_loop) failed: dead loop detected In-Reply-To: References: Message-ID: On Fri, 28 Nov 2025 10:50:44 GMT, Roland Westrelin wrote: > Crash occurs because a `MergeMem` node references itself: > > > 608 MergeMem === _ 1 608 1 1 1 1 1 1 1 1 1 1 878 [[ 877 878 608 420 597 ]] { - - - - - - - - - - N878:java/lang/Throwable (java/io/Serializable)+20 * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !orig=[524] !jvms: TestDeadLoopAtMergeMem::test @ bci:94 (line 62) > ``` > > Before IGVN, that part of the stream is: > > > 522 Region === 522 604 521 [[ 522 538 523 524 525 526 527 528 529 530 531 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:75 (line 59) > 524 Phi === 522 608 464 [[ 588 581 564 546 564 559 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: 
TestDeadLoopAtMergeMem::test @ bci:75 (line 59) > > 538 If === 522 535 [[ 539 540 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 539 IfTrue === 538 [[ 553 547 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 540 IfFalse === 538 [[ 548 546 ]] #0 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 553 If === 539 535 [[ 554 555 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) > 554 IfTrue === 553 [[ 562 560 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) > 555 IfFalse === 553 [[ 548 559 ]] #0 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) > > 548 Region === 548 _ 540 555 [[ 548 562 561 563 564 565 566 567 568 569 570 571 572 573 574 575 576 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:88 (line 60) > 564 Phi === 548 _ 524 524 [[ 581 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:85 (line 61) > > 562 Region === 562 548 554 [[ 562 600 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 596 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:90 (line 62) > 581 Phi === 562 564 524 [[ 420 597 610 608 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:90 (line 62) > > 608 MergeMem === _ 1 581 1 1 1 1 1 1 1 1 1 1 588 [[ 524 ]] { - - - - - - - - - - N588:java/lang/Throwable (java/io/Serializable)+20 * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:94 (line 62) > > > 522 is a loop head, 604 is the backedge. The loop becomes unreachable > during IGVN. The loop body above is transformed to: > > > 538 If === 604 535 [[ 539 540 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 539 IfTrue === 538 [[ 562 547 560 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (l... This pull request has now been integrated. 
Changeset: a62296d8
Author: Roland Westrelin
URL: https://git.openjdk.org/jdk/commit/a62296d8a0858d63a930e91168254a9927f06783
Stats: 91 lines in 3 files changed: 84 ins; 0 del; 7 mod

8371464: C2: assert(no_dead_loop) failed: dead loop detected

Reviewed-by: chagedorn, dfenacci

-------------

PR: https://git.openjdk.org/jdk/pull/28554

From dfenacci at openjdk.org Tue Dec 2 14:44:10 2025
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Tue, 2 Dec 2025 14:44:10 GMT
Subject: RFR: 8372703: Test compiler/arguments/TestCodeEntryAlignment.java failed: assert(allocates2(pc)) failed: not in CodeBuffer memory
In-Reply-To:
References:
Message-ID: <3SFe0aKR8DW5SKjr375S78OWgJS7g2pLZfepb43yISI=.958eda85-ca1a-4f85-a9a2-c7ad60dcc025@github.com>

On Mon, 1 Dec 2025 21:16:18 GMT, Volodymyr Paprotski wrote:

> Requires a Broadwell machine, but was able to reproduce with an emulator:
>
>
> ~/sde-external-9.58.0-2025-06-16-lin/sde64 -follow-subprocess -bdw -- ./build/linux-x86_64-server-fastdebug/images/jdk/bin/java -XX:-UseMulAddIntrinsic -XX:+UseDilithiumIntrinsics -XX:+UnlockExperimentalVMOptions -XX:CodeCacheSegmentSize=1024 -XX:CodeEntryAlignment=1024 -cp build/linux-x86_64-server-fastdebug/support/test/lib/test-lib.jar test/hotspot/jtreg/compiler/arguments/TestCodeEntryAlignment.java run

Thanks @vpaprotsk for fixing this. Looks good to me (if tests are OK).

src/hotspot/cpu/x86/stubDeclarations_x86.hpp line 76:

> 74: do_arch_entry, \
> 75: do_arch_entry_init) \
> 76: do_arch_blob(compiler, 120000 WINDOWS_ONLY(+2000)) \

I was wondering if there is any reason for this value, apart from it being enough for the test to pass. (I just noticed that it has already been increased in the past.)

-------------

Marked as reviewed by dfenacci (Committer).
PR Review: https://git.openjdk.org/jdk/pull/28588#pullrequestreview-3530481034 PR Review Comment: https://git.openjdk.org/jdk/pull/28588#discussion_r2581474386 From rrich at openjdk.org Tue Dec 2 14:45:09 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 2 Dec 2025 14:45:09 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v7] In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 10:27:36 GMT, Martin Doerr wrote: >> This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. We can save the `genInvRoundKeys` computation when we only do encryption. >> >> The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64. > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: > > - Minor simplification. > - Merge remote-tracking branch 'origin' into 8371820_AES_Crypt > - Fix missing whitespace. > - Address review comments. > - Merge remote-tracking branch 'origin' into 8371820_AES_Crypt > - Remove K from AES_Crypt > - More minor cleanup. > - Improve comment and minor cleanup. > - 8371820: Further AES performance improvements for key schedule generation The changes look good to me. Thanks, Richard. ------------- Marked as reviewed by rrich (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28299#pullrequestreview-3530496843 From mdoerr at openjdk.org Tue Dec 2 14:50:09 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 2 Dec 2025 14:50:09 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v7] In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 10:27:36 GMT, Martin Doerr wrote: >> This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. We can save the `genInvRoundKeys` computation when we only do encryption. >> >> The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64. > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: > > - Minor simplification. > - Merge remote-tracking branch 'origin' into 8371820_AES_Crypt > - Fix missing whitespace. > - Address review comments. > - Merge remote-tracking branch 'origin' into 8371820_AES_Crypt > - Remove K from AES_Crypt > - More minor cleanup. > - Improve comment and minor cleanup. > - 8371820: Further AES performance improvements for key schedule generation Thanks for the review! 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/28299#issuecomment-3602426006

From bulasevich at openjdk.org Tue Dec 2 15:02:35 2025
From: bulasevich at openjdk.org (Boris Ulasevich)
Date: Tue, 2 Dec 2025 15:02:35 GMT
Subject: RFR: 8280283: Dead compiler code found during the JDK-8272058 code review
In-Reply-To:
References:
Message-ID: <2DyhWZxKPAXQbCsjHhoSUQZ80Em0931LE2LRjLNRdHA=.cc61d9bd-fc90-40ea-88e9-ac76c21b5756@github.com>

On Mon, 24 Nov 2025 09:26:13 GMT, Anton Seoane Ampudia wrote:

> This PR removes some dead code that was found during review for [JDK-8272058](https://bugs.openjdk.org/browse/JDK-8272058).
>
> `target_addr_for_insn_or_null` is never run with a `ldrw` to `zr` (i.e. a safepoint poll). This is just a remnant from global safepointing, before we moved to using thread-local handshakes. No safepoint polling code reaches this function. More information can be read in the [original code review](https://github.com/openjdk/jdk18/pull/51#discussion_r774922087). Additionally, I have run tiers 1-6 to make sure this path is not exercised.
>
> This changeset also cleans up the unused `is_nop` function, following the comments in the issue. Other dead code mentioned there has long since disappeared.
>
> **Testing:** passes tiers 1-4

Nice cleanup. Cleaning up dead code always helps reduce technical debt. Are you sure there isn't more to clean up? Have you tried building with GCC's -Wunused options to catch additional unused symbols?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28473#issuecomment-3602485639

From mli at openjdk.org Tue Dec 2 15:16:59 2025
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 2 Dec 2025 15:16:59 GMT
Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v29]
In-Reply-To:
References:
Message-ID:

On Mon, 24 Nov 2025 03:11:07 GMT, Anjian Wen wrote:

>> Hi everyone, please help review this patch which implements the _counterMode_AESCrypt with Zvkned.
On my QEMU, with the Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ passed. > > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > modify label L_EXIT to L_exit src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2636: > 2634: void counterMode_AESCrypt(int round, Register in, Register out, Register key, Register counter, > 2635: Register input_len, Register saved_encrypted_ctr, Register used_ptr) { > 2636: // Algorithm: This should be my last comment :) Where is this "Algorithm" from? Can you put a link here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2581629384 From mli at openjdk.org Tue Dec 2 15:17:01 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 2 Dec 2025 15:17:01 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v26] In-Reply-To: References: <9oPWTWflnwws0wxHBP58IiQRIZz4Tt5bthr7RiC3BE0=.94d60901-8fad-4597-9e55-c669de73a8e6@github.com> Message-ID: On Thu, 20 Nov 2025 02:48:23 GMT, Anjian Wen wrote: >> There is a `mv` before exit of `generate_counterMode_AESCrypt`, is this one still necessary? > > Yes, the `mv` before the exit of `generate_counterMode_AESCrypt` is for a different branch, taken when `input_len` is zero initially. To avoid an additional jump, each exit from `counterMode_AESCrypt` is an independent exit, so I think we need to keep this `mv` here. I see, although it looks a bit strange to me to return in this way.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2581628935 From shade at openjdk.org Tue Dec 2 15:26:22 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 2 Dec 2025 15:26:22 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v4] In-Reply-To: <-SdGoKVu9WpxzbLyqrLt7duH-qK_Bbm6ErrWdDfxJUg=.95c14f7f-b940-4dc6-a63d-055419625a36@github.com> References: <-SdGoKVu9WpxzbLyqrLt7duH-qK_Bbm6ErrWdDfxJUg=.95c14f7f-b940-4dc6-a63d-055419625a36@github.com> Message-ID: On Tue, 2 Dec 2025 08:09:30 GMT, Quan Anh Mai wrote: >> Hi, >> >> This fixes the crash in `Load/StoreVectorMaskedNode::Ideal`. The issue here is that the graph is not canonical during idealization, which leads to us processing a dead node. The fix I propose is to bail-out when that happens. >> >> To be more specific, for this issue, we have the graph that looks like: >> >> ConI -> ConvI2L -> CastLL(0..32) -> VectorMaskGen >> >> with `ConI` being 45 and `MaxVectorSize` being 32. In this instance, `CastLL` is processed before `ConvI2L`, and when it is processed, it sees that the type of `ConvI2L` being its bottom type. As a result, it does not know that it is top, and since we are after macro expansion, which is after loop opts, the `CastLL` goes away, leaving us with: >> >> ConI -> ConvI2L -> VectorMaskGen >> >> After `ConvI2L` is processed, we know that the input of `VectorMaskGen` is a constant 45, which is larger than `MaxVectorSize`, leading to the assert failure. >> >> Please take a look and leave your thoughts, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > bug number in test, comment Let's go then? 
I am eager to try and enable deeper CTW testing again :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28410#issuecomment-3602590259 From vpaprotski at openjdk.org Tue Dec 2 15:30:43 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Tue, 2 Dec 2025 15:30:43 GMT Subject: RFR: 8372703: Test compiler/arguments/TestCodeEntryAlignment.java failed: assert(allocates2(pc)) failed: not in CodeBuffer memory In-Reply-To: <3SFe0aKR8DW5SKjr375S78OWgJS7g2pLZfepb43yISI=.958eda85-ca1a-4f85-a9a2-c7ad60dcc025@github.com> References: <3SFe0aKR8DW5SKjr375S78OWgJS7g2pLZfepb43yISI=.958eda85-ca1a-4f85-a9a2-c7ad60dcc025@github.com> Message-ID: On Tue, 2 Dec 2025 14:39:15 GMT, Damon Fenacci wrote: >> Requires a Broadwell machine, but was able to reproduce with an emulator: >> >> >> ~/sde-external-9.58.0-2025-06-16-lin/sde64 -follow-subprocess -bdw -- ./build/linux-x86_64-server-fastdebug/images/jdk/bin/java -XX:-UseMulAddIntrinsic -XX:+UseDilithiumIntrinsics -XX:+UnlockExperimentalVMOptions -XX:CodeCacheSegmentSize=1024 -XX:CodeEntryAlignment=1024 -cp build/linux-x86_64-server-fastdebug/support/test/lib/test-lib.jar test/hotspot/jtreg/compiler/arguments/TestCodeEntryAlignment.java run > > src/hotspot/cpu/x86/stubDeclarations_x86.hpp line 76: > >> 74: do_arch_entry, \ >> 75: do_arch_entry_init) \ >> 76: do_arch_blob(compiler, 120000 WINDOWS_ONLY(+2000)) \ > > I was wondering if there is any reason for this value (apart from it being enough for the test to pass; I just noticed that it has already been increased in the past). The assert was suggesting 119k (and change..) so I rounded slightly up. I was going to ask (e.g. @TobiHartmann?) if that's enough.. (Similarly, I am concerned that I am contributing to a larger JVM footprint with my changes.. but I suppose 11k is comparatively insignificant in the grand scheme of things...) Thanks for the review!
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28588#discussion_r2581685779 From epeter at openjdk.org Tue Dec 2 15:32:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 15:32:55 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 09:20:41 GMT, Roland Westrelin wrote: >> This is a variant of 8332827. In 8332827, an array access becomes >> dependent on a range check `CastII` for another array access. When, >> after loop opts are over, that RC `CastII` was removed, the array >> access could float and an out-of-bounds access happened. With the fix >> for 8332827, RC `CastII`s are no longer removed. >> >> With this one what happens is that some transformations applied after >> loop opts are over widen the type of the RC `CastII`. As a result, the >> type of the RC `CastII` is no longer narrower than that of its input, >> the `CastII` is removed and the dependency is lost. >> >> There are 2 transformations that cause this to happen: >> >> - after loop opts are over, the types of the `CastII` nodes are widened >> so nodes that have the same inputs but a slightly different type can >> common. >> >> - When pushing a `CastII` through an `Add`, if the types of both inputs >> of the `Add` are non-constant, then we end up widening the type >> (the resulting `Add` has a type that's wider than that of the >> initial `CastII`). >> >> There are already 3 types of `Cast` nodes depending on the >> optimizations that are allowed. Either the `Cast` is floating >> (`depends_only_test()` returns `true`) or pinned. Either the `Cast` >> can be removed if it no longer narrows the type of its input or >> not. We already have variants of the `CastII`: >> >> - if the Cast can float and be removed when it doesn't narrow the type >> of its input.
>> >> - if the Cast is pinned and can be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can't be removed when it doesn't narrow >> the type of its input. >> >> What we need here, I think, is the 4th combination: >> >> - if the Cast can float and can't be removed when it doesn't narrow >> the type of its input. >> >> Anyway, things are becoming confusing with all these different >> variants named in ways that don't always help figure out what >> constraints each of them operates under. So I refactored this and that's >> the biggest part of this change. The fix consists in marking `Cast` >> nodes when their type is widened in a way that prevents them from being >> optimized out. >> >> Tobias ran performance testing with a slightly different version of >> this change and there was no regression. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: > > - Merge branch 'master' into JDK-8354282 > - whitespace > - review > - review > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java > > Co-authored-by: Christian Hagedorn > - review > - review > - ... and 7 more: https://git.openjdk.org/jdk/compare/ef5e744a...93b8b0c5 src/hotspot/share/opto/castnode.hpp line 108: > 106: // Floating: The Cast is only dependent on the single range check. > 107: // Narrowing: The Cast narrows the type to a positive index. If the input to the Cast is narrower, we can safely > 108: // remove the cast because the array access will be safe. The "Floating" part is a bit counterintuitive here, because the ctrl of the CastII is the RangeCheck, right? So is it not therefore already pinned?
Maybe we can add some detail about what the "floating" explicitly means here. Is it that we can later move the CastII up in an optimization? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2581630546 From epeter at openjdk.org Tue Dec 2 15:32:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 15:32:56 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: Message-ID: <0An6wz0QZZxtVg-lP4IyqWTekcYkSmvosrVWkI7cH70=.86c07374-2127-4892-a369-ceefa82dd0b7@github.com> On Tue, 2 Dec 2025 15:14:28 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: >> >> - Merge branch 'master' into JDK-8354282 >> - whitespace >> - review >> - review >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java >> >> Co-authored-by: Christian Hagedorn >> - review >> - review >> - ... and 7 more: https://git.openjdk.org/jdk/compare/ef5e744a...93b8b0c5 > > src/hotspot/share/opto/castnode.hpp line 108: > >> 106: // Floating: The Cast is only dependent on the single range check. >> 107: // Narrowing: The Cast narrows the type to a positive index. If the input to the Cast is narrower, we can safely >> 108: // remove the cast because the array access will be safe. > > The "Floating" part is a bit counter intuitive here, because the ctrl of the CastII is the RangeCheck, right? > So is it not therefore already pinned? > > Maybe we can add some detail about what the "floating" explicitly means here. Is it that we can later move the CastII up in an optimization? 
Actually, I'm wondering if the term `hoistable` and `non-hoistable` would not be better terms... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2581642290 From epeter at openjdk.org Tue Dec 2 15:32:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 15:32:58 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: <_rBmTvf064PXyVEAX4zqk43DNgVr0gQDPzPcdQ4XI1A=.660e7e89-0a49-47e0-9639-972cbfbac5f0@github.com> References: <0An6wz0QZZxtVg-lP4IyqWTekcYkSmvosrVWkI7cH70=.86c07374-2127-4892-a369-ceefa82dd0b7@github.com> <_rBmTvf064PXyVEAX4zqk43DNgVr0gQDPzPcdQ4XI1A=.660e7e89-0a49-47e0-9639-972cbfbac5f0@github.com> Message-ID: <4qc5jJ1KA09yko5rWioBGstpuuRNxOiNWXRdRdh9h_E=.17c8ace8-c672-4451-bd15-247d66d92cef@github.com> On Tue, 2 Dec 2025 15:19:26 GMT, Emanuel Peter wrote: >> Actually, I'm wondering if the term `hoistable` and `non-hoistable` would not be better terms... > > At least we could say that it is allowed to hoist the RangeCheck, and the CastII could float up to where the RC is hoisted. Suggestion: // Use case example: Range Check CastII // Floating: The Cast is only dependent on the single range check. If the range check were ever to be hoisted, // it would be safe to let the Cast float to where the range check is hoisted up to. // Narrowing: The Cast narrows the type to a positive index. If the input to the Cast is narrower, we can safely // remove the cast because the array access will be safe.
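As a rough illustration of the four combinations discussed in this thread (floating vs. pinned, and removable vs. not removable once the Cast no longer narrows its input), here is a minimal C++ sketch. It is not the code from `castnode.hpp`: the names `CastDependency`, `Range`, `is_narrower`, `can_remove` and the four `k...` constants are hypothetical, and a closed integer interval stands in for a C2 integer type.

```cpp
#include <cassert>

// Hypothetical model of the two independent properties discussed in the thread.
// These names are illustrative only; they are not the HotSpot identifiers.
struct CastDependency {
  bool floating;   // true: may follow its range check when hoisted; false: pinned
  bool narrowing;  // true: may be removed once it no longer narrows its input type
};

// A closed integer interval standing in for a C2 integer type (assumption).
struct Range {
  long lo;
  long hi;
};

// True if 'cast' is a strict sub-interval of 'input'.
constexpr bool is_narrower(Range cast, Range input) {
  return cast.lo >= input.lo && cast.hi <= input.hi &&
         (cast.lo > input.lo || cast.hi < input.hi);
}

// A cast may be dropped only if its kind allows removal and it no longer narrows.
constexpr bool can_remove(CastDependency dep, Range cast, Range input) {
  return dep.narrowing && !is_narrower(cast, input);
}

constexpr CastDependency kFloatingNarrowing    {true,  true};
constexpr CastDependency kFloatingNonNarrowing {true,  false};  // the "4th combination"
constexpr CastDependency kPinnedNarrowing      {false, true};
constexpr CastDependency kPinnedNonNarrowing   {false, false};
```

In this model, widening a Cast's type until it equals its input type makes a "narrowing" Cast removable, which is the lost-dependency scenario described in the PR; switching the Cast to the floating, non-narrowing kind keeps it in the graph.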
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2581692285 From epeter at openjdk.org Tue Dec 2 15:32:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 15:32:57 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: <0An6wz0QZZxtVg-lP4IyqWTekcYkSmvosrVWkI7cH70=.86c07374-2127-4892-a369-ceefa82dd0b7@github.com> References: <0An6wz0QZZxtVg-lP4IyqWTekcYkSmvosrVWkI7cH70=.86c07374-2127-4892-a369-ceefa82dd0b7@github.com> Message-ID: <_rBmTvf064PXyVEAX4zqk43DNgVr0gQDPzPcdQ4XI1A=.660e7e89-0a49-47e0-9639-972cbfbac5f0@github.com> On Tue, 2 Dec 2025 15:17:38 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/castnode.hpp line 108: >> >>> 106: // Floating: The Cast is only dependent on the single range check. >>> 107: // Narrowing: The Cast narrows the type to a positive index. If the input to the Cast is narrower, we can safely >>> 108: // remove the cast because the array access will be safe. >> >> The "Floating" part is a bit counter intuitive here, because the ctrl of the CastII is the RangeCheck, right? >> So is it not therefore already pinned? >> >> Maybe we can add some detail about what the "floating" explicitly means here. Is it that we can later move the CastII up in an optimization? > > Actually, I'm wondering if the term `hoistable` and `non-hoistable` would not be better terms... At least we could say that it is allowed to hoist the RangeCheck, and the CastII could float up to where the RC is hoisted. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2581649395 From shade at openjdk.org Tue Dec 2 15:39:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 2 Dec 2025 15:39:54 GMT Subject: RFR: 8360557: CTW: Inline cold methods to reach more code [v5] In-Reply-To: References: Message-ID: > We use CTW testing for making sure compilers behave well. 
But we compile the code that is not executed at all, and since our inlining heuristics often look back at profiles, we end up not actually inlining all too much! This means CTW testing likely misses lots of bugs that normal code is exposed to, especially e.g. in loop optimizations. > > There is an intrinsic tradeoff with accepting more inlined methods in CTW: the compilation time gets significantly worse. With just accepting the cold methods we have reasonable CTW times, eating the improvements we have committed in mainline recently. And it still finds bugs. See the RFE for sample data. > > After this lands and CTW starts to compile cold methods, one can greatly expand the scope of the CTW testing by overriding the static inlining limits. Doing e.g. `TEST_VM_OPTS="-XX:MaxInlineSize=70 -XX:C1MaxInlineSize=70"` finds even more bugs. Unfortunately, the compilation times suffer so much, they are impractical to run in standard configurations, see data in RFE. We will enable some of that testing in special testing pipelines. > > Pre-empting the question: "Well, why not use -Xcomp then, and make sure it inlines well?" The answer is in RFE as well: Xcomp causes _a lot_ of stray compilations for JDK and CTW infra itself. For small JARs in large corpus this eats precious testing time that we would instead like to spend on deeper inlining in the actual JAR code. This also does not force us to look into how CTW works in Xcomp at all; I expect some surprises there. Feather-touching the inlining heuristic paths to just accept methods without looking at profiles looks better. > > Tobias had an idea to implement the stress randomized inlining that would expand the scope of inlining. This improvement stacks well with it. This improvement provides the base case of inlining most reasonable methods, and then allows stress infra to inline some more on top of that.
> > Additional testing: > - [x] GHA > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` > - [x] Linux x86_64 server fastdebug, large CTW corpus (now failing in interesting ways) Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: - Enable more testing - Merge branch 'master' into JDK-8360557-ctw-inlining - Merge branch 'master' into JDK-8360557-ctw-inlining - Merge branch 'master' into JDK-8360557-ctw-inlining - Merge branch 'master' into JDK-8360557-ctw-inlining - Update src/hotspot/share/compiler/compiler_globals.hpp Co-authored-by: Tobias Hartmann - Revert separate patch - Final - Proper option name and bump the limits - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26068/files - new: https://git.openjdk.org/jdk/pull/26068/files/f381a337..97975dd0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26068&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26068&range=03-04 Stats: 82234 lines in 1298 files changed: 53590 ins; 20514 del; 8130 mod Patch: https://git.openjdk.org/jdk/pull/26068.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26068/head:pull/26068 PR: https://git.openjdk.org/jdk/pull/26068 From duke at openjdk.org Tue Dec 2 15:40:49 2025 From: duke at openjdk.org (duke) Date: Tue, 2 Dec 2025 15:40:49 GMT Subject: Withdrawn: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 20:51:46 GMT, Jatin Bhateja wrote: > Patch optimizes Vector. slice operation with constant index using x86 ALIGNR instruction. 
> It also adds a new hybrid call generator to facilitate lazy intrinsification or else perform procedural inlining to prevent call overhead and boxing penalties in case the fallback implementation expects to operate over vectors. The existing vector API-based slice implementation is now the fallback code that gets inlined in case intrinsification fails. > > Idea here is to add infrastructure support to enable intrinsification of fast path for selected vector APIs, else enable inlining of fall-back implementation if it's based on vector APIs. Existing call generators like PredictedCallGenerator, used to handle bi-morphic inlining, already make use of multiple call generators to handle hit/miss scenarios for a particular receiver type. The newly added hybrid call generator is lazy and called during incremental inlining optimization. It also relieves the inline expander to handle slow paths, which can easily be implemented library side (Java). > > Vector API jtreg tests pass at AVX level 2, remaining validation in progress. 
> > Performance numbers: > > > System : 13th Gen Intel(R) Core(TM) i3-1315U > > Baseline: > Benchmark (size) Mode Cnt Score Error Units > VectorSliceBenchmark.byteVectorSliceWithConstantIndex1 1024 thrpt 2 9444.444 ops/ms > VectorSliceBenchmark.byteVectorSliceWithConstantIndex2 1024 thrpt 2 10009.319 ops/ms > VectorSliceBenchmark.byteVectorSliceWithVariableIndex 1024 thrpt 2 9081.926 ops/ms > VectorSliceBenchmark.intVectorSliceWithConstantIndex1 1024 thrpt 2 6085.825 ops/ms > VectorSliceBenchmark.intVectorSliceWithConstantIndex2 1024 thrpt 2 6505.378 ops/ms > VectorSliceBenchmark.intVectorSliceWithVariableIndex 1024 thrpt 2 6204.489 ops/ms > VectorSliceBenchmark.longVectorSliceWithConstantIndex1 1024 thrpt 2 1651.334 ops/ms > VectorSliceBenchmark.longVectorSliceWithConstantIndex2 1024 thrpt 2 1642.784 ops/ms > VectorSliceBenchmark.longVectorSliceWithVariableIndex 1024 thrpt 2 1474.808 ops/ms > VectorSliceBenchmark.shortVectorSliceWithConstantIndex1 1024 thrpt 2 10399.394 ops/ms > VectorSliceBenchmark.shortVectorSliceWithConstantIndex2 1024 thrpt 2 10502.894 ops/ms > VectorSliceBenchmark.shortVectorSliceWithVariableIndex 1024 ... This pull request has been closed without being integrated. 
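For readers unfamiliar with the operation being optimized in this withdrawn PR, the following is a scalar C++ model of a two-vector slice: the result takes lanes `origin..N-1` of the first vector, followed by lanes `0..origin-1` of the second. It is only a sketch of the semantics under the assumption `0 <= origin <= N`; `slice_model` is a hypothetical name, and this is neither the Vector API fallback code nor the intrinsic from the patch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar model of a two-vector slice (illustrative, not the JDK implementation).
// Lane i of the result is a[origin + i] while that index is in range,
// and continues into b once the end of a is reached.
template <typename T, std::size_t N>
std::array<T, N> slice_model(std::size_t origin,
                             const std::array<T, N>& a,
                             const std::array<T, N>& b) {
  std::array<T, N> result{};
  for (std::size_t i = 0; i < N; ++i) {
    std::size_t src = origin + i;           // index into the concatenation a ++ b
    result[i] = (src < N) ? a[src] : b[src - N];
  }
  return result;
}
```

When `origin` is a compile-time constant, this "window over two concatenated inputs" shape is what an align-right style instruction with an immediate operand can express, which is why the PR description limits the fast path to constant indices.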
------------- PR: https://git.openjdk.org/jdk/pull/24104 From vpaprotski at openjdk.org Tue Dec 2 15:40:51 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Tue, 2 Dec 2025 15:40:51 GMT Subject: RFR: 8372703: Test compiler/arguments/TestCodeEntryAlignment.java failed: assert(allocates2(pc)) failed: not in CodeBuffer memory [v2] In-Reply-To: References: Message-ID: > Requires a Broadwell machine, but was able to reproduce with an emulator: > > > ~/sde-external-9.58.0-2025-06-16-lin/sde64 -follow-subprocess -bdw -- ./build/linux-x86_64-server-fastdebug/images/jdk/bin/java -XX:-UseMulAddIntrinsic -XX:+UseDilithiumIntrinsics -XX:+UnlockExperimentalVMOptions -XX:CodeCacheSegmentSize=1024 -XX:CodeEntryAlignment=1024 -cp build/linux-x86_64-server-fastdebug/support/test/lib/test-lib.jar test/hotspot/jtreg/compiler/arguments/TestCodeEntryAlignment.java run Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: comment from Manuel ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28588/files - new: https://git.openjdk.org/jdk/pull/28588/files/7870115c..c53924e9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28588&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28588&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28588.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28588/head:pull/28588 PR: https://git.openjdk.org/jdk/pull/28588 From vpaprotski at openjdk.org Tue Dec 2 15:40:53 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Tue, 2 Dec 2025 15:40:53 GMT Subject: RFR: 8372703: Test compiler/arguments/TestCodeEntryAlignment.java failed: assert(allocates2(pc)) failed: not in CodeBuffer memory [v2] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 08:55:40 GMT, Manuel Hässig wrote: >
https://github.com/openjdk/jdk/blob/84ffe87260753973835ea6b88443e28bcaf0122f/test/hotspot/jtreg/ProblemList.txt#L82 > > Meanwhile, I will run testing on our side and report back with the results. Done. Thanks for the tests @mhaessig let me know how it goes ------------- PR Comment: https://git.openjdk.org/jdk/pull/28588#issuecomment-3602651621 From duke at openjdk.org Tue Dec 2 15:41:07 2025 From: duke at openjdk.org (Zihao Lin) Date: Tue, 2 Dec 2025 15:41:07 GMT Subject: RFR: 8370196: C2: Improve (U)MulHiLNode::MulHiValue [v10] In-Reply-To: References: Message-ID: > If nodes both are constant, support constant folding. Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: - Merge branch 'openjdk:master' into JDK-8370196 - fix test failed - fix make unsigned - Merge branch 'master' into JDK-8370196 - Fix - Fix - Apply suggestion from @eme64 Co-authored-by: Emanuel Peter - Add Math to Operations.java - Add tests - Merge branch 'master' into JDK-8370196 - ... 
and 3 more: https://git.openjdk.org/jdk/compare/a62296d8...30fa1f03 ------------- Changes: https://git.openjdk.org/jdk/pull/28097/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28097&range=09 Stats: 373 lines in 8 files changed: 336 ins; 14 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/28097.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28097/head:pull/28097 PR: https://git.openjdk.org/jdk/pull/28097 From qamai at openjdk.org Tue Dec 2 15:46:54 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 2 Dec 2025 15:46:54 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v4] In-Reply-To: <-SdGoKVu9WpxzbLyqrLt7duH-qK_Bbm6ErrWdDfxJUg=.95c14f7f-b940-4dc6-a63d-055419625a36@github.com> References: <-SdGoKVu9WpxzbLyqrLt7duH-qK_Bbm6ErrWdDfxJUg=.95c14f7f-b940-4dc6-a63d-055419625a36@github.com> Message-ID: On Tue, 2 Dec 2025 08:09:30 GMT, Quan Anh Mai wrote: >> Hi, >> >> This fixes the crash in `Load/StoreVectorMaskedNode::Ideal`. The issue here is that the graph is not canonical during idealization, which leads to us processing a dead node. The fix I propose is to bail-out when that happens. >> >> To be more specific, for this issue, we have the graph that looks like: >> >> ConI -> ConvI2L -> CastLL(0..32) -> VectorMaskGen >> >> with `ConI` being 45 and `MaxVectorSize` being 32. In this instance, `CastLL` is processed before `ConvI2L`, and when it is processed, it sees that the type of `ConvI2L` being its bottom type. As a result, it does not know that it is top, and since we are after macro expansion, which is after loop opts, the `CastLL` goes away, leaving us with: >> >> ConI -> ConvI2L -> VectorMaskGen >> >> After `ConvI2L` is processed, we know that the input of `VectorMaskGen` is a constant 45, which is larger than `MaxVectorSize`, leading to the assert failure. >> >> Please take a look and leave your thoughts, thanks a lot. 
> > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > bug number in test, comment Thanks for the approval! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28410#issuecomment-3602683327 From qamai at openjdk.org Tue Dec 2 15:46:56 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 2 Dec 2025 15:46:56 GMT Subject: Integrated: 8371964: C2 compilation asserts with "Unexpected load/store size" In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 08:42:46 GMT, Quan Anh Mai wrote: > Hi, > > This fixes the crash in `Load/StoreVectorMaskedNode::Ideal`. The issue here is that the graph is not canonical during idealization, which leads to us processing a dead node. The fix I propose is to bail-out when that happens. > > To be more specific, for this issue, we have the graph that looks like: > > ConI -> ConvI2L -> CastLL(0..32) -> VectorMaskGen > > with `ConI` being 45 and `MaxVectorSize` being 32. In this instance, `CastLL` is processed before `ConvI2L`, and when it is processed, it sees that the type of `ConvI2L` being its bottom type. As a result, it does not know that it is top, and since we are after macro expansion, which is after loop opts, the `CastLL` goes away, leaving us with: > > ConI -> ConvI2L -> VectorMaskGen > > After `ConvI2L` is processed, we know that the input of `VectorMaskGen` is a constant 45, which is larger than `MaxVectorSize`, leading to the assert failure. > > Please take a look and leave your thoughts, thanks a lot. This pull request has now been integrated. 
Changeset: ca4ae806 Author: Quan Anh Mai URL: https://git.openjdk.org/jdk/commit/ca4ae8063edddda36fafafd06b9b1a88ffbf9d2e Stats: 23 lines in 2 files changed: 19 ins; 0 del; 4 mod 8371964: C2 compilation asserts with "Unexpected load/store size" Reviewed-by: chagedorn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/28410 From mhaessig at openjdk.org Tue Dec 2 15:48:07 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Tue, 2 Dec 2025 15:48:07 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v5] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 13:32:22 GMT, Kerem Kat wrote: >> Do not try to replace `fallthrough_memproj` when it is null, fixes crash. >> >> Test case is simplified from the ticket. Verified that the case crashes without the fix. > > Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Merge branch 'master' into fix-c2-segfault-unlocknode > - address comments > - fix rename > - rename test file > - Merge branch 'master' into fix-c2-segfault-unlocknode > - fix test spacing > - Update src/hotspot/share/opto/macro.cpp > > Co-authored-by: Manuel Hässig > - Update src/hotspot/share/opto/macro.cpp > > Co-authored-by: Manuel Hässig > - copyright format fix? > - 8370502: C2: segfault while adding node to IGVN worklist Thank you for addressing my comments. This looks good to me now. I will also run some testing on my side and report back with the results as soon as they are available.
------------- PR Review: https://git.openjdk.org/jdk/pull/28432#pullrequestreview-3530839782 From mhaessig at openjdk.org Tue Dec 2 15:54:00 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Tue, 2 Dec 2025 15:54:00 GMT Subject: RFR: 8372703: Test compiler/arguments/TestCodeEntryAlignment.java failed: assert(allocates2(pc)) failed: not in CodeBuffer memory [v2] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 15:40:51 GMT, Volodymyr Paprotski wrote: >> Requires a Broadwell machine, but was able to reproduce with an emulator: >> >> >> ~/sde-external-9.58.0-2025-06-16-lin/sde64 -follow-subprocess -bdw -- ./build/linux-x86_64-server-fastdebug/images/jdk/bin/java -XX:-UseMulAddIntrinsic -XX:+UseDilithiumIntrinsics -XX:+UnlockExperimentalVMOptions -XX:CodeCacheSegmentSize=1024 -XX:CodeEntryAlignment=1024 -cp build/linux-x86_64-server-fastdebug/support/test/lib/test-lib.jar test/hotspot/jtreg/compiler/arguments/TestCodeEntryAlignment.java run > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > comment from Manuel Thank you for addressing my comments. Testing passed up to tier6. ------------- Marked as reviewed by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/28588#pullrequestreview-3530874911 From shade at openjdk.org Tue Dec 2 16:04:12 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 2 Dec 2025 16:04:12 GMT Subject: RFR: 8360557: CTW: Inline cold methods to reach more code [v6] In-Reply-To: References: Message-ID: <1o0uydFw-zaax7mVXkqJL15Cto0okbjQtkoqs6ADUyU=.001fda73-be80-41bd-9d2f-9258889117e3@github.com> > We use CTW testing for making sure compilers behave well. But we compile the code that is not executed at all, and since our inlining heuristics often look back at profiles, we end up not actually inlining all too much! This means CTW testing likely misses lots of bugs that normal code is exposed to, especially e.g.
in loop optimizations. > > There is an intrinsic tradeoff with accepting more inlined methods in CTW: the compilation time gets significantly worse. With just accepting the cold methods we have reasonable CTW times, eating the improvements we have committed in mainline recently. And it still finds bugs. See the RFE for sample data. > > After this lands and CTW starts to compile cold methods, one can greatly expand the scope of the CTW testing by overriding the static inlining limits. Doing e.g. `TEST_VM_OPTS="-XX:MaxInlineSize=70 -XX:C1MaxInlineSize=70"` finds even more bugs. Unfortunately, the compilation times suffer so much, they are impractical to run in standard configurations, see data in RFE. We will enable some of that testing in special testing pipelines. > > Pre-empting the question: "Well, why not use -Xcomp then, and make sure it inlines well?" The answer is in RFE as well: Xcomp causes _a lot_ of stray compilations for JDK and CTW infra itself. For small JARs in large corpus this eats precious testing time that we would instead like to spend on deeper inlining in the actual JAR code. This also does not force us to look into how CTW works in Xcomp at all; I expect some surprises there. Feather-touching the inlining heuristic paths to just accept methods without looking at profiles looks better. > > Tobias had an idea to implement the stress randomized inlining that would expand the scope of inlining. This improvement stacks well with it. This improvement provides the base case of inlining most reasonable methods, and then allows stress infra to inline some more on top of that. > > Additional testing: > - [x] GHA > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` > - [x] Linux x86_64 server fastdebug, large CTW corpus (now failing in interesting ways) Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 11 additional commits since the last revision: - Merge branch 'master' into JDK-8360557-ctw-inlining - Enable more testing - Merge branch 'master' into JDK-8360557-ctw-inlining - Merge branch 'master' into JDK-8360557-ctw-inlining - Merge branch 'master' into JDK-8360557-ctw-inlining - Merge branch 'master' into JDK-8360557-ctw-inlining - Update src/hotspot/share/compiler/compiler_globals.hpp Co-authored-by: Tobias Hartmann - Revert separate patch - Final - Proper option name and bump the limits - ... and 1 more: https://git.openjdk.org/jdk/compare/511a8fe5...2d02b713 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26068/files - new: https://git.openjdk.org/jdk/pull/26068/files/97975dd0..2d02b713 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26068&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26068&range=04-05 Stats: 1501 lines in 42 files changed: 888 ins; 231 del; 382 mod Patch: https://git.openjdk.org/jdk/pull/26068.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26068/head:pull/26068 PR: https://git.openjdk.org/jdk/pull/26068 From dfenacci at openjdk.org Tue Dec 2 16:26:56 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 2 Dec 2025 16:26:56 GMT Subject: RFR: 8372703: Test compiler/arguments/TestCodeEntryAlignment.java failed: assert(allocates2(pc)) failed: not in CodeBuffer memory [v2] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 15:40:51 GMT, Volodymyr Paprotski wrote: >> Requires a Broadwell machine, but was able to reproduce with an emulator: >> >> >> ~/sde-external-9.58.0-2025-06-16-lin/sde64 -follow-subprocess -bdw -- ./build/linux-x86_64-server-fastdebug/images/jdk/bin/java -XX:-UseMulAddIntrinsic -XX:+UseDilithiumIntrinsics -XX:+UnlockExperimentalVMOptions -XX:CodeCacheSegmentSize=1024 -XX:CodeEntryAlignment=1024 -cp build/linux-x86_64-server-fastdebug/support/test/lib/test-lib.jar test/hotspot/jtreg/compiler/arguments/TestCodeEntryAlignment.java run >
> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > comment from Manuel Marked as reviewed by dfenacci (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28588#pullrequestreview-3531053225 From epeter at openjdk.org Tue Dec 2 16:52:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 16:52:30 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: Message-ID: <9zey9SqquL1zLlFLuyKV_18OiZs2UQSokhREx9ln0l0=.edc15ede-e798-4d88-b61a-d2ed086d99da@github.com> On Tue, 2 Dec 2025 09:20:41 GMT, Roland Westrelin wrote: >> This is a variant of 8332827. In 8332827, an array access becomes >> dependent on a range check `CastII` for another array access. When, >> after loop opts are over, that RC `CastII` was removed, the array >> access could float and an out of bound access happened. With the fix >> for 8332827, RC `CastII`s are no longer removed. >> >> With this one what happens is that some transformations applied after >> loop opts are over widen the type of the RC `CastII`. As a result, the >> type of the RC `CastII` is no longer narrower than that of its input, >> the `CastII` is removed and the dependency is lost. >> >> There are 2 transformations that cause this to happen: >> >> - after loop opts are over, the types of the `CastII` nodes are widened >> so nodes that have the same inputs but a slightly different type can >> common. >> >> - When pushing a `CastII` through an `Add`, if the types of both inputs >> of the `Add` are non-constant, then we end up widening the type >> (the resulting `Add` has a type that's wider than that of the >> initial `CastII`). >> >> There are already 3 types of `Cast` nodes depending on the >> optimizations that are allowed. Either the `Cast` is floating >> (`depends_only_test()` returns `true`) or pinned.
Either the `Cast` >> can be removed if it no longer narrows the type of its input or >> not. We already have variants of the `CastII`: >> >> - if the Cast can float and can be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can't be removed when it doesn't narrow >> the type of its input. >> >> What we need here, I think, is the 4th combination: >> >> - if the Cast can float and can't be removed when it doesn't narrow >> the type of its input. >> >> Anyway, things are becoming confusing with all these different >> variants named in ways that don't always help figure out what >> constraints each of them operates under. So I refactored this and that's >> the biggest part of this change. The fix consists in marking `Cast` >> nodes when their type is widened in a way that prevents them from being >> optimized out. >> >> Tobias ran performance testing with a slightly different version of >> this change and there was no regression. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: > > - Merge branch 'master' into JDK-8354282 > - whitespace > - review > - review > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java > > Co-authored-by: Christian Hagedorn > - review > - review > - ... and 7 more: https://git.openjdk.org/jdk/compare/ef5e744a...93b8b0c5 @rwestrel Nice work! We not only fixed the bug but also made the concepts much clearer. This makes me very happy. ------------- Marked as reviewed by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24575#pullrequestreview-3531172652 From epeter at openjdk.org Tue Dec 2 16:52:32 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 16:52:32 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: <4qc5jJ1KA09yko5rWioBGstpuuRNxOiNWXRdRdh9h_E=.17c8ace8-c672-4451-bd15-247d66d92cef@github.com> References: <0An6wz0QZZxtVg-lP4IyqWTekcYkSmvosrVWkI7cH70=.86c07374-2127-4892-a369-ceefa82dd0b7@github.com> <_rBmTvf064PXyVEAX4zqk43DNgVr0gQDPzPcdQ4XI1A=.660e7e89-0a49-47e0-9639-972cbfbac5f0@github.com> <4qc5jJ1KA09yko5rWioBGstpuuRNxOiNWXRdRdh9h_E=.17c8ace8-c672-4451-bd15-247d66d92cef@github.com> Message-ID: On Tue, 2 Dec 2025 15:29:42 GMT, Emanuel Peter wrote: >> At least we could say that it is allowed to hoist the RangeCheck, and the CastII could float up to where the RC is hoisted. > > Suggestion: > > // Use case example: Range Check CastII > // Floating: The Cast is only dependent on the single range check. If the range check was ever to be hoisted > // it would be safe to let the Cast float to where the range check is hoisted up to. > // Narrowing: The Cast narrows the type to a positive index. If the input to the Cast is narrower, we can safely > // remove the cast because the array access will be safe. Ok, I now read the PR from the top, and not just recent changes. If one were to start reading from the top, it would be clear without my suggestions here. But I think it could still be good to apply something about letting the Cast float to where we would hoist the RC.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2582034834 From liach at openjdk.org Tue Dec 2 17:27:43 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 2 Dec 2025 17:27:43 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 09:28:03 GMT, Per Minborg wrote: >> Chen Liang has updated the pull request incrementally with one additional commit since the last revision: >> >> Tweak VH usage in some classes > > src/java.base/share/classes/java/lang/invoke/VarHandle.java line 2036: > >> 2034: var constant = MethodHandleImpl.isCompileConstant(vh); >> 2035: var cache = adaptedMh; >> 2036: if (constant == MethodHandleImpl.CONSTANT_YES && cache != null) { > > Rookie question: Are there multi-thread considerations here? How about visibility across threads? MethodHandle is immutable and can be safely published. So this is ok. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2582166754 From epeter at openjdk.org Tue Dec 2 17:38:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 17:38:59 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v3] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 09:46:05 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/phaseX.cpp line 2085: >> >>> 2083: } >>> 2084: return false; >>> 2085: } >> >> Why not call it `verify_node_invariants_for`? >> >> You should also assert immediately. @benoitmaillard is about to make that change for everything: https://github.com/openjdk/jdk/pull/28295 > > That one is not integrated. Shouldn't I do that change only if it/when integrates?
Right, keep it, just be informed, it may get integrated soon :) Renaming would still be good ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25386#discussion_r2582199486 From qamai at openjdk.org Tue Dec 2 17:48:43 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 2 Dec 2025 17:48:43 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 09:20:41 GMT, Roland Westrelin wrote: >> This is a variant of 8332827. In 8332827, an array access becomes >> dependent on a range check `CastII` for another array access. When, >> after loop opts are over, that RC `CastII` was removed, the array >> access could float and an out of bound access happened. With the fix >> for 8332827, RC `CastII`s are no longer removed. >> >> With this one what happens is that some transformations applied after >> loop opts are over widen the type of the RC `CastII`. As a result, the >> type of the RC `CastII` is no longer narrower than that of its input, >> the `CastII` is removed and the dependency is lost. >> >> There are 2 transformations that cause this to happen: >> >> - after loop opts are over, the types of the `CastII` nodes are widened >> so nodes that have the same inputs but a slightly different type can >> common. >> >> - When pushing a `CastII` through an `Add`, if the types of both inputs >> of the `Add` are non-constant, then we end up widening the type >> (the resulting `Add` has a type that's wider than that of the >> initial `CastII`). >> >> There are already 3 types of `Cast` nodes depending on the >> optimizations that are allowed. Either the `Cast` is floating >> (`depends_only_test()` returns `true`) or pinned. Either the `Cast` >> can be removed if it no longer narrows the type of its input or >> not.
We already have variants of the `CastII`: >> >> - if the Cast can float and can be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can't be removed when it doesn't narrow >> the type of its input. >> >> What we need here, I think, is the 4th combination: >> >> - if the Cast can float and can't be removed when it doesn't narrow >> the type of its input. >> >> Anyway, things are becoming confusing with all these different >> variants named in ways that don't always help figure out what >> constraints each of them operates under. So I refactored this and that's >> the biggest part of this change. The fix consists in marking `Cast` >> nodes when their type is widened in a way that prevents them from being >> optimized out. >> >> Tobias ran performance testing with a slightly different version of >> this change and there was no regression. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: > > - Merge branch 'master' into JDK-8354282 > - whitespace > - review > - review > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java > > Co-authored-by: Christian Hagedorn > - review > - review > - ... and 7 more: https://git.openjdk.org/jdk/compare/ef5e744a...93b8b0c5 src/hotspot/share/opto/castnode.hpp line 105: > 103: // All the possible combinations of floating/narrowing with example use cases: > 104: > 105: // Use case example: Range Check CastII I believe this is incorrect, a range check should be floating non-narrowing. It is only narrowing if the length of the array is a constant.
It is because this cast encodes the dependency on the condition `index u< length`. This condition cannot be expressed in terms of `Type` unless `length` is a constant. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2582188782 From qamai at openjdk.org Tue Dec 2 17:48:44 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 2 Dec 2025 17:48:44 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: <0An6wz0QZZxtVg-lP4IyqWTekcYkSmvosrVWkI7cH70=.86c07374-2127-4892-a369-ceefa82dd0b7@github.com> <_rBmTvf064PXyVEAX4zqk43DNgVr0gQDPzPcdQ4XI1A=.660e7e89-0a49-47e0-9639-972cbfbac5f0@github.com> <4qc5jJ1KA09yko5rWioBGstpuuRNxOiNWXRdRdh9h_E=.17c8ace8-c672-4451-bd15-247d66d92cef@github.com> Message-ID: On Tue, 2 Dec 2025 16:48:55 GMT, Emanuel Peter wrote: >> Suggestion: >> >> // Use case example: Range Check CastII >> // Floating: The Cast is only dependent on the single range check. If the range check was ever to be hoisted >> // it would be safe to let the Cast float to where the range check is hoisted up to. >> // Narrowing: The Cast narrows the type to a positive index. If the input to the Cast is narrower, we can safely >> // remove the cast because the array access will be safe. > > Ok, I now read the PR from the top, and not just recent changes. If one were to start reading from the top, it would be clear without my suggestions here. But I think it could still be good to apply something about letting the Cast float to where we would hoist the RC. Naming is hard, but it is worth pointing out in the comment that floating here refers to `depends_only_on_test`. In other words, a cast is considered floating if it is legal to change the control input of a cast from an `IfTrue` or `IfFalse` to an `IfTrue` or `IfFalse` that dominates the current control input, and the corresponding conditions of the `If`s are the same.
In contrast, we cannot do that for a pinned cast, and if the control is folded away, the control input of the pinned cast is changed to the control predecessor of the folded node. It is also worth noting that we have `Node::pinned` which means the node is pinned AT the control input while pinned here means that it is pinned UNDER the control input. Very confusing! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2582215477 From dlong at openjdk.org Tue Dec 2 18:22:33 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 2 Dec 2025 18:22:33 GMT Subject: Integrated: 8370766: JVM crashes when running compiler/exceptions/TestAccessErrorInCatch.java fails with -XX:+VerifyStack In-Reply-To: <5JAu6StX5-r2itXPGiDBgGHjGo0S2mOfGxOpPoMSkIQ=.000500da-a003-403b-9d3b-6df3a53c2b22@github.com> References: <5JAu6StX5-r2itXPGiDBgGHjGo0S2mOfGxOpPoMSkIQ=.000500da-a003-403b-9d3b-6df3a53c2b22@github.com> Message-ID: On Tue, 25 Nov 2025 03:25:05 GMT, Dean Long wrote: > The problem is C2 is throwing an exception and then deoptimizing, and the -XX:+VerifyStack logic expects the stack to be empty, match the "before" state if the reexecute flag is set, or match the "after" state. C2 is using the "before" state, so for correctness it also needs to set the reexecute flag. > > I played around with other approaches, like: > 1. setting the stack to empty > 2. adding all the bytecodes that can throw to the list in AbstractInterpreter::bytecode_should_reexecute() > 3. always setting the reexecute flag in add_safepoint_edges() if must_throw is set > but in the end I decided to go with the minimal localized low-risk change. This pull request has now been integrated. 
Changeset: 5627ff2d Author: Dean Long URL: https://git.openjdk.org/jdk/commit/5627ff2d9165ee1f7354c1ff1626f4949ef7fa3f Stats: 15 lines in 2 files changed: 8 ins; 1 del; 6 mod 8370766: JVM crashes when running compiler/exceptions/TestAccessErrorInCatch.java fails with -XX:+VerifyStack Co-authored-by: Manuel Hässig Reviewed-by: mhaessig, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/28486 From dlong at openjdk.org Tue Dec 2 18:41:44 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 2 Dec 2025 18:41:44 GMT Subject: RFR: 8372862: AArch64: Fix GetAndSet-acquire costs after JDK-8372188 In-Reply-To: References: Message-ID: <1CdpBw0mdYmQtGrr73r8FYkWk3BDdcRs8lTbixw3Sd0=.c0aee48c-91a0-422f-8c00-46d5f9b705d6@github.com> On Tue, 2 Dec 2025 10:44:24 GMT, Aleksey Shipilev wrote: > I just noticed (while looking at [JDK-8372800](https://bugs.openjdk.org/browse/JDK-8372800)) that I made a little error in [JDK-8372188](https://bugs.openjdk.org/browse/JDK-8372188) refactor, which made GetAndSet-acquire instruction cost twice as high. The usual cost for acquire versions is twice as low, so they are likely to be selected instead of non-acquire versions. > > This bug happened as I "simplified" stencils at some point by dropping some arguments and renumbering the remaining ones. This is one place where I apparently forgot to renumber one usage. See other checks for `ifelse($3,Acq,...` in that stencil, all of them are `$3` (correct), not `$4` (incorrect). Seen no real bugs because of this mishap, but it would be good to fix it in case we see issues later. I also looked at stencils again, and I think there are no other argument-index problems like this anywhere else. > > The real change is in `aarch64_atomic_ad.m4`, `.ad` is re-generated from that stencil. > > Additional testing: > - [x] Linux AArch64 server fastdebug, `all` > - [ ] Linux AArch64 server fastdebug, quick jcstress run Marked as reviewed by dlong (Reviewer). I kicked off Oracle testing.
I'm tempted to say this is trivial, reverting the costs to what they were before, but a 2nd review wouldn't hurt. I think the reason it didn't cause a regression is because in case of ties, the later acquire rule is still the first candidate. ------------- PR Review: https://git.openjdk.org/jdk/pull/28598#pullrequestreview-3531635921 PR Comment: https://git.openjdk.org/jdk/pull/28598#issuecomment-3603461868 From valeriep at openjdk.org Tue Dec 2 18:56:52 2025 From: valeriep at openjdk.org (Valerie Peng) Date: Tue, 2 Dec 2025 18:56:52 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v7] In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 10:27:36 GMT, Martin Doerr wrote: >> This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. We can save the `genInvRoundKeys` computation when we only do encryption. >> >> The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64. > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: > > - Minor simplification. > - Merge remote-tracking branch 'origin' into 8371820_AES_Crypt > - Fix missing whitespace. > - Address review comments. > - Merge remote-tracking branch 'origin' into 8371820_AES_Crypt > - Remove K from AES_Crypt > - More minor cleanup. > - Improve comment and minor cleanup. > - 8371820: Further AES performance improvements for key schedule generation Marked as reviewed by valeriep (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/28299#pullrequestreview-3531689005 From mdoerr at openjdk.org Tue Dec 2 19:36:58 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 2 Dec 2025 19:36:58 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v7] In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 10:27:36 GMT, Martin Doerr wrote: >> This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. We can save the `genInvRoundKeys` computation when we only do encryption. >> >> The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64. > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: > > - Minor simplification. > - Merge remote-tracking branch 'origin' into 8371820_AES_Crypt > - Fix missing whitespace. > - Address review comments. > - Merge remote-tracking branch 'origin' into 8371820_AES_Crypt > - Remove K from AES_Crypt > - More minor cleanup. > - Improve comment and minor cleanup. > - 8371820: Further AES performance improvements for key schedule generation Thanks for all reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28299#issuecomment-3603660124 From mdoerr at openjdk.org Tue Dec 2 19:40:17 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 2 Dec 2025 19:40:17 GMT Subject: Integrated: 8371820: Further AES performance improvements for key schedule generation In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 16:48:28 GMT, Martin Doerr wrote: > This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. We can save the `genInvRoundKeys` computation when we only do encryption. 
> > The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64. This pull request has now been integrated. Changeset: 618732ff Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/618732ffc04ef393c9b8a3265c12ba66f31784d9 Stats: 61 lines in 7 files changed: 13 ins; 8 del; 40 mod 8371820: Further AES performance improvements for key schedule generation Reviewed-by: rrich, valeriep ------------- PR: https://git.openjdk.org/jdk/pull/28299 From vlivanov at openjdk.org Tue Dec 2 20:21:29 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 2 Dec 2025 20:21:29 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> <3OE37qXGHhLAhnRQM188hhygrLYBtI3FLBMK0tGVH30=.5d1b4406-3bb3-4788-8059-e78260b79ec1@github.com> <7WF8DlorrU_B2__G2wr43w1PZwJh8mEhD5dY10YDIOo=.ec416c38-1aff-4dd6-8792-d6a0e01f91ce@github.com> Message-ID: <_Z6KpxCYH2n3sHuT6-kRP4cSTAN3-s5UA0rbfrJSIgA=.e9d4089c-8329-406b-9a0a-167a24311c13@github.com> On Tue, 2 Dec 2025 02:51:57 GMT, Chen Liang wrote: >> So, it seems like what you are trying to achieve is a 1-1 mapping from `AccessDescriptor` to `vh` through `adaptedMh`. So, once `cache != null` you can trust that it corresponds to the `vh` instance passed as a constant. But cache pollution can easily break the invariant, so you try to eliminate the pollution by avoiding cache updates when vh is not constant. Do I get it right? > > No. The avoidance of cache update simply trims down the generated code by throwing away the meaningless cache update. > > The access to cache is already safeguarded by `constant == MethodHandleImpl.CONSTANT_YES`. I should have moved `var cache = adaptedMh;` into the if block of `constant == CONSTANT_YES`. 
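For readers following the 8160821 thread, the "argument conversion" in the PR title can be illustrated with a small sketch. It only shows the user-visible difference between an exact and an inexact `VarHandle` call site on a constant `VarHandle`; it does not reproduce the `adaptedMh` cache or `isCompileConstant` logic under review. The class and method names (`InexactVarHandleDemo`, `Holder`, `setInexact`, `exactModeRejects`) are hypothetical and chosen for illustration only.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.lang.invoke.WrongMethodTypeException;

public class InexactVarHandleDemo {
    // Hypothetical holder class with a long field.
    static class Holder {
        long value;
    }

    // A static final VarHandle, i.e. a handle the JIT can treat as a constant.
    static final VarHandle VALUE;
    static {
        try {
            VALUE = MethodHandles.lookup().findVarHandle(Holder.class, "value", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Exact call site: the symbolic type (Holder, long)void matches the access mode type.
    static void setExact(Holder h, long v) {
        VALUE.set(h, v);
    }

    // Inexact call site: the symbolic type is (Holder, int)void, so the access
    // has to be adapted (int widened to long) before the field is written.
    static void setInexact(Holder h, int v) {
        VALUE.set(h, v);
    }

    // With invoke-exact behavior the same inexact call is rejected instead of adapted.
    static boolean exactModeRejects(Holder h, int v) {
        try {
            VALUE.withInvokeExactBehavior().set(h, v);
            return false;
        } catch (WrongMethodTypeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Holder h = new Holder();
        setExact(h, 1L);
        setInexact(h, 2);
        System.out.println((long) VALUE.get(h));
        System.out.println(exactModeRejects(h, 3));
    }
}
```

The `setInexact` shape is the kind of site the PR description refers to as "an inexact VarHandle invocation"; whether a given JDK build constant-folds it depends on the change being reviewed.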
I still find it confusing, especially the tri-state logic part. For background, `isCompileConstant` was introduced as part of LF sharing effort to get rid of Java-level profiling in optimized code. The pattern it was designed for was: if (isCompileConstant(...)) { return ...; } else { ... // do some extra work (either in interpreter, C1, or not-fully-optimized version in C2) } In this patch, you don't follow that pattern and add a new state (`CONSTANT_PENDING`) to distinguish interpreter/C1 from C2. What's the motivation? Why do you want to avoid cache updates coming from C2-generated code? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2582647097 From vlivanov at openjdk.org Tue Dec 2 20:21:33 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 2 Dec 2025 20:21:33 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 23:41:04 GMT, Chen Liang wrote: >> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Tweak VH usage in some classes src/java.base/share/classes/java/lang/invoke/VarHandle.java line 2042: > 2040: // This is still a hot path if vh is not constant - in this case, > 2041: // asType is the bottleneck for constant folding, unfortunately > 2042: var result = vh.getMethodHandle(mode).asType(symbolicMethodTypeInvoker); `mode` and `symbolicMethodTypeInvoker` are part of `AccessDescriptor` while `vh` comes as an argument. What guarantees that a cached adapter is compatible with `vh` observed during subsequent calls?
It means that the shape of `vh` stays exactly the same. Is it correct? Would be good to have it validated with asserts. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2582661917 From shade at openjdk.org Tue Dec 2 20:33:19 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 2 Dec 2025 20:33:19 GMT Subject: RFR: 8372862: AArch64: Fix GetAndSet-acquire costs after JDK-8372188 In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 10:44:24 GMT, Aleksey Shipilev wrote: > I just noticed (while looking at [JDK-8372800](https://bugs.openjdk.org/browse/JDK-8372800)) that I made a little error in [JDK-8372188](https://bugs.openjdk.org/browse/JDK-8372188) refactor, which made GetAndSet-acquire instruction cost twice as high. The usual cost for acquire versions is twice as low, so they are likely to be selected instead of non-acquire versions. > > This bug happened as I "simplified" stencils at some point by dropping some arguments and renumbering the remaining ones. This is one place where I apparently forgot to renumber one usage. See other checks for `ifelse($3,Acq,...` in that stencil, all of them are `$3` (correct), not `$4` (incorrect). Seen no real bugs because of this mishap, but it would be good to fix it in case we see issues later. I also looked at stencils again, and I think there are no other argument-index problems like this anywhere else. > > The real change is in `aarch64_atomic_ad.m4`, `.ad` is re-generated from that stencil. > > Additional testing: > - [x] Linux AArch64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, quick jcstress run Thanks Dean! jcstress run comes back clean as well.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28598#issuecomment-3603848986 From vlivanov at openjdk.org Tue Dec 2 20:39:55 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 2 Dec 2025 20:39:55 GMT Subject: RFR: 8372845: Fold identity hash code if object is constant In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 02:49:52 GMT, Chen Liang wrote: > would it be safer for us to move the constant detection after generate_virtual_guard in the is_virtual if block? Good catch. I missed that the intrinsic is shared between `System::identityHashCode()` and `Object::hashCode`. I'm not sure it makes sense to support `Object::hashCode` unless C2 can eliminate `generate_virtual_guard` for a constant receiver. I'd just limit constant folding to `!is_virtual` case for now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28589#issuecomment-3603869560 From vlivanov at openjdk.org Tue Dec 2 20:45:38 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 2 Dec 2025 20:45:38 GMT Subject: RFR: 8372845: Fold identity hash code if object is constant In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 06:42:32 GMT, Tobias Hartmann wrote: > Just an observation: This patch will only allow folding during parsing. I would expect that often, opportunities only arise after other optimizations already took place. I deliberately omitted post-parse optimization opportunities for now. It would require a gradual lowering of the representation from a high-level macro node to low-level poking at object header. Moreover, final representation has complex control, so either the macro node should be a CFG node or a way to determine a location in CFG for a data-only macro node and expanding it there needs to be supported. (There are other use cases for such functionality, like lowering data nodes into pure calls, but no readily available implementation is there yet.) IMO something to work on in a follow-up enhancement. 
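As a reading aid for the 8372845 discussion, the sketch below shows the two call shapes being distinguished: the non-virtual `System.identityHashCode()` call on a constant object, and the virtual `hashCode()` call that reaches `Object::hashCode` when the class does not override it. It only demonstrates the Java-level behavior (a stable identity hash for a given object); whether C2 actually folds either call to a constant depends on the change under review. The class name `ConstantIdentityHashDemo` and its members are hypothetical.

```java
public class ConstantIdentityHashDemo {
    // A static final reference: the JIT may treat the referenced object as a constant.
    static final Object KEY = new Object();

    // Non-virtual shape: a direct call to System.identityHashCode on the constant.
    static int hashOfConstant() {
        return System.identityHashCode(KEY);
    }

    // Virtual shape: dispatches to Object::hashCode because Object does not override it.
    static int virtualHashOfConstant() {
        return KEY.hashCode();
    }

    public static void main(String[] args) {
        int first = hashOfConstant();
        // Repeated calls give the JIT a chance to compile the methods;
        // the identity hash of a given object never changes once assigned.
        for (int i = 0; i < 100_000; i++) {
            if (hashOfConstant() != first || virtualHashOfConstant() != first) {
                throw new AssertionError("identity hash changed");
            }
        }
        System.out.println("stable identity hash: " + (first == hashOfConstant()));
    }
}
```

Because the hash is fixed once it has been assigned to the object, a call site like `hashOfConstant()` is a natural candidate for the parse-time folding discussed in this thread.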
------------- PR Comment: https://git.openjdk.org/jdk/pull/28589#issuecomment-3603886860 From vlivanov at openjdk.org Tue Dec 2 20:46:36 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 2 Dec 2025 20:46:36 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v2] In-Reply-To: <5eysoU9a44W7_cWds1pgbO9cpxQpBbtd54cUglfEW8c=.d0307e92-d9b3-405c-b488-872243af83b1@github.com> References: <5eysoU9a44W7_cWds1pgbO9cpxQpBbtd54cUglfEW8c=.d0307e92-d9b3-405c-b488-872243af83b1@github.com> Message-ID: On Mon, 1 Dec 2025 19:30:22 GMT, Vladimir Ivanov wrote: >> Even though `instanceof` check (and reflective `Class.isInstance` call) narrows operand's type, sharpened type information is not explicitly materialized in the IR. >> >> There's a `SubTypeCheck` node present, but it is not a substitute for a `CheckCastPP` node with a proper type. >> >> The difference can be illustrated with the following simple cases: >> >> class A { void m() {} } >> class B extends A { void m() {} } >> >> void testInstanceOf(A obj) { >> if (obj instanceof B) { >> obj.m(); >> } >> } >> >> InstanceOf::testInstanceOf (12 bytes) >> @ 8 InstanceOf$A::m (0 bytes) failed to inline: virtual call >> >> vs >> >> void testInstanceOfCast(A obj) { >> if (obj instanceof B) { >> B b = (B)obj; >> b.m(); >> } >> } >> >> InstanceOf::testInstanceOfCast (17 bytes) >> @ 13 InstanceOf$B::m (1 bytes) inline (hot) >> >> >> Proposed fix annotates operands of subtype checks with proper type information which reflects the effects of subtype check. Not-yet-canonicalized IR shape poses some challenges, but I decided to match it early so information is available right away, rather than waiting for IGVN pass and delay inlining to post-parse phase. >> >> FTR it is not a complete fix. It works for trivial cases, but for more complex conditions the IR shape becomes too complex during parsing (as illustrated by some test cases). 
I experimented with annotating subtype checks after initial parsing pass is over, but the crucial simplification step happens as part of split-if transformation which happens when no more inlining is possible. So, the only possible benefit (without forcing split-if optimization earlier) is virtual-to-direct call strength reduction. I plan to explore it separately. >> >> Testing: hs-tier1 - hs-tier5 > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Test fix Any reviews, please? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28517#issuecomment-3603891338 From vlivanov at openjdk.org Tue Dec 2 20:56:18 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 2 Dec 2025 20:56:18 GMT Subject: RFR: 8372845: Fold identity hash code if object is constant In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 20:36:50 GMT, Vladimir Ivanov wrote: > I'm not sure it makes sense to support Object::hashCode unless C2 can eliminate generate_virtual_guard for a constant receiver. I'd just limit constant folding to !is_virtual case for now. Or, alternatively, inspect constant object's v-table during compilation and ensure that corresponding slot points at `Object::hashCode`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28589#issuecomment-3603919570 From kvn at openjdk.org Tue Dec 2 21:00:58 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 2 Dec 2025 21:00:58 GMT Subject: RFR: 8372845: Fold identity hash code if object is constant In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 23:01:08 GMT, Chen Liang wrote: > Folding identity hash as constant if the incoming argument is constant would be useful for quick map lookups, such as for the [Classifier proposal](https://openjdk.org/jeps/8357674). Currently, identity hash is not constant because it loads the object header/mark word. We can add an explicit bypass to load an existing hash eagerly instead. Good. 
Yes, we can work on constant folding in IGVN later. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28589#pullrequestreview-3532076661 From kvn at openjdk.org Tue Dec 2 21:01:00 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 2 Dec 2025 21:01:00 GMT Subject: RFR: 8372845: Fold identity hash code if object is constant In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 20:53:56 GMT, Vladimir Ivanov wrote: >>> would it be safer for us to move the constant detection after generate_virtual_guard in the is_virtual if block? >> >> Good catch. I missed that the intrinsic is shared between `System::identityHashCode()` and `Object::hashCode`. >> >> I'm not sure it makes sense to support `Object::hashCode` unless C2 can eliminate `generate_virtual_guard` for a constant receiver. I'd just limit constant folding to `!is_virtual` case for now. > >> I'm not sure it makes sense to support Object::hashCode unless C2 can eliminate generate_virtual_guard for a constant receiver. I'd just limit constant folding to !is_virtual case for now. > > Or, alternatively, inspect constant object's v-table during compilation and ensure that corresponding slot points at `Object::hashCode`. @iwanowww please fix title to match JBS. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28589#issuecomment-3603933760 From liach at openjdk.org Tue Dec 2 22:06:49 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 2 Dec 2025 22:06:49 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 20:18:19 GMT, Vladimir Ivanov wrote: >> Chen Liang has updated the pull request incrementally with one additional commit since the last revision: >> >> Tweak VH usage in some classes > > src/java.base/share/classes/java/lang/invoke/VarHandle.java line 2042: > >> 2040: // This is still a hot path if vh is not constant - in this case, >> 2041: // asType is the bottleneck for constant folding, unfortunately >> 2042: var result = vh.getMethodHandle(mode).asType(symbolicMethodTypeInvoker); > > `mode` and `symbolicMethodTypeInvoker` are part of `AccessDescriptor` while `vh` comes as an argument. What guarantees that a cached adapter is compatible with `vh` observed during subsequent calls? It means that `vh` shape stays exactly the same shape. Is it correct? Would be good to have it validated with asserts. I am assuming that the previous `vh` observed is compatible with future ones if compiler can fold the `vh` into a constant. If it is not, we can drop the updates to the cache field in the C2 compiled slow code. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2582922472 From liach at openjdk.org Tue Dec 2 22:10:31 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 2 Dec 2025 22:10:31 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: <_Z6KpxCYH2n3sHuT6-kRP4cSTAN3-s5UA0rbfrJSIgA=.e9d4089c-8329-406b-9a0a-167a24311c13@github.com> References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> <3OE37qXGHhLAhnRQM188hhygrLYBtI3FLBMK0tGVH30=.5d1b4406-3bb3-4788-8059-e78260b79ec1@github.com> <7WF8DlorrU_B2__G2wr43w1PZwJh8mEhD5dY10YDIOo=.ec416c38-1aff-4dd6-8792-d6a0e01f91ce@github.com> <_Z6KpxCYH2n3sHuT6-kRP4cSTAN3-s5UA0rbfrJSIgA=.e9d4089c-8329-406b-9a0a-167a24311c13@github.com> Message-ID: On Tue, 2 Dec 2025 20:12:12 GMT, Vladimir Ivanov wrote: >> No. The avoidance of cache update simply trims down the generated code by throwing away the meaningless cache update. >> >> The access to cache is already safeguarded by `constant == MethodHandleImpl.CONSTANT_YES`. I should have moved `var cache = adaptedMh;` into the if block of `constant == CONSTANT_YES`. > > I still find it confusing, especially the tri-state logic part. > > For background, `isCompileConstant` was introduced as part of LF sharing effort to get rid of Java-level profiling in optimized code. The pattern it was designed for was: > > if (isCompileConstant(...)) { > return ...; > } else { > ... // do some extra work (either in interpreter, C1, or not-fully-optimized version in C2) > } > > > In this patch, you don't follow that pattern and add a new state (`CONSTANT_PENDING`) to distinguish interpreter/C1 from C2. What's the motivation? Why do you want to avoid cache updates coming from C2-generated code? I am assuming that if C2 determines this `vh` is not a constant, we can drop it.
Is that a right way to move along, or could C2 transition from "not a constant" to "is a constant" during the phases? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2582931449 From vlivanov at openjdk.org Tue Dec 2 22:28:01 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 2 Dec 2025 22:28:01 GMT Subject: RFR: 8372845: Fold identity hash code if object is constant In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 20:58:23 GMT, Vladimir Kozlov wrote: >>> I'm not sure it makes sense to support Object::hashCode unless C2 can eliminate generate_virtual_guard for a constant receiver. I'd just limit constant folding to !is_virtual case for now. >> >> Or, alternatively, inspect constant object's v-table during compilation and ensure that corresponding slot points at `Object::hashCode`. > > @iwanowww please fix title to match JBS. @vnkozlov I can't since I'm not the author of the PR :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28589#issuecomment-3604211003 From liach at openjdk.org Tue Dec 2 23:18:26 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 2 Dec 2025 23:18:26 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v2] In-Reply-To: References: Message-ID: > Folding identity hash as constant if the incoming argument is constant would be useful for quick map lookups, such as for the [Classifier proposal](https://openjdk.org/jeps/8357674). Currently, identity hash is not constant because it loads the object header/mark word. We can add an explicit bypass to load an existing hash eagerly instead. Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - Merge branch 'master' of https://github.com/openjdk/jdk into fix/identity-hash-const - Cleanup - identity hash support in C2 - Move around - Constant fold identity hash ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28589/files - new: https://git.openjdk.org/jdk/pull/28589/files/4a82f79d..69225241 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28589&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28589&range=00-01 Stats: 4382 lines in 99 files changed: 2947 ins; 684 del; 751 mod Patch: https://git.openjdk.org/jdk/pull/28589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28589/head:pull/28589 PR: https://git.openjdk.org/jdk/pull/28589 From liach at openjdk.org Tue Dec 2 23:18:28 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 2 Dec 2025 23:18:28 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 23:01:08 GMT, Chen Liang wrote: > Folding identity hash as constant if the incoming argument is constant would be useful for quick map lookups, such as for the [Classifier proposal](https://openjdk.org/jeps/8357674). Currently, identity hash is not constant because it loads the object header/mark word. We can add an explicit bypass to load an existing hash eagerly instead. I tried to come up with an example where the buggy code from Vladimir would inline to identityHashCode when the right call would be virtual - couldn't construct such a case unfortunately :( I think we can deal with IGVN later, as this involves creating new macro node and other infrastructure support. 
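
For readers following the `is_virtual` discussion, the Java-level difference that `generate_virtual_guard` protects can be sketched as follows. This is not a reproducer for a miscompilation (none was found above), only an illustration of why folding `Object::hashCode` on a constant receiver is valid only when the receiver does not override `hashCode`. The class and method names (`IdentityHashSketch`, `Key`, `identityOf`, `virtualHashOf`) are hypothetical and do not come from the PR.

```java
public class IdentityHashSketch {
    // Hypothetical class that overrides hashCode(); not part of the PR.
    static final class Key {
        @Override
        public int hashCode() {
            return 42;
        }
    }

    // Constants of the kind a JIT may treat as compile-time constant receivers.
    static final Object PLAIN = new Object();
    static final Key OVERRIDING = new Key();

    // Static entry point: always the identity hash, regardless of overrides.
    static int identityOf(Object o) {
        return System.identityHashCode(o);
    }

    // Virtual entry point: dispatches to the receiver's own hashCode().
    static int virtualHashOf(Object o) {
        return o.hashCode();
    }

    public static void main(String[] args) {
        // No override: both entry points agree.
        System.out.println(virtualHashOf(PLAIN) == identityOf(PLAIN)); // true
        // Override present: the virtual call must not be folded to the identity hash.
        System.out.println(virtualHashOf(OVERRIDING)); // 42
        // The identity hash itself is stable once assigned.
        System.out.println(identityOf(OVERRIDING) == identityOf(OVERRIDING)); // true
    }
}
```

Limiting the fold to the `!is_virtual` path corresponds to folding only `identityOf`-style call sites here; the v-table inspection alternative would additionally allow `virtualHashOf(PLAIN)`.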
------------- PR Comment: https://git.openjdk.org/jdk/pull/28589#issuecomment-3604321567 From liach at openjdk.org Tue Dec 2 23:20:12 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 2 Dec 2025 23:20:12 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v4] In-Reply-To: References: Message-ID: > Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: - Rollback getAndAdd for now - Redundant change - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache - Stage - Review tweaks - Tweak VH usage in some classes - Logical fallacy - 8160821 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28585/files - new: https://git.openjdk.org/jdk/pull/28585/files/7bcdcbf3..d49ad129 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=02-03 Stats: 4382 lines in 98 files changed: 2923 ins; 688 del; 771 mod Patch: https://git.openjdk.org/jdk/pull/28585.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28585/head:pull/28585 PR: https://git.openjdk.org/jdk/pull/28585 From liach at openjdk.org Tue Dec 2 23:25:29 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 2 Dec 2025 23:25:29 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v3] In-Reply-To: References: Message-ID: > Folding identity hash as constant if the incoming argument is constant would be useful for quick 
map lookups, such as for the [Classifier proposal](https://openjdk.org/jeps/8357674). Currently, identity hash is not constant because it loads the object header/mark word. We can add an explicit bypass to load an existing hash eagerly instead. Chen Liang has updated the pull request incrementally with one additional commit since the last revision: Typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28589/files - new: https://git.openjdk.org/jdk/pull/28589/files/69225241..b1d8be39 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28589&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28589&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28589/head:pull/28589 PR: https://git.openjdk.org/jdk/pull/28589 From liach at openjdk.org Tue Dec 2 23:30:13 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 2 Dec 2025 23:30:13 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v4] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 23:20:12 GMT, Chen Liang wrote: >> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: > > - Rollback getAndAdd for now > - Redundant change > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache > - Stage > - Review tweaks > - Tweak VH usage in some classes > - Logical fallacy > - 8160821 After consulting with @iwanowww, I realized the non-constant status cannot be determined, that the C2 compiled method can even transition from 0 to 1, so I am simplifying this code to only handle the constant case. It seems the getAndAdd IR test no longer fails with this change, and I removed a lot of other redundant changes. I updated the VarHandleExact benchmark added by @JornVernee, and added a case of dropping return values by changing access mode to `getAndAdd` consistently. Now they have the following performance numbers:

Benchmark                                        Mode  Cnt  Score   Error  Units
VarHandleExact.exact_exactInvocation             avgt   30  3.843 ± 0.062  ns/op
VarHandleExact.generic_exactInvocation           avgt   30  3.797 ± 0.049  ns/op
VarHandleExact.generic_genericInvocation         avgt   30  3.757 ± 0.034  ns/op
VarHandleExact.generic_returnDroppingInvocation  avgt   30  3.754 ± 0.026  ns/op

------------- PR Comment: https://git.openjdk.org/jdk/pull/28585#issuecomment-3604377750 From dlong at openjdk.org Wed Dec 3 00:07:08 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 3 Dec 2025 00:07:08 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v5] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 13:32:22 GMT, Kerem Kat wrote: >> Do not try to replace `fallthrough_memproj` when it is null, fixes crash. >> >> Test case is simplified from the ticket. Verified that the case crashes without the fix. > > Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 10 additional commits since the last revision: > > - Merge branch 'master' into fix-c2-segfault-unlocknode > - address comments > - fix rename > - rename test file > - Merge branch 'master' into fix-c2-segfault-unlocknode > - fix test spacing > - Update src/hotspot/share/opto/macro.cpp > > Co-authored-by: Manuel Hässig > - Update src/hotspot/share/opto/macro.cpp > > Co-authored-by: Manuel Hässig > - copyright format fix? > - 8370502: C2: segfault while adding node to IGVN worklist Yes, it would be good to know if expand_lock_node() also needs a null check. I was assuming the lock and unlock node shapes were basically the same, but now I see that the shapes are different for some reason. The LockNode gets a FastLockNode edge early, while the UnlockNode creates its FastUnlockNode late. I failed to get expand_lock_node() to crash with -XX:+StressMacroExpansion but that doesn't mean there isn't the same problem there. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28432#issuecomment-3604455582 From vlivanov at openjdk.org Wed Dec 3 01:42:45 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 3 Dec 2025 01:42:45 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> <3OE37qXGHhLAhnRQM188hhygrLYBtI3FLBMK0tGVH30=.5d1b4406-3bb3-4788-8059-e78260b79ec1@github.com> <7WF8DlorrU_B2__G2wr43w1PZwJh8mEhD5dY10YDIOo=.ec416c38-1aff-4dd6-8792-d6a0e01f91ce@github.com> <_Z6KpxCYH2n3sHuT6-kRP4cSTAN3-s5UA0rbfrJSIgA=.e9d4089c-8329-406b-9a0a-167a24311c13@github.com> Message-ID: <5CADH75ZjadKttOKwsykRFUPlQKLiwCW8E5WkM_75a4=.fd992c8f-e8bc-4775-9ea3-d5212664e3df@github.com> On Tue, 2 Dec 2025 22:08:20 GMT, Chen Liang wrote: >> I still find it confusing, especially tri-state logic part.
>> >> For background, `isCompileConstant` was introduced as part of LF sharing effort to get rid of Java-level profiling in optimized code. The pattern it was designed for was: >> >> if (isCompileConstant(...)) { >> return ...; >> } else { >> ... // do some extra work (either in interpreter, C1, or not-fully-optimized version in C2) >> } >> >> >> In this patch, you don't follow that pattern and add a new state (`CONSTANT_PENDING`) to distinguish interpreter/C1 from C2. What's the motivation? Why do you want to avoid cache updates coming from C2-generated code? > > I am assuming that if C2 determines this `vh` is not a constant, we can drop it. Is that a right way to move along, or could C2 transition from "not a constant" to "is a constant" during the phases? Sorry, I still don't understand how it is intended to work. Why does `MethodHandleImpl.isCompileConstant(vh) == true` imply that the cached value is compatible with the constant `vh`? // Keep capturing - vh may suddenly get promoted to a constant by C2 Capturing happens outside the compiler thread. It is not affected by C2 (except when it completely prunes the whole block). So, either any captured adaptation is valid/compatible or there's a concurrency issue when C2 kicks in and there's a concurrent cache update happening with an incompatible version. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2583346750 From vlivanov at openjdk.org Wed Dec 3 01:56:27 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 3 Dec 2025 01:56:27 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v4] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 23:20:12 GMT, Chen Liang wrote: >> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH.
Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Rollback getAndAdd for now > - Redundant change > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache > - Stage > - Review tweaks > - Tweak VH usage in some classes > - Logical fallacy > - 8160821 test/micro/org/openjdk/bench/java/lang/invoke/VarHandleExact.java line 81: > 79: > 80: @Benchmark > 81: public void generic_returnDroppingInvocation() { What about "all-generic" case (` { generic.getAndAdd(data, 42); }`)? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2583363907 From dlong at openjdk.org Wed Dec 3 02:22:40 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 3 Dec 2025 02:22:40 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v2] In-Reply-To: <5eysoU9a44W7_cWds1pgbO9cpxQpBbtd54cUglfEW8c=.d0307e92-d9b3-405c-b488-872243af83b1@github.com> References: <5eysoU9a44W7_cWds1pgbO9cpxQpBbtd54cUglfEW8c=.d0307e92-d9b3-405c-b488-872243af83b1@github.com> Message-ID: On Mon, 1 Dec 2025 19:30:22 GMT, Vladimir Ivanov wrote: >> Even though `instanceof` check (and reflective `Class.isInstance` call) narrows operand's type, sharpened type information is not explicitly materialized in the IR. >> >> There's a `SubTypeCheck` node present, but it is not a substitute for a `CheckCastPP` node with a proper type. 
>> >> The difference can be illustrated with the following simple cases: >> >> class A { void m() {} } >> class B extends A { void m() {} } >> >> void testInstanceOf(A obj) { >> if (obj instanceof B) { >> obj.m(); >> } >> } >> >> InstanceOf::testInstanceOf (12 bytes) >> @ 8 InstanceOf$A::m (0 bytes) failed to inline: virtual call >> >> vs >> >> void testInstanceOfCast(A obj) { >> if (obj instanceof B) { >> B b = (B)obj; >> b.m(); >> } >> } >> >> InstanceOf::testInstanceOfCast (17 bytes) >> @ 13 InstanceOf$B::m (1 bytes) inline (hot) >> >> >> Proposed fix annotates operands of subtype checks with proper type information which reflects the effects of subtype check. Not-yet-canonicalized IR shape poses some challenges, but I decided to match it early so information is available right away, rather than waiting for IGVN pass and delay inlining to post-parse phase. >> >> FTR it is not a complete fix. It works for trivial cases, but for more complex conditions the IR shape becomes too complex during parsing (as illustrated by some test cases). I experimented with annotating subtype checks after initial parsing pass is over, but the crucial simplification step happens as part of split-if transformation which happens when no more inlining is possible. So, the only possible benefit (without forcing split-if optimization earlier) is virtual-to-direct call strength reduction. I plan to explore it separately. >> >> Testing: hs-tier1 - hs-tier5 > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Test fix Looks reasonable, but I'm not an expert in this area. src/hotspot/share/opto/parse2.cpp line 1737: > 1735: (*cast_type) = tcon->isa_klassptr()->as_instance_type(); > 1736: return true; // found > 1737: } The old code checked klass_is_exact() for this case, but the new code does not, so was it redundant, given we have a constant? ------------- Marked as reviewed by dlong (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28517#pullrequestreview-3532891901 PR Review Comment: https://git.openjdk.org/jdk/pull/28517#discussion_r2583402219 From dlong at openjdk.org Wed Dec 3 02:37:55 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 3 Dec 2025 02:37:55 GMT Subject: RFR: 8372862: AArch64: Fix GetAndSet-acquire costs after JDK-8372188 In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 10:44:24 GMT, Aleksey Shipilev wrote: > I just noticed (while looking at [JDK-8372800](https://bugs.openjdk.org/browse/JDK-8372800)) that I made a little error in [JDK-8372188](https://bugs.openjdk.org/browse/JDK-8372188) refactor, which made GetAndSet-acquire instruction cost twice as high. The usual cost for acquire versions are twice as low, likely to be selected instead of non-acquire versions. > > This bug happened as I "simplified" stencils at some point by dropping some arguments and renumbering the remaining ones. This is one place where I apparently forgot to renumber one usage. See other checks for `ifelse($3,Acq,...` in that stencil, all of them are `$3` (correct), not `$4` (incorrect). Seen no real bugs because of this mishap, but it would be good to fix it in case we see issues later. I also looked at stencils again, and I think there are no other argument-index problems like this anywhere else. > > The real change is in `aarch64_atomic_ad.m4`, `.ad` is re-generated from that stencil. > > Additional testing: > - [x] Linux AArch64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, quick jcstress run Oracle testing results look clean. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28598#issuecomment-3604789309 From wenanjian at openjdk.org Wed Dec 3 03:23:01 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Wed, 3 Dec 2025 03:23:01 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v29] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 15:14:09 GMT, Hamlin Li wrote: >> Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: >> >> modify label L_EXIT to L_exit > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2636: > >> 2634: void counterMode_AESCrypt(int round, Register in, Register out, Register key, Register counter, >> 2635: Register input_len, Register saved_encrypted_ctr, Register used_ptr) { >> 2636: // Algorithm: > > This should be my last comment :) > Where is this "Algorithm" from? Can you put a link here? Oh sure, when implementing the algorithm, I mainly referred to the Java code implementation (https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/com/sun/crypto/provider/CounterMode.java#L200-L212). Besides, I referred to the aarch64 implementation (https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L3190), and made some modifications for RISC-V instructions. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2583489697 From dlong at openjdk.org Wed Dec 3 03:38:59 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 3 Dec 2025 03:38:59 GMT Subject: RFR: 8350208: CTW: GraphKit::add_safepoint_edges asserts "not enough operands for reexecution" In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 10:30:46 GMT, Quan Anh Mai wrote: > Hi, > > This PR fixes the issue of the compiler crashing with "not enough operands for reexecution". The issue here is that during `Parse::catch_inline_exceptions`, the old stack is gone, and we cannot reexecute the current bytecode anymore.
However, there are some places where we try to insert safepoints into the graph, such as if the handler is a backward jump, or if one of the exceptions in the handlers is not loaded. Since the `_reexecute` state of the current jvms is "undefined", it is inferred automatically that it should reexecute for some bytecodes such as `putfield`. The solution then is to explicitly set `_reexecute` to false. > > I can manage to write a unit test for the case of a backward handler. For the other cases, since the exceptions that can be thrown for a bytecode that is inferred to reexecute are `NullPointerException`, `ArrayIndexOutOfBoundsException`, and `ArrayStoreException`, I find it hard to construct such a test in which one of them is not loaded. > > Please kindly review, thanks a lot. src/hotspot/share/opto/doCall.cpp line 958: > 956: ex_node = use_exception_state(ex_map); > 957: // The stack from before the throwing bytecode is gone, cannot reexecute here > 958: jvms()->set_should_reexecute(false); I agree there are situations where we need to set the reexecute flag explicitly and not base it on the bytecode. I recently fixed JDK-8370766 and filed JDK-8372846 as a followup for similar issues. I need to try out your test to understand this better. Does it cause a backwards-branch safepoint? I suspect that it may not be safe to set reexecute to false here. If reexecute is false and -XX:+VerifyStack is set, deoptimization may fail if the operands are not on the stack. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28597#discussion_r2583510384 From wenanjian at openjdk.org Wed Dec 3 03:44:27 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Wed, 3 Dec 2025 03:44:27 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v30] In-Reply-To: References: Message-ID: > Hi everyone, please help review this patch which implements the _counterMode_AESCrypt with Zvkned.
On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. Anjian Wen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 31 commits: - Merge branch 'openjdk:master' into aes_ctr - modify label L_EXIT to L_exit - add more comments for key value 52 - update some comments, names and Pseudocode - modify stub_id name - Merge branch 'openjdk:master' into aes_ctr - modify format - add more comments - modify parm to unsigned as aarch64 and x86 - clean comments and format - ... and 21 more: https://git.openjdk.org/jdk/compare/530493fe...98d802d5 ------------- Changes: https://git.openjdk.org/jdk/pull/25281/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=29 Stats: 239 lines in 2 files changed: 230 ins; 1 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/25281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281 PR: https://git.openjdk.org/jdk/pull/25281 From liach at openjdk.org Wed Dec 3 04:13:55 2025 From: liach at openjdk.org (Chen Liang) Date: Wed, 3 Dec 2025 04:13:55 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: <5CADH75ZjadKttOKwsykRFUPlQKLiwCW8E5WkM_75a4=.fd992c8f-e8bc-4775-9ea3-d5212664e3df@github.com> References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> <3OE37qXGHhLAhnRQM188hhygrLYBtI3FLBMK0tGVH30=.5d1b4406-3bb3-4788-8059-e78260b79ec1@github.com> <7WF8DlorrU_B2__G2wr43w1PZwJh8mEhD5dY10YDIOo=.ec416c38-1aff-4dd6-8792-d6a0e01f91ce@github.com> <_Z6KpxCYH2n3sHuT6-kRP4cSTAN3-s5UA0r bfrJSIgA=.e9d4089c-8329-406b-9a0a-167a24311c13@github.com> <5CADH75ZjadKttOKwsykRFUPlQKLiwCW8E5WkM_75a4=.fd992c8f-e8bc-4775-9ea3-d5212664e3df@github.com> Message-ID: On Wed, 3 Dec 2025 01:40:29 GMT, Vladimir Ivanov wrote: > any 
captured adaptation is valid/compatible Yes, if `vh` is a constant, any captured adaptation from `vh.getMethodHandle(mode).asType(symbolicMethodTypeInvoker)` is valid/compatible. For thread safety, MethodHandle supports safe publication, so I think we are fine publishing this way. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2583556067 From liach at openjdk.org Wed Dec 3 04:13:59 2025 From: liach at openjdk.org (Chen Liang) Date: Wed, 3 Dec 2025 04:13:59 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v4] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 01:53:13 GMT, Vladimir Ivanov wrote: >> Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: >> >> - Rollback getAndAdd for now >> - Redundant change >> - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache >> - Stage >> - Review tweaks >> - Tweak VH usage in some classes >> - Logical fallacy >> - 8160821 > > test/micro/org/openjdk/bench/java/lang/invoke/VarHandleExact.java line 81: > >> 79: >> 80: @Benchmark >> 81: public void generic_returnDroppingInvocation() { > > What about "all-generic" case (` { generic.getAndAdd(data, 42); }`)? I can change the `generic_genericInvocation` to an all-generic case. 
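
As a plain-Java illustration of the invocation shapes these benchmark cases distinguish, here is a minimal sketch, assuming JDK 16 or later for `VarHandle.withInvokeExactBehavior()`. It is not the JMH benchmark from the PR; the class `VarHandleInvocationSketch`, the holder `Data`, and the method names are hypothetical. The point is that a call site whose symbolic type differs from the VarHandle's access mode type (here, by dropping the `int` return value) needs an `asType` adaptation under generic behavior and is rejected under exact behavior.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.lang.invoke.WrongMethodTypeException;

public class VarHandleInvocationSketch {
    // Hypothetical holder class; not taken from the benchmark.
    static final class Data {
        int value;
    }

    static final VarHandle GENERIC; // default (generic) invocation behavior
    static final VarHandle EXACT;   // invoke-exact behavior

    static {
        try {
            GENERIC = MethodHandles.lookup().findVarHandle(Data.class, "value", int.class);
            EXACT = GENERIC.withInvokeExactBehavior();
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Symbolic type (Data,int)int matches the access mode type exactly.
    static int exactInvocation(Data data) {
        return (int) EXACT.getAndAdd(data, 42);
    }

    // Same exact call shape, but through the generic-behavior VarHandle.
    static int genericHandleExactShape(Data data) {
        return (int) GENERIC.getAndAdd(data, 42);
    }

    // Symbolic type (Data,int)void: the return value is dropped, so the
    // call site does not match exactly and an adaptation is required.
    static void returnDroppingInvocation(Data data) {
        GENERIC.getAndAdd(data, 42);
    }

    // Under invoke-exact behavior the same return-dropping shape is rejected.
    static boolean exactRejectsDroppedReturn(Data data) {
        try {
            EXACT.getAndAdd(data, 42);
            return false;
        } catch (WrongMethodTypeException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        Data data = new Data();
        System.out.println(exactInvocation(data));         // 0
        System.out.println(genericHandleExactShape(data)); // 42
        returnDroppingInvocation(data);
        System.out.println(data.value);                    // 126
        System.out.println(exactRejectsDroppedReturn(new Data()));
    }
}
```

The return-dropping call is the shape that previously went through `asType` on every invocation, which is what the per-site cache in this PR is meant to avoid when the VarHandle is a constant.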
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2583556794 From epeter at openjdk.org Wed Dec 3 05:46:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 3 Dec 2025 05:46:05 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v4] In-Reply-To: References: Message-ID: <9mnRXpB16Y6Mw0TSGFJz-69m24lzCNPMC_B1_YseD4M=.be94bbba-88ce-4958-a8bd-89862d7ec2e7@github.com> On Tue, 2 Dec 2025 09:49:29 GMT, Roland Westrelin wrote: >> The test case has an out of loop `Store` with an `AddP` address >> expression that has other uses and is in the loop body. Schematically, >> only showing the address subgraph and the bases for the `AddP`s: >> >> >> Store#195 -> AddP#133 -> AddP#134 -> CastPP#110 >> -> CastPP#110 >> >> >> Both `AddP`s have the same base, a `CastPP` that's also in the loop >> body. >> >> That loop is a counted loop and only has 3 iterations so is fully >> unrolled. First, one iteration is peeled: >> >> >> /-> CastPP#110 >> Store#195 -> Phi#360 -> AddP#133 -> AddP#134 -> CastPP#110 >> -> AddP#277 -> AddP#278 -> CastPP#283 >> -> CastPP#283 >> >> >> >> The `AddP`s and `CastPP` are cloned (because in the loop body). As >> part of peeling, `PhaseIdealLoop::peeled_dom_test_elim()` is >> called. It finds the test that guards `CastPP#283` in the peeled >> iteration dominates and replaces the test that guards `CastPP#110` >> (the test in the peeled iteration is the clone of the test in the >> loop). That causes `CastPP#110`'s control to be updated to that of the >> test in the peeled iteration and to be yanked from the loop. So now >> `CastPP#283` and `CastPP#110` have the same inputs. 
>> >> Next unrolling happens: >> >> >> /-> CastPP#110 >> /-> AddP#400 -> AddP#401 -> CastPP#110 >> Store#195 -> Phi#360 -> Phi#477 -> AddP#133 -> AddP#134 -> CastPP#110 >> \ -> CastPP#110 >> -> AddP#277 -> AddP#278 -> CastPP#283 >> -> CastPP#283 >> >> >> >> `AddP`s are cloned once more but not the `CastPP`s because they are >> both in the peeled iteration now. A new `Phi` is added. >> >> Next igvn runs. It's going to push the `AddP`s through the `Phi`s. >> >> Through `Phi#477`: >> >> >> >> /-> CastPP#110 >> Store#195 -> Phi#360 -> AddP#510 -> Phi#509 -> AddP#401 -> CastPP#110 >> \ -> AddP#134 -> CastPP#110 >> -> AddP#277 -> AddP#278 -> CastPP#283 >> -> CastPP#283 >> >> >> >> Through `Phi#360`: >> >> >> /-> AddP#134 -> CastPP#110 >> /-> Phi#509 -> AddP#401 -> CastPP#110 >> Store#195 -> AddP#516 -> Phi#515 -> AddP#278 -> CastPP#283 >> -> Phi#514 -> CastPP#283 >> ... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: > > - more > - review > - Merge branch 'master' into JDK-8351889 > - exp > - Merge branch 'master' into JDK-8351889 > - verif > - Merge branch 'master' into JDK-8351889 > - test seed > - more > - Merge branch 'master' into JDK-8351889 > - ... and 4 more: https://git.openjdk.org/jdk/compare/d6d17aab...15c17bb1 I think I'm on board with the solution now. It is probably best to do it during IGVN. I have a few more suggestions below :) src/hotspot/share/opto/cfgnode.cpp line 2171: > 2169: !wait_for_region_igvn(phase)) { > 2170: // If one of the inputs is a cast that has yet to be processed by igvn, delay processing of this node to give the > 2171: // inputs a chance to optimize and possibly end up with identical inputs. I think we should have more detail here. Why is this a good idea? Is this an optimization? Or is it for correctness? 
I think you should say something about possibly having multiple cast nodes that could be commoned, and then they would keep their ctrl. But if we uncast, then we lose the info about the ctrl, and below we insert a new cast with a different (later) ctrl. This has two downsides: - The ctrl is later than necessary: suboptimal - If we have 3 or more copies of casts with the same ctrl, and now we remove two and create a new one with a different ctrl, then the remaining old and the new cast cannot common because they have different ctrl. - this suboptimal - this also creates issues along AddP paths: it can be that at some AddP we get one cast and at another AddP a different cast. They all come from the same original base address, just casted differently. But it makes it difficult to check consistency, and asserts fail. This is not very concise yet, you can probably formulate it in a better way ;) src/hotspot/share/opto/phaseX.cpp line 2076: > 2074: if (addp->in(AddPNode::Base) == n->in(AddPNode::Base)) { > 2075: return false; > 2076: } Suggestion: if (!addp->is_AddP() || addp->in(AddPNode::Base)->is_top() || addp->in(AddPNode::Base) == n->in(AddPNode::Base)) { return false; } test/hotspot/jtreg/compiler/c2/TestMismatchedAddPAfterMaxUnroll.java line 35: > 33: * -XX:+StressIGVN TestMismatchedAddPAfterMaxUnroll > 34: * @run main/othervm TestMismatchedAddPAfterMaxUnroll > 35: */ What about a run with our new fancy flag `-XX:VerifyIterativeGVN=10000`? 
------------- PR Review: https://git.openjdk.org/jdk/pull/25386#pullrequestreview-3533246147 PR Review Comment: https://git.openjdk.org/jdk/pull/25386#discussion_r2583686701 PR Review Comment: https://git.openjdk.org/jdk/pull/25386#discussion_r2583690118 PR Review Comment: https://git.openjdk.org/jdk/pull/25386#discussion_r2583697429 From epeter at openjdk.org Wed Dec 3 05:46:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 3 Dec 2025 05:46:06 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v4] In-Reply-To: <9mnRXpB16Y6Mw0TSGFJz-69m24lzCNPMC_B1_YseD4M=.be94bbba-88ce-4958-a8bd-89862d7ec2e7@github.com> References: <9mnRXpB16Y6Mw0TSGFJz-69m24lzCNPMC_B1_YseD4M=.be94bbba-88ce-4958-a8bd-89862d7ec2e7@github.com> Message-ID: <9HDQAMPQo9VBlnXt7WpTjK51AcNHwOfHxc4t9YyBCxc=.818e962a-8898-41a1-920d-58444d70961b@github.com> On Wed, 3 Dec 2025 05:36:52 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: >> >> - more >> - review >> - Merge branch 'master' into JDK-8351889 >> - exp >> - Merge branch 'master' into JDK-8351889 >> - verif >> - Merge branch 'master' into JDK-8351889 >> - test seed >> - more >> - Merge branch 'master' into JDK-8351889 >> - ... and 4 more: https://git.openjdk.org/jdk/compare/d6d17aab...15c17bb1 > > src/hotspot/share/opto/phaseX.cpp line 2076: > >> 2074: if (addp->in(AddPNode::Base) == n->in(AddPNode::Base)) { >> 2075: return false; >> 2076: } > > Suggestion: > > if (!addp->is_AddP() || > addp->in(AddPNode::Base)->is_top() || > addp->in(AddPNode::Base) == n->in(AddPNode::Base)) { > return false; > } It could be a bit compacted. How do you imagine `verify_node_invariants_for` will grow over time? 
I suspect it will become a laundry list of invariants: we continue going down through it as long as no invariant is violated. For that, it may make more sense to invert your condition, and assert/print inside the if-block. It would make the code more extendable. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25386#discussion_r2583696057 From epeter at openjdk.org Wed Dec 3 05:52:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 3 Dec 2025 05:52:02 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v8] In-Reply-To: References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: On Mon, 1 Dec 2025 13:39:09 GMT, Jatin Bhateja wrote: >> This bug patch fixes a crash seen while querying the bottom type of MachTempNode corresponding to [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of blend pattern during late scheduling. Here, MaxVectorSize is constrained to 8 bytes thus during C2 type system initialization, [TypeVect::VECTX ](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719) guarded by target supported vector size remains uninitialized. >> >> It's better to reject matching of VectorBlend in such a scenario. >> >> All existing VectorAPI jtreg tests are passing with -XX:UseAVX=0 and -XX:MaxVectorSize=8 >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Review comments resolutions @jatin-bhateja Testing passed, fix looks good to me. Thanks for working on this. ------------- Marked as reviewed by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/28533#pullrequestreview-3533284691 From jbhateja at openjdk.org Wed Dec 3 06:26:58 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 3 Dec 2025 06:26:58 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v8] In-Reply-To: References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: <2jl8sF9HfU5rPWTDi_jcl4vz6PjIxcdAU4bBwR1sb6c=.1f40d642-5319-4b0c-9505-9bed1a17aecd@github.com> On Tue, 2 Dec 2025 13:32:18 GMT, Emanuel Peter wrote: >> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> Review comments resolutions > > Testing submitted! Code looks good to me :) Thanks @eme64 , integrating now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28533#issuecomment-3605288681 From epeter at openjdk.org Wed Dec 3 06:32:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 3 Dec 2025 06:32:57 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v8] In-Reply-To: References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: On Mon, 1 Dec 2025 13:39:09 GMT, Jatin Bhateja wrote: >> This bug patch fixes a crash seen while querying the bottom type of MachTempNode corresponding to [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of blend pattern during late scheduling. Here, MaxVectorSize is contrainted to 8 bytes thus during C2 type system initialization, [TypeVect::VECTX ](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719) guarded by target supprted vector size remains uninitialized. 
>> >> Its better to reject matching of VectorBlend in such a scenario. >> >> All exisitng VectorAPI jtreg tests are passing with -XX:UseAVX=0 and -XX:MaxVectorSize=8 >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Review comments resolutions Oh, a second review would be required though! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28533#issuecomment-3605301206 From fyang at openjdk.org Wed Dec 3 07:06:05 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 3 Dec 2025 07:06:05 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v7] In-Reply-To: <4-PqNRUxM-80k4mQdYNzc0HrirtkTCjfVAzgRewW08M=.d2fe4512-16cd-4abf-8a7f-e91341c37110@github.com> References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> <4-PqNRUxM-80k4mQdYNzc0HrirtkTCjfVAzgRewW08M=.d2fe4512-16cd-4abf-8a7f-e91341c37110@github.com> Message-ID: On Mon, 1 Dec 2025 15:13:13 GMT, Hamlin Li wrote: >> Hi, >> >> This pr add CMoveF/D on riscv, which enable vectorization of statement like: `op_1 bop op_2 ? res_f_d_1 : res_f_d_2 in a loop`. >> >> This pr is also a preparation for further vectorization in https://github.com/openjdk/jdk/pull/28231. >> >> Previously it's https://github.com/openjdk/jdk/pull/25341, but at that time, C2 SLP has some issue with unsigned comparison, which is now fixed, so it's good to continue the work. >> >> # Test >> ## Jtreg >> >> in progress... 
>> >> ## Performance >> >> Column names meanings: >> * p: with patch >> * p+v: with patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on >> * m: without patch >> * m+v: without patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on >> >> #### Average improvement >> >> NOTE: With only this PR, it brings performance benefit in case of `CMoveF+CmpF`, `CMoveD+ComD`, `CMoveF+CmpI`, `CMoveD+CmpL`. The data below is based on fullly implmenting the vectorization of `CMoveI/L/F/D+CmpI/L/F/D`, which will be achieved by https://github.com/openjdk/jdk/pull/28231. >> >> For details, check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv. >> >> Opt (m/p) | Opt (m+v/p+v) | Opt (p/p+v) | Opt (m/p+v) >> -- | -- | -- | -- >> 1.022782609 | 2.198717391 | 2.162673913 | 2.199 >> >> > > Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: > > - remove log_warning > - add test cases: BoolTest::ge/gt in enc_cmove_fp_cmp_fp Latest version seems fine to me. Thanks for the update. As we are very close to JDK 26 rampdown (2025/12/04), I suggest we postpone this to JDK 27. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28309#pullrequestreview-3533498917 From mhaessig at openjdk.org Wed Dec 3 07:17:56 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 3 Dec 2025 07:17:56 GMT Subject: RFR: 8372862: AArch64: Fix GetAndSet-acquire costs after JDK-8372188 In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 10:44:24 GMT, Aleksey Shipilev wrote: > I just noticed (while looking at [JDK-8372800](https://bugs.openjdk.org/browse/JDK-8372800)) that I made a little error in [JDK-8372188](https://bugs.openjdk.org/browse/JDK-8372188) refactor, which made GetAndSet-acquire instruction cost twice as high. The usual cost for acquire versions are twice as low, likely to be selected instead of non-acquire versions. 
> > This bug happened as I "simplified" stencils at some point by dropping some arguments and renumbering the remaining ones. This is one place where I apparently forgot to renumber one usage. See other checks for `ifelse($3,Acq,...` in that stencil, all of them are `$3` (correct), not `$4` (incorrect). Seen no real bugs because of this mishap, but it would be good to fix it in case we see issues later. I also looked at stencils again, and I think there are no other argument-index problems like this anywhere else. > > The real change is in `aarch64_atomic_ad.m4`, `.ad` is re-generated from that stencil. > > Additional testing: > - [x] Linux AArch64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, quick jcstress run Thank you for fixing this, @shipilev. This looks good to me. ------------- Marked as reviewed by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/28598#pullrequestreview-3533532583 From thartmann at openjdk.org Wed Dec 3 07:24:06 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 3 Dec 2025 07:24:06 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v7] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 09:13:38 GMT, Roland Westrelin wrote: >> In test cases, `mh` is initially not constant so the method handle >> invoke can't be inlined. It is later found to be constant, so it can >> be turned into a direct call by >> `Compile::process_late_inline_calls_no_inline()`. In the meantime, the >> `CallNode` for the mh invoke is cloned (by loop switching). In the >> process, only a shallow copy of the `JVMState` for the call is >> made. The initial `CallNode` is the first to be processed by >> `Compile::process_late_inline_calls_no_inline()` and that causes that >> `CallNode` to become dead. The cloned `CallNode` is then >> processed. 
The `JVMState` for that one references the initial >> `CallNode` in its caller's `JVMState`. Because that node is dead, that >> causes a crash. The fix I propose is to make a deep copy of the >> `JVMState` when a `CallNode` is cloned, if a `CallGenerator` is >> assigned to the node. >> >> The other failure I see with these tests is: >> >> >> # Internal Error (/home/roland/jdk-jdk/src/hotspot/share/opto/compile.hpp:1091), pid=3319164, tid=3319186 >> # assert(_number_of_mh_late_inlines > 0) failed: _number_of_mh_late_inlines < 0 ! >> >> >> because even though the `CallNode` is cloned, there's still only one >> late inline recorded. The fix here is to increment >> `_number_of_mh_late_inlines` when the node is cloned. >> >> This was reported by the netty developers. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Merge branch 'master' into JDK-8370939 > - Merge branch 'master' into JDK-8370939 > - review > - Merge branch 'master' into JDK-8370939 > - review > - more > - more > - more > - more > - test > - ... and 1 more: https://git.openjdk.org/jdk/compare/27065cb8...64b11e6e All testing passed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28088#issuecomment-3605438298 From epeter at openjdk.org Wed Dec 3 08:01:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 3 Dec 2025 08:01:06 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v7] In-Reply-To: <4-PqNRUxM-80k4mQdYNzc0HrirtkTCjfVAzgRewW08M=.d2fe4512-16cd-4abf-8a7f-e91341c37110@github.com> References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> <4-PqNRUxM-80k4mQdYNzc0HrirtkTCjfVAzgRewW08M=.d2fe4512-16cd-4abf-8a7f-e91341c37110@github.com> Message-ID: <4nP7XPYsi87jqsutMCdufFx4Jz6aa-X_pPpjd_uGoG0=.8ce4119b-c3bb-4cc3-b714-ccbeb9ac7f42@github.com> On Mon, 1 Dec 2025 15:13:13 GMT, Hamlin Li wrote: >> Hi, >> >> This pr add CMoveF/D on riscv, which enable vectorization of statement like: `op_1 bop op_2 ? res_f_d_1 : res_f_d_2 in a loop`. >> >> This pr is also a preparation for further vectorization in https://github.com/openjdk/jdk/pull/28231. >> >> Previously it's https://github.com/openjdk/jdk/pull/25341, but at that time, C2 SLP has some issue with unsigned comparison, which is now fixed, so it's good to continue the work. >> >> # Test >> ## Jtreg >> >> in progress... >> >> ## Performance >> >> Column names meanings: >> * p: with patch >> * p+v: with patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on >> * m: without patch >> * m+v: without patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on >> >> #### Average improvement >> >> NOTE: With only this PR, it brings performance benefit in case of `CMoveF+CmpF`, `CMoveD+ComD`, `CMoveF+CmpI`, `CMoveD+CmpL`. The data below is based on fullly implmenting the vectorization of `CMoveI/L/F/D+CmpI/L/F/D`, which will be achieved by https://github.com/openjdk/jdk/pull/28231. >> >> For details, check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv. 
>> >> Opt (m/p) | Opt (m+v/p+v) | Opt (p/p+v) | Opt (m/p+v) >> -- | -- | -- | -- >> 1.022782609 | 2.198717391 | 2.162673913 | 2.199 >> >> > > Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: > > - remove log_warning > - add test cases: BoolTest::ge/gt in enc_cmove_fp_cmp_fp I can help review this as well, but currently there is a lot going on with JDK26 bugs. Hopefully things settle down in a few weeks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28309#issuecomment-3605543510 From mhaessig at openjdk.org Wed Dec 3 08:09:05 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 3 Dec 2025 08:09:05 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v5] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 13:32:22 GMT, Kerem Kat wrote: >> Do not try to replace `fallthrough_memproj` when it is null, fixes crash. >> >> Test case is simplified from the ticket. Verified that the case crashes without the fix. > > Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Merge branch 'master' into fix-c2-segfault-unlocknode > - address comments > - fix rename > - rename test file > - Merge branch 'master' into fix-c2-segfault-unlocknode > - fix test spacing > - Update src/hotspot/share/opto/macro.cpp > > Co-authored-by: Manuel H?ssig > - Update src/hotspot/share/opto/macro.cpp > > Co-authored-by: Manuel H?ssig > - copyright format fix? > - 8370502: C2: segfault while adding node to IGVN worklist Testing passed up to tier3 on linux-x64-debug, linux-aarch64-debug, macosx-x64-debug, macosx-aarch64-debug, and windows-x64-debug. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28432#issuecomment-3605568889 From thartmann at openjdk.org Wed Dec 3 08:53:33 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 3 Dec 2025 08:53:33 GMT Subject: RFR: 8372703: Test compiler/arguments/TestCodeEntryAlignment.java failed: assert(allocates2(pc)) failed: not in CodeBuffer memory [v2] In-Reply-To: References: <3SFe0aKR8DW5SKjr375S78OWgJS7g2pLZfepb43yISI=.958eda85-ca1a-4f85-a9a2-c7ad60dcc025@github.com> Message-ID: On Tue, 2 Dec 2025 15:28:05 GMT, Volodymyr Paprotski wrote: >> src/hotspot/cpu/x86/stubDeclarations_x86.hpp line 76: >> >>> 74: do_arch_entry, \ >>> 75: do_arch_entry_init) \ >>> 76: do_arch_blob(compiler, 120000 WINDOWS_ONLY(+2000)) \ >> >> I was wondering if there are any reason for this value (apart that it is enough for the test to pass. I just noticed that it has been increased already in the past). > > The assert was suggesting 119k (and change..) so I rounded slightly up. I was going to ask (i.e. @TobiHartmann ?) if thats enough.. > > (Similarly, I am concerned that I am contributing to a larger JVM footprint, with my changes.. but I suppose 11k is comparatively insignificant in the grand scheme of things...) > > Thanks for the review! I think that's a reasonable increase. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28588#discussion_r2584174950 From mli at openjdk.org Wed Dec 3 09:23:01 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 3 Dec 2025 09:23:01 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v29] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 03:20:34 GMT, Anjian Wen wrote: >> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2636: >> >>> 2634: void counterMode_AESCrypt(int round, Register in, Register out, Register key, Register counter, >>> 2635: Register input_len, Register saved_encrypted_ctr, Register used_ptr) { >>> 2636: // Algorithm: >> >> This should be my last comment :) >> Where is this "Algorithm" from? Can you put a link here? > > Oh sure, when implementing the Algorithm, I mainly referred to the Java code implementation (https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/com/sun/crypto/provider/CounterMode.java#L200-L212). besides, I referred to the aarch64 implementation (https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L3190), and made some modifications for RISC-V instructions Thanks! If this C style code is based on the java one, can you add a reference here to the java code? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2584282181 From wenanjian at openjdk.org Wed Dec 3 09:51:07 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Wed, 3 Dec 2025 09:51:07 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v29] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 09:20:20 GMT, Hamlin Li wrote: >> Oh sure, when implementing the Algorithm, I mainly referred to the Java code implementation (https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/com/sun/crypto/provider/CounterMode.java#L200-L212). 
Besides, I referred to the aarch64 implementation (https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L3190), and made some modifications for RISC-V instructions > > Thanks! > If this C style code is based on the Java one, can you add a reference here to the Java code? It's for future reference. Do you mean adding a comment like "mainly according to com.sun.crypto.provider.CounterMode::implCrypt" here? I may not have described it clearly: the Java one (https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/com/sun/crypto/provider/CounterMode.java#L200-L212) I referred to is the function whose intrinsic we are trying to implement. Do we still need to add a reference? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2584381894 From mli at openjdk.org Wed Dec 3 09:56:18 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 3 Dec 2025 09:56:18 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v29] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 09:48:47 GMT, Anjian Wen wrote: >> Thanks! >> If this C style code is based on the Java one, can you add a reference here to the Java code? It's for future reference. > > Do you mean adding a comment like "mainly according to com.sun.crypto.provider.CounterMode::implCrypt" here? > > I may not have described it clearly: the Java one (https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/com/sun/crypto/provider/CounterMode.java#L200-L212) I referred to is the function whose intrinsic we are trying to implement. Do we still need to add a reference? I mean, where is the C code from? We'd better put a reference here to point to it.
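The Java routine referenced in this exchange is `com.sun.crypto.provider.CounterMode::implCrypt`. As a reading aid, here is a minimal stand-alone sketch of the same counter-mode idea, written against the public `javax.crypto` API rather than copied from the JDK sources. The class name `CtrSketch`, its helper names, and the variable names are assumptions chosen for illustration; this is not the pseudocode from the PR, and it assumes the counter is incremented big-endian across the whole 16-byte block.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class CtrSketch {
    static final int BLOCK = 16; // AES block size in bytes

    // Big-endian increment of the 16-byte counter block, with carry.
    static void increment(byte[] counter) {
        for (int i = counter.length - 1; i >= 0; i--) {
            if (++counter[i] != 0) {
                break; // no carry into the next byte
            }
        }
    }

    // Hand-written CTR: encrypt the counter block, XOR it into the data byte by
    // byte, and refill the keystream block whenever it has been used up.
    static byte[] ctr(byte[] key, byte[] iv, byte[] in) {
        try {
            Cipher ecb = Cipher.getInstance("AES/ECB/NoPadding");
            ecb.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"));
            byte[] counter = iv.clone();
            byte[] encryptedCounter = new byte[BLOCK];
            int used = BLOCK; // forces a keystream block on the first byte
            byte[] out = new byte[in.length];
            for (int i = 0; i < in.length; i++) {
                if (used >= BLOCK) {
                    ecb.doFinal(counter, 0, BLOCK, encryptedCounter, 0);
                    increment(counter);
                    used = 0;
                }
                out[i] = (byte) (in[i] ^ encryptedCounter[used++]);
            }
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reference result from the provider's own AES/CTR implementation.
    static byte[] reference(byte[] key, byte[] iv, byte[] in) {
        try {
            Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return c.doFinal(in);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Comparing `ctr(...)` with `reference(...)` on inputs that are not a multiple of 16 bytes exercises the partially used keystream block, which is the state that the `saved_encrypted_ctr` and `used_ptr` stub arguments appear to carry between calls.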
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2584400100 From mli at openjdk.org Wed Dec 3 09:59:13 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 3 Dec 2025 09:59:13 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v29] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 09:53:20 GMT, Hamlin Li wrote: >> Do you mean adding a comment like "mainly according to com.sun.crypto.provider.CounterMode::implCrypt" here? >> >> I may not have described it clearly: the Java one (https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/com/sun/crypto/provider/CounterMode.java#L200-L212) I referred to is the function whose intrinsic we are trying to implement. Do we still need to add a reference? > > I mean, where is the C code from? We'd better put a reference here to point to it. I assume your assembly code is a kind of translation from high-level language code, or maybe I misunderstood it?
Oh, it is pseudocode I created to make the algorithm easier to understand ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2584443250 From mli at openjdk.org Wed Dec 3 10:13:43 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 3 Dec 2025 10:13:43 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v29] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 10:04:15 GMT, Anjian Wen wrote: >> I assume your assembly code is a kind of translation from high-level language code, or maybe I misunderstood it? > > Oh, it is pseudocode I created to make the algorithm easier to understand Is the RISC-V assembly a migration from aarch64? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2584466181 From qamai at openjdk.org Wed Dec 3 10:24:50 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 3 Dec 2025 10:24:50 GMT Subject: RFR: 8350208: CTW: GraphKit::add_safepoint_edges asserts "not enough operands for reexecution" In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 03:36:13 GMT, Dean Long wrote: >> Hi, >> >> This PR fixes the issue of the compiler crashing with "not enough operands for reexecution". The issue here is that during `Parse::catch_inline_exceptions`, the old stack is gone, and we cannot reexecute the current bytecode anymore. However, there are some places where we try to insert safepoints into the graph, such as if the handler is a backward jump, or if one of the exceptions in the handlers is not loaded. Since the `_reexecute` state of the current jvms is "undefined", it is inferred automatically that it should reexecute for some bytecodes such as `putfield`. The solution then is to explicitly set `_reexecute` to false. >> >> I can manage to write a unit test for the case of a backward handler, for the other cases, since the exceptions that can be thrown for a bytecode that is inferred to reexecute are `NullPointerException`, `ArrayIndexOutOfBoundsException`, and `ArrayStoreException`.
I find it hard to construct such a test in which one of them is not loaded. >> >> Please kindly review, thanks a lot. > > src/hotspot/share/opto/doCall.cpp line 958: > >> 956: ex_node = use_exception_state(ex_map); >> 957: // The stack from before the throwing bytecode is gone, cannot reexecute here >> 958: jvms()->set_should_reexecute(false); > > I agree there are situations where we need to set the reexecute flag explicitly and not base it on the bytecode. I recently fixed JDK-8370766 and filed JDK-8372846 as a followup for similar issues. I need to try out your test to understand this better. Does it cause a backwards-branch safepoint? I suspect that it may not be safe to set reexecute to false here. If reexecute is false and -XX:+VerifyStack is set, deoptimization may fail if the operands are not on the stack. Yes, it is a backwards-branch safepoint. Tbh, after looking deeper, I don't really understand what is happening here. I modified the test a little bit so the final compiled code does not elide the safepoint in the loop, and ran with `-XX:+VerifyStack -XX:+DeoptimizeALot -XX:+SafepointALot`, but the test still passed after 100 repeats. I think that the state is correct, but I don't see how the compiled code notifies the deoptimizer and the interpreter that it is in an exception state, and the interpreter needs to find an exception handler instead of continuing with the next bytecode. My guess is that the compiled code should store the exception into `Thread::_pending_exception`, or the deoptimizer needs to do so, and the interpreter needs to check that when being handed the control. But I have not yet found that.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28597#discussion_r2584505420 From rrich at openjdk.org Wed Dec 3 10:29:24 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 3 Dec 2025 10:29:24 GMT Subject: RFR: 8370473: C2: Better Aligment of Vector Spill Slots [v4] In-Reply-To: <2dAfr3bnYwrmrMwlhDNniaYVQYOrR2ARztDEB4qqBzY=.aaa1b90d-0aa7-4d42-a3eb-c52a6b04cbaf@github.com> References: <2dAfr3bnYwrmrMwlhDNniaYVQYOrR2ARztDEB4qqBzY=.aaa1b90d-0aa7-4d42-a3eb-c52a6b04cbaf@github.com> Message-ID: On Thu, 20 Nov 2025 10:21:34 GMT, Richard Reingruber wrote: >> With this change c2 will allocate spill slots for vectors with sp offsets aligned to the size of the vectors. Maximum alignment is StackAlignmentInBytes. >> >> It also updates comments that have never been changed to describe how register allocation works for sizes larger than 64 bit. >> >> The change helps to produce better spill code on AARCH64 and PPC64 where an additional add instruction is emitted if the offset of a vector un-/spill is not aligned. >> >> The change is rather a cleanup than an optimization. In most cases the sp offsets will already be properly aligned. >> Only with incoming stack arguments unaligned offsets can be generated. But also then alignment padding is only added if vector registers larger than 64 bit are used. >> >> So the costs are effectively zero. Especially because extra padding won't enlarge the frame since only virtual registers are allocated which are mapped to the caller frame (see `pad0` in the [diagram](https://github.com/openjdk/jdk/blob/92e380c59c2498b1bc94e26658b07b383deae59a/src/hotspot/cpu/aarch64/aarch64.ad#L3829)) >> >> There's a risk though that with the extra virtual registers allocated for `pad0` the limit of registers a `RegMask` can represent is reached (occurs with excessive spilling). If this happens the compilation would fail. It could be retried with smaller alignment for vector spilling though. 
I havn't implemented it as I thought the risk is negligible. >> >> Note that the sp offset of the accesses should be aligned rather than the effective address. So it could even be argued that the maximum alignment could be higher than StackAlignmentInBytes. >> >> ##### Testing with fastdebug builds on AARCH64 and PPC64: >> >> hotspot_vector_1 >> hotspot_vector_2 >> jdk_vector >> jdk_vector_sanity >> >> ##### The change passed our CI testing: >> Tier 1-4 of hotspot and jdk. All of langtools and jaxp. Renaissance Suite and SAP specific tests. >> Testing was done on the main platforms and also on Linux/PPC64le and AIX. >> >> C2 compilation of `jdk.internal.vm.vector.VectorSupport::rearrangeOp` has unaligned spill offsets. It is covered by the following tests: >> >> compiler/vectorapi/VectorRearrangeTest.java >> jdk/incubator/vector/Byte128VectorLoadStoreTests.java >> jdk/incubator/vector/Double256VectorLoadStoreTests.java >> jdk/incubator/vector/Float128VectorTests.java >> jdk/incubator/vector/Long256VectorLoadStoreTests.java >> jdk/incubator/vector/Short128VectorLoadStoreTests.java >> jdk/incubator/vector/Vector64ConversionTests.java > > Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Merge branch 'master' > - Exclude IR check on riscv with rvv > - Enhance comment > - Fix OptoAssembly for Power 8 > - PPC: OptoAssembly for vector spilling > - Assert aligned sp offsets in vector spilling > - Delete TMP and !UseNewCode > - Align Matcher::_new_SP for better vector spilling > - TMP: trace unaligned vector spilling > - Add test Thanks again for the feedback. 
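The alignment rule from the quoted description (sp offset aligned to the vector size, capped at StackAlignmentInBytes) can be sketched in plain Java. This only illustrates the arithmetic and is not the C2 code: the class and method names are hypothetical, and the value 16 used for the stack alignment is an assumption made for the example.

```java
// Illustrative only: models the alignment arithmetic, not HotSpot's register allocator.
final class SpillSlotAlignment {
    // Round offset up to the next multiple of alignment (alignment must be a power of two).
    static int alignUp(int offset, int alignment) {
        if (alignment <= 0 || Integer.bitCount(alignment) != 1) {
            throw new IllegalArgumentException("alignment must be a power of two");
        }
        return (offset + alignment - 1) & -alignment;
    }

    // Offset for a vector spill slot: aligned to the vector size,
    // but never more strictly than the stack alignment.
    static int vectorSpillOffset(int nextFreeOffset, int vectorSizeInBytes, int stackAlignmentInBytes) {
        int alignment = Math.min(vectorSizeInBytes, stackAlignmentInBytes);
        return alignUp(nextFreeOffset, alignment);
    }

    public static void main(String[] args) {
        int stackAlignment = 16; // assumed value, for illustration
        System.out.println(vectorSpillOffset(8, 16, stackAlignment));  // 16-byte vector, padded from 8 to 16
        System.out.println(vectorSpillOffset(24, 32, stackAlignment)); // 32-byte vector, alignment capped at 16
        System.out.println(vectorSpillOffset(32, 8, stackAlignment));  // 8-byte slot, already aligned
    }
}
```

The padding between `nextFreeOffset` and the returned offset corresponds to the extra slots the description says are only needed when vector registers larger than 64 bit are involved.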
------------- PR Comment: https://git.openjdk.org/jdk/pull/27969#issuecomment-3606133505 From wenanjian at openjdk.org Wed Dec 3 10:29:25 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Wed, 3 Dec 2025 10:29:25 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v29] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 10:10:50 GMT, Hamlin Li wrote: >> Oh, it is pseudocode I created to make it easier to understand > > is riscv assembly a migration from aarch64? I mainly follow the AES standard algorithm, but I did refer to the implementation of AArch64. I'm not sure it can be described as a migration. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2584518540 From rrich at openjdk.org Wed Dec 3 10:32:25 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 3 Dec 2025 10:32:25 GMT Subject: Integrated: 8370473: C2: Better Aligment of Vector Spill Slots In-Reply-To: References: Message-ID: <7kWVlFj8b6kCAGo2YRKoW39R36PYm3pb88zCPVFJM9o=.682a51a9-f924-45d0-b0a8-ba4f8df16b92@github.com> On Fri, 24 Oct 2025 07:36:57 GMT, Richard Reingruber wrote: > With this change c2 will allocate spill slots for vectors with sp offsets aligned to the size of the vectors. Maximum alignment is StackAlignmentInBytes. > > It also updates comments that have never been changed to describe how register allocation works for sizes larger than 64 bit. > > The change helps to produce better spill code on AARCH64 and PPC64 where an additional add instruction is emitted if the offset of a vector un-/spill is not aligned. > > The change is rather a cleanup than an optimization. In most cases the sp offsets will already be properly aligned. > Only with incoming stack arguments unaligned offsets can be generated. But also then alignment padding is only added if vector registers larger than 64 bit are used. > > So the costs are effectively zero.
Especially because extra padding won't enlarge the frame since only virtual registers are allocated which are mapped to the caller frame (see `pad0` in the [diagram](https://github.com/openjdk/jdk/blob/92e380c59c2498b1bc94e26658b07b383deae59a/src/hotspot/cpu/aarch64/aarch64.ad#L3829)) > > There's a risk though that with the extra virtual registers allocated for `pad0` the limit of registers a `RegMask` can represent is reached (occurs with excessive spilling). If this happens the compilation would fail. It could be retried with smaller alignment for vector spilling though. I havn't implemented it as I thought the risk is negligible. > > Note that the sp offset of the accesses should be aligned rather than the effective address. So it could even be argued that the maximum alignment could be higher than StackAlignmentInBytes. > > ##### Testing with fastdebug builds on AARCH64 and PPC64: > > hotspot_vector_1 > hotspot_vector_2 > jdk_vector > jdk_vector_sanity > > ##### The change passed our CI testing: > Tier 1-4 of hotspot and jdk. All of langtools and jaxp. Renaissance Suite and SAP specific tests. > Testing was done on the main platforms and also on Linux/PPC64le and AIX. > > C2 compilation of `jdk.internal.vm.vector.VectorSupport::rearrangeOp` has unaligned spill offsets. It is covered by the following tests: > > compiler/vectorapi/VectorRearrangeTest.java > jdk/incubator/vector/Byte128VectorLoadStoreTests.java > jdk/incubator/vector/Double256VectorLoadStoreTests.java > jdk/incubator/vector/Float128VectorTests.java > jdk/incubator/vector/Long256VectorLoadStoreTests.java > jdk/incubator/vector/Short128VectorLoadStoreTests.java > jdk/incubator/vector/Vector64ConversionTests.java This pull request has now been integrated. 
Changeset: 804ce0a2 Author: Richard Reingruber URL: https://git.openjdk.org/jdk/commit/804ce0a2394cb3f837441976e5ef6eb4b9cab257 Stats: 203 lines in 7 files changed: 157 ins; 29 del; 17 mod 8370473: C2: Better Aligment of Vector Spill Slots Reviewed-by: goetz, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/27969 From thartmann at openjdk.org Wed Dec 3 10:33:11 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 3 Dec 2025 10:33:11 GMT Subject: RFR: 8372703: Test compiler/arguments/TestCodeEntryAlignment.java failed: assert(allocates2(pc)) failed: not in CodeBuffer memory [v2] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 15:40:51 GMT, Volodymyr Paprotski wrote: >> Requires a Broadwell machine, but was able to reproduce with an emulator: >> >> >> ~/sde-external-9.58.0-2025-06-16-lin/sde64 -follow-subprocess -bdw -- ./build/linux-x86_64-server-fastdebug/images/jdk/bin/java -XX:-UseMulAddIntrinsic -XX:+UseDilithiumIntrinsics -XX:+UnlockExperimentalVMOptions -XX:CodeCacheSegmentSize=1024 -XX:CodeEntryAlignment=1024 -cp build/linux-x86_64-server-fastdebug/support/test/lib/test-lib.jar test/hotspot/jtreg/compiler/arguments/TestCodeEntryAlignment.java run > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > comment from Manuel Marked as reviewed by thartmann (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/28588#pullrequestreview-3534338808 From pminborg at openjdk.org Wed Dec 3 10:34:41 2025 From: pminborg at openjdk.org (Per Minborg) Date: Wed, 3 Dec 2025 10:34:41 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: Message-ID: <4gKSL8hFAE2qSuTmhJa6JMfoB6JfUnK9fzwHAnH2Zzg=.9fc69461-bbe7-4242-b3b1-b4b004f35ce0@github.com> On Tue, 2 Dec 2025 17:24:41 GMT, Chen Liang wrote: >> src/java.base/share/classes/java/lang/invoke/VarHandle.java line 2036: >> >>> 2034: var constant = MethodHandleImpl.isCompileConstant(vh); >>> 2035: var cache = adaptedMh; >>> 2036: if (constant == MethodHandleImpl.CONSTANT_YES && cache != null) { >> >> Rookie question: Is there multi-thread considerations here? How about visibility across threads? > > MethodHandle is immutable and can be safely published. So this is ok. I meant that even though objects are immutable, plain semantics might not always do. Reference: https://shipilev.net/blog/2014/safe-public-construction/ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2584535309 From qamai at openjdk.org Wed Dec 3 10:45:39 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 3 Dec 2025 10:45:39 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v2] In-Reply-To: <5eysoU9a44W7_cWds1pgbO9cpxQpBbtd54cUglfEW8c=.d0307e92-d9b3-405c-b488-872243af83b1@github.com> References: <5eysoU9a44W7_cWds1pgbO9cpxQpBbtd54cUglfEW8c=.d0307e92-d9b3-405c-b488-872243af83b1@github.com> Message-ID: On Mon, 1 Dec 2025 19:30:22 GMT, Vladimir Ivanov wrote: >> Even though `instanceof` check (and reflective `Class.isInstance` call) narrows operand's type, sharpened type information is not explicitly materialized in the IR. >> >> There's a `SubTypeCheck` node present, but it is not a substitute for a `CheckCastPP` node with a proper type. 
>> >> The difference can be illustrated with the following simple cases: >> >> class A { void m() {} } >> class B extends A { void m() {} } >> >> void testInstanceOf(A obj) { >> if (obj instanceof B) { >> obj.m(); >> } >> } >> >> InstanceOf::testInstanceOf (12 bytes) >> @ 8 InstanceOf$A::m (0 bytes) failed to inline: virtual call >> >> vs >> >> void testInstanceOfCast(A obj) { >> if (obj instanceof B) { >> B b = (B)obj; >> b.m(); >> } >> } >> >> InstanceOf::testInstanceOfCast (17 bytes) >> @ 13 InstanceOf$B::m (1 bytes) inline (hot) >> >> >> Proposed fix annotates operands of subtype checks with proper type information which reflects the effects of subtype check. Not-yet-canonicalized IR shape poses some challenges, but I decided to match it early so information is available right away, rather than waiting for IGVN pass and delay inlining to post-parse phase. >> >> FTR it is not a complete fix. It works for trivial cases, but for more complex conditions the IR shape becomes too complex during parsing (as illustrated by some test cases). I experimented with annotating subtype checks after initial parsing pass is over, but the crucial simplification step happens as part of split-if transformation which happens when no more inlining is possible. So, the only possible benefit (without forcing split-if optimization earlier) is virtual-to-direct call strength reduction. I plan to explore it separately. >> >> Testing: hs-tier1 - hs-tier5 > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Test fix Marked as reviewed by qamai (Committer). 
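The two source shapes from the quoted PR description can be reproduced as a small self-contained program. This is a sketch for experimentation, not the PR's test: the class name, the return values, and the driver loop are assumptions added here so the methods get hot. Running it with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` should show how each call site is treated, though the exact output depends on the JDK build and the collected profile.

```java
// Illustrative reproduction of the two shapes discussed in the PR description.
final class InstanceOfShapes {
    static class A { int m() { return 1; } }
    static class B extends A { @Override int m() { return 2; } }

    // Shape 1: the call goes through the original reference, whose static type is still A.
    static int testInstanceOf(A obj) {
        if (obj instanceof B) {
            return obj.m();
        }
        return 0;
    }

    // Shape 2: an explicit cast makes the narrowed type visible at the call site.
    static int testInstanceOfCast(A obj) {
        if (obj instanceof B) {
            B b = (B) obj;
            return b.m();
        }
        return 0;
    }

    // Driver: alternates between A and B so both branches are exercised.
    static long run(int iterations) {
        A[] inputs = { new A(), new B() };
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            A obj = inputs[i & 1];
            sum += testInstanceOf(obj) + testInstanceOfCast(obj);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(run(200_000));
    }
}
```

Both methods compute the same result; the interesting difference is only in what the JIT knows about the receiver type at the `m()` call.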
------------- PR Review: https://git.openjdk.org/jdk/pull/28517#pullrequestreview-3534403006 From qamai at openjdk.org Wed Dec 3 10:45:41 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 3 Dec 2025 10:45:41 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v2] In-Reply-To: References: Message-ID: <4abJMXdHzqKGqU58EXHaXO7849B0a64NoShEvU110I4=.87a93e5c-b73e-4c6e-b85b-8797eea8814d@github.com> On Mon, 1 Dec 2025 19:49:29 GMT, Vladimir Ivanov wrote: >> src/hotspot/share/opto/parse2.cpp line 1739: >> >>> 1737: } >>> 1738: >>> 1739: // Match an instanceof check. >> >> We seem to require that the input of `SubTypeCheck` is not `null`. What do you think about allowing `SubTypeCheck` to accept `null` and return `false`? > > Yes, it's a good idea and the right direction to move. While experimenting with a different enhancement, I noticed that a subtype check leaves a null check behind irrespective of whether the check goes away or not. > > Unfortunately, there are some engineering considerations which complicates the change. `SubTypeCheck` is shared across all the places where subtype checks are performed, but `checkcast` and `instanceof` differ in the way `null` is handled. So, the proper way to fix it is to introduce a higher-level representation which implicitly handles nulls and then eventually lower it to `SubTypeCheck` and materialize null check if needed. There are multiple ways without having to have yet another higher-level representation. The first one is that since `SubTypeCheck` does not accept `null` now, we can just choose one result for `null`. Choosing the `instanceof` approach may be a little more desirable, as it removes the need to perform this complicated match, and for `checkcast` we can manually insert a `CheckCastPP` anyway. Another solution is to have another input to `SubTypeCheck` which gives the result when the `obj` is `null`. 
On a whim, I kind of like this, as we can match both the `checkcast` and the `instanceof` pattern here, it also simplifies `GraphKit::gen_checkcast`, as we do not have to worry about "the cast that always succeeds will leave behind a null check". Just a suggestion, though. This PR is fine as it is to me. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28517#discussion_r2584574037 From shade at openjdk.org Wed Dec 3 10:58:01 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 Dec 2025 10:58:01 GMT Subject: Integrated: 8372862: AArch64: Fix GetAndSet-acquire costs after JDK-8372188 In-Reply-To: References: Message-ID: <82avOvCMGtKCw3qqonnTCa3r3G0wT_qEv7f-nuYQn38=.0602395f-f22f-4b07-8bf7-e1cec5b0cd1a@github.com> On Tue, 2 Dec 2025 10:44:24 GMT, Aleksey Shipilev wrote: > I just noticed (while looking at [JDK-8372800](https://bugs.openjdk.org/browse/JDK-8372800)) that I made a little error in [JDK-8372188](https://bugs.openjdk.org/browse/JDK-8372188) refactor, which made GetAndSet-acquire instruction cost twice as high. The usual cost for acquire versions are twice as low, likely to be selected instead of non-acquire versions. > > This bug happened as I "simplified" stencils at some point by dropping some arguments and renumbering the remaining ones. This is one place where I apparently forgot to renumber one usage. See other checks for `ifelse($3,Acq,...` in that stencil, all of them are `$3` (correct), not `$4` (incorrect). Seen no real bugs because of this mishap, but it would be good to fix it in case we see issues later. I also looked at stencils again, and I think there are no other argument-index problems like this anywhere else. > > The real change is in `aarch64_atomic_ad.m4`, `.ad` is re-generated from that stencil. > > Additional testing: > - [x] Linux AArch64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, quick jcstress run This pull request has now been integrated. 
Changeset: 3f447edf Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/3f447edf0e22431628ebb74212f760209ea29d37 Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod 8372862: AArch64: Fix GetAndSet-acquire costs after JDK-8372188 Reviewed-by: dlong, mhaessig ------------- PR: https://git.openjdk.org/jdk/pull/28598 From shade at openjdk.org Wed Dec 3 10:58:00 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 Dec 2025 10:58:00 GMT Subject: RFR: 8372862: AArch64: Fix GetAndSet-acquire costs after JDK-8372188 In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 10:44:24 GMT, Aleksey Shipilev wrote: > I just noticed (while looking at [JDK-8372800](https://bugs.openjdk.org/browse/JDK-8372800)) that I made a little error in [JDK-8372188](https://bugs.openjdk.org/browse/JDK-8372188) refactor, which made GetAndSet-acquire instruction cost twice as high. The usual cost for acquire versions are twice as low, likely to be selected instead of non-acquire versions. > > This bug happened as I "simplified" stencils at some point by dropping some arguments and renumbering the remaining ones. This is one place where I apparently forgot to renumber one usage. See other checks for `ifelse($3,Acq,...` in that stencil, all of them are `$3` (correct), not `$4` (incorrect). Seen no real bugs because of this mishap, but it would be good to fix it in case we see issues later. I also looked at stencils again, and I think there are no other argument-index problems like this anywhere else. > > The real change is in `aarch64_atomic_ad.m4`, `.ad` is re-generated from that stencil. > > Additional testing: > - [x] Linux AArch64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, quick jcstress run Thanks for reviews! 
I am integrating this now to get it in cleanly before RDP1 cutoff :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28598#issuecomment-3606263998 From wenanjian at openjdk.org Wed Dec 3 11:01:58 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Wed, 3 Dec 2025 11:01:58 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v29] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 10:26:16 GMT, Anjian Wen wrote: >> is riscv assembly a migration from aarch64? > > I mainly follow the AES standard algorithm, but I did refer to the implementation of AArch64. I'm not sure it can be described as a migration. In addition, there is a difference compared with AArch64: the algorithm in AArch64 has an extra large-block optimization branch. It calculates 4 blocks in one loop iteration, which seems to make the code more cache friendly, but adds more control flow and uses more vector registers. I think maybe we can do this kind of optimization later, when we can test on a real machine? Currently I just support the standard algorithm. > is riscv assembly a migration from aarch64? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2584647886 From shade at openjdk.org Wed Dec 3 11:15:44 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 Dec 2025 11:15:44 GMT Subject: RFR: 8351844: C2 x64 AVX2 vpminmax assertion failure with equivalent inputs In-Reply-To: References: Message-ID: <4jqjXLkV2LwlS1HRlb2fFIJhO-jU6C2_yVWyiB9z2ZI=.208e3745-23b8-466e-9ea9-df42e49119a4@github.com> On Tue, 2 Dec 2025 11:10:08 GMT, Jatin Bhateja wrote: > Bug fix PR fixes an incorrect register equivalence in macro assembler. MaxV/MinV IR with equivalent inputs should ideally be removed from the ideal graph before reaching the macro assembler. [JDK-8372797](https://bugs.openjdk.org/browse/JDK-8372797) is filed to add relevant identity transformations. > > Best Regards, > Jatin The product fix looks reasonable.
The asserts need to verify that write to `dst` does not yet destroy either of `src`-s. We don't need to check that `src`-s are actually distinct. I have test comments/questions: test/hotspot/jtreg/compiler/vectorapi/TestVectorMinMaxSameInputs.java line 44: > 42: > 43: public static void main(String[] args) { > 44: TestFramework.runWithFlags("--add-modules=jdk.incubator.vector", "-ea", "-XX:+IgnoreUnrecognizedVMOptions", "-XX:UseAVX=2"); I understand `-XX:UseAVX=2` is here to hit the path where the assert is on. But for a generic test like this, it would seem unwise to limit the test configuration only to AVX=2. I would expect we instead run the tests with `TEST_VM_OPTS=-XX:UseAVX=2` to confirm they work with AVX=2 even on AVX-512 machines. test/hotspot/jtreg/compiler/vectorapi/TestVectorMinMaxSameInputs.java line 58: > 56: > 57: @Test > 58: @IR(counts={IRNode.MAX_VL, "1"}) In other tests, I see we are actually checking for CPU feature flags before assuming these nodes are present: @Test @IR(applyIfCPUFeatureOr = { "sse4.1", "true" , "asimd" , "true", "rvv", "true"}, counts = { IRNode.MAX_VL, "> 0" }) So this test would probably fail on some older hardware and/or with some configuration options? 
------------- PR Review: https://git.openjdk.org/jdk/pull/28600#pullrequestreview-3534540911 PR Review Comment: https://git.openjdk.org/jdk/pull/28600#discussion_r2584692296 PR Review Comment: https://git.openjdk.org/jdk/pull/28600#discussion_r2584688358 From duke at openjdk.org Wed Dec 3 11:15:52 2025 From: duke at openjdk.org (duke) Date: Wed, 3 Dec 2025 11:15:52 GMT Subject: RFR: 8371792: Refactor barrier loop tests out of TestIfMinMax [v4] In-Reply-To: <5P58y7mFExd-rdT_nGu_Ky0UG-vDGPRG2IycLX6xwIY=.403c2f90-1ab3-4096-80a7-b80d819d3ca9@github.com> References: <5P58y7mFExd-rdT_nGu_Ky0UG-vDGPRG2IycLX6xwIY=.403c2f90-1ab3-4096-80a7-b80d819d3ca9@github.com> Message-ID: On Fri, 28 Nov 2025 09:40:25 GMT, Galder Zamarreño wrote: >> Trivial cleanup to move tests out of a test class whose description does not match these tests > > Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/gcbarriers/TestMinMaxLongLoopBarrier.java > > Co-authored-by: Emanuel Peter @galderz Your change (at version d023353faf7220920ea1434756d822361ebe4032) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28385#issuecomment-3606348207 From krk at openjdk.org Wed Dec 3 12:25:15 2025 From: krk at openjdk.org (Kerem Kat) Date: Wed, 3 Dec 2025 12:25:15 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v5] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 13:32:22 GMT, Kerem Kat wrote: >> Do not try to replace `fallthrough_memproj` when it is null, fixes crash. >> >> Test case is simplified from the ticket. Verified that the case crashes without the fix. > > Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 10 additional commits since the last revision: > > - Merge branch 'master' into fix-c2-segfault-unlocknode > - address comments > - fix rename > - rename test file > - Merge branch 'master' into fix-c2-segfault-unlocknode > - fix test spacing > - Update src/hotspot/share/opto/macro.cpp > > Co-authored-by: Manuel Hässig > - Update src/hotspot/share/opto/macro.cpp > > Co-authored-by: Manuel Hässig > - copyright format fix? > - 8370502: C2: segfault while adding node to IGVN worklist Thanks! Cut these issues for tracking: [JDK-8373011](https://bugs.openjdk.org/browse/JDK-8373011) and [JDK-8373012](https://bugs.openjdk.org/browse/JDK-8373012). The latter also covers "it would be good to know if expand_lock_node() also needs a null check". ------------- PR Comment: https://git.openjdk.org/jdk/pull/28432#issuecomment-3606609512 From galder at openjdk.org Wed Dec 3 12:34:22 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 3 Dec 2025 12:34:22 GMT Subject: Integrated: 8371792: Refactor barrier loop tests out of TestIfMinMax In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 08:30:56 GMT, Galder Zamarreño wrote: > Trivial cleanup to move tests out of a test class whose description does not match these tests This pull request has now been integrated.
Changeset: a655ea48 Author: Galder Zamarreño Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/a655ea48453a321fb7cadc6ffb6111276497a929 Stats: 123 lines in 2 files changed: 86 ins; 36 del; 1 mod 8371792: Refactor barrier loop tests out of TestIfMinMax Reviewed-by: chagedorn, epeter, bmaillard ------------- PR: https://git.openjdk.org/jdk/pull/28385 From epeter at openjdk.org Wed Dec 3 13:03:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 3 Dec 2025 13:03:02 GMT Subject: RFR: 8367158: C2: create better fill and copy benchmarks, taking alignment into account [v2] In-Reply-To: <3PEmRtpnMH0sRwWGK0uWkItDuytAS-ErVfqYK5X7rDQ=.2d484c9a-c25a-4a60-a856-fcbd4e614914@github.com> References: <3PEmRtpnMH0sRwWGK0uWkItDuytAS-ErVfqYK5X7rDQ=.2d484c9a-c25a-4a60-a856-fcbd4e614914@github.com> Message-ID: > **Summary** > > I created some `fill` and `copy` style benchmarks, covering both `arrays` and `MemorySegment`s. > Reasons for this benchmark: > - I want to compare auto-vectorization with intrinsics (array assembly style intrinsics, and MemorySegment java level special implementations). This allows us to see if some are slower than others, and if we can manage to improve the slower versions somehow in the future. > - There are some known issues we can demonstrate well with this benchmark: > - Super-Unrolling: unrolling the vectorized loop gets us extra performance, but the exact factor may not be optimal yet for auto-vectorization. > - Small iteration count loops: auto-vectorization can lead to slowdowns. > - Many benchmarks do not control for alignment. But that creates noise. I just go over all possible alignments, that should smooth out the noise. > - Most benchmarks do not control for 4k aliasing (x86 effect in store buffer). I make sure that load/stores are not a multiple of 4k bytes apart, so we can avoid the noise of that effect.
> > ---------------------------------------------------------------------- > > **Analysis based on this Benchmark** > > Analysis done in this PR: > - Arrays: auto vectorization vs scalar loops performance > - Arrays: auto vectorization loops vs intrinsics > - MemorySegments: auto vectorization loops vs scalar loops vs `MemorySegment.fill/copy` > > Future work: > - Investigate deeper, inspect assembly, etc. > - Impact of `-XX:SuperWordAutomaticAlignment=0` on small iteration count loops. > - Investigate effect of `-XX:-OptimizeFill`. It seems that the loops in this benchmark are not detected automatically, and so the array intrinsics are not used. Why? > - Investigate impact of `CompactObjectHeaders`. Does enabling/disabling change any performance? > - Investigate if adjusting the super-unrolling factor could improve performance for auto-vectorization: [JDK-8368061](https://bugs.openjdk.org/browse/JDK-8368061) > - Performance comparison with Graal. > > ---------------------------------------------------------------------- > > **Array Benchmark: auto vectorization vs scalar** > > We can see that for arrays, auto vectorization leads to minor regressions for sizes 1-32, and then generally auto vectorization is faster for larger sizes. And this is true for both `fill` and `copy`. > > Strange: `macosx_aarch64` with `copy_int`. The auto vectorized performance has a sudden drop around 150 iterations. Also for `fill_long` we have a "phase-transition" around 64, that goes steeper rather... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 16 additional commits since the last revision: - small modulo fix from review suggestion - Merge branch 'master' into JDK-8367158-fill-and-copy-benchmarks - more MS types - fix MS fill - more backing types - object array benchmarks - fix bm - ms bm update - clean up benchmark - more types - ... and 6 more: https://git.openjdk.org/jdk/compare/098a7d6e...80378aea ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27315/files - new: https://git.openjdk.org/jdk/pull/27315/files/40a80d79..80378aea Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27315&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27315&range=00-01 Stats: 346063 lines in 3520 files changed: 221363 ins; 78352 del; 46348 mod Patch: https://git.openjdk.org/jdk/pull/27315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27315/head:pull/27315 PR: https://git.openjdk.org/jdk/pull/27315 From rsunderbabu at openjdk.org Wed Dec 3 13:19:50 2025 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Wed, 3 Dec 2025 13:19:50 GMT Subject: RFR: 8372941: Rework compiler/intrinsics/sha tests to use intrinsic availability Message-ID: Predicate probes of the following algos are changed to rely on intrinsics availability in the platform as opposed to hardware support availability. 
MD5 SHA1 SHA256 SHA3 Testing: All flag combinations from CI hotspot tiers 1 to 5 PS: only for tier testings, mac-aarch was skipped due to resource constraints ------------- Commit messages: - initial commit Changes: https://git.openjdk.org/jdk/pull/28634/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28634&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372941 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28634.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28634/head:pull/28634 PR: https://git.openjdk.org/jdk/pull/28634 From jvernee at openjdk.org Wed Dec 3 13:26:28 2025 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 3 Dec 2025 13:26:28 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> <3OE37qXGHhLAhnRQM188hhygrLYBtI3FLBMK0tGVH30=.5d1b4406-3bb3-4788-8059-e78260b79ec1@github.com> <7WF8DlorrU_B2__G2wr43w1PZwJh8mEhD5dY10YDIOo=.ec416c38-1aff-4dd6-8792-d6a0e01f91ce@github.com> <_Z6KpxCYH2n3sHuT6-kRP4cSTAN3-s5UA0r bfrJSIgA=.e9d4089c-8329-406b-9a0a-167a24311c13@github.com> <5CADH75ZjadKttOKwsykRFUPlQKLiwCW8E5WkM_75a4=.fd992c8f-e8bc-4775-9ea3-d5212664e3df@github.com> Message-ID: <5QPAetQEkrBgFKtMt0i9Ku_4s2GCirMl2uqLH3j8x7g=.e5fc8964-0080-45f7-9005-31922ec06ba1@github.com> On Wed, 3 Dec 2025 04:10:05 GMT, Chen Liang wrote: >> Sorry, I still don't understand how it is intended to work. Why does `MethodHandleImpl.isCompileConstant(vh) == true` imply that the cached value is compatible with the constant `vh`? >> >> >> // Keep capturing - vh may suddenly get promoted to a constant by C2 >> >> >> Capturing happens outside compiler thread. It is not affected by C2 (except when it completely prunes the whole block). 
>> >> So, either any captured adaptation is valid/compatible or there's a concurrency issue when C2 kicks in and there's a concurrent cache update happening with an incompatible version. > >> any captured adaptation is valid/compatible > > Yes, if `vh` is a constant, any captured adaptation from `vh.getMethodHandle(mode).asType(symbolicMethodTypeInvoker)` is valid/compatible. > > For thread safety, MethodHandle supports safe publication, so I think we are fine publishing this way. Looking at this, I'm not sure we can assume that we only see one mode and type when the VH is constant. There seems to be a lot of non-local reasoning involved. For example, you could have a var handle invoker created with `MethodHandles::varHandleInvoker`, which is cached, so the `AccessDescriptor` can be shared among many different use sites. For an individual use-site, the receiver VH may well be a constant, but that doesn't mean that the cache isn't polluted by the var handle from another use site, as far as I can tell. The thread safety issue comes from a C2 thread racing to read the `lastAdaption` cache vs another Java thread writing to the cache. AFAICS, this race is still possible even when `vh` is a compile time constant. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2585100537 From mli at openjdk.org Wed Dec 3 13:27:42 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 3 Dec 2025 13:27:42 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v30] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 03:44:27 GMT, Anjian Wen wrote: >> Hi everyone, please help review this patch which implements the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ passed. > > Anjian Wen has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 31 commits: > > - Merge branch 'openjdk:master' into aes_ctr > - modify label L_EXIT to L_exit > - add more comments for key value 52 > - update some comments, names and Pseudocode > - modify stub_id name > - Merge branch 'openjdk:master' into aes_ctr > - modify format > - add more comments > - modify parm to unsigned as aarch64 and x86 > - clean comments and format > - ... and 21 more: https://git.openjdk.org/jdk/compare/530493fe...98d802d5 Looks good, Thanks! ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25281#pullrequestreview-3535032696 From mli at openjdk.org Wed Dec 3 13:27:43 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 3 Dec 2025 13:27:43 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v29] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 10:59:13 GMT, Anjian Wen wrote: > I think maybe we can do this kind of optimization when we can test on a real machine later ? Great! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2585105649 From jvernee at openjdk.org Wed Dec 3 13:37:54 2025 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 3 Dec 2025 13:37:54 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: <5QPAetQEkrBgFKtMt0i9Ku_4s2GCirMl2uqLH3j8x7g=.e5fc8964-0080-45f7-9005-31922ec06ba1@github.com> References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> <3OE37qXGHhLAhnRQM188hhygrLYBtI3FLBMK0tGVH30=.5d1b4406-3bb3-4788-8059-e78260b79ec1@github.com> <7WF8DlorrU_B2__G2wr43w1PZwJh8mEhD5dY10YDIOo=.ec416c38-1aff-4dd6-8792-d6a0e01f91ce@github.com> <_Z6KpxCYH2n3sHuT6-kRP4cSTAN3-s5UA0r bfrJSIgA=.e9d4089c-8329-406b-9a0a-167a24311c13@github.com> <5CADH75ZjadKttOKwsykRFUPlQKLiwCW8E5WkM_75a4=.fd992c8f-e8bc-4775-9ea3-d5212664e3df@github.com> <5QPAetQEkrBgFKtMt0i9Ku_4s2GCirMl2uqLH3j8x7g=.e5fc8964-0080-45f7-9005-31922ec06ba1@github.com> Message-ID: On Wed, 3 Dec 2025 13:23:18 GMT, Jorn Vernee wrote: >>> any captured adaptation is valid/compatible >> >> Yes, if `vh` is a constant, any captured adaptation from `vh.getMethodHandle(mode).asType(symbolicMethodTypeInvoker)` is valid/compatible. >> >> For thread safety, MethodHandle supports safe publication, so I think we are fine publishing this way. > > Looking at this, I'm not sure we can assume that we only see one mode and type when the VH is constant. There seems to be a lot of non-local reasoning involved. > > For example, you could have a var handle invoker created with `MethodHandles::varHandleInvoker`, which is cached, so the `AccessDescriptor` can be shared among many different use sites.
For an individual use-site, the receiver VH may well be a constant, but that doesn't mean that the cache isn't polluted by the var handle from another use site, as far as I can tell. > > The thread safety issue comes from a C2 thread racing to read the `lastAdaption` cache vs another Java thread writing to the cache. AFAICS, this race is still possible even when `vh` is a compile-time constant. I think even without using an invoker, you could end up in a similar situation if you have something like: `static Object m(VarHandle vh) { return vh.get(); }` which is called by several different threads. At some point this method may be inlined into one of its callers, where `vh` then becomes a constant. But at the same time, other threads are still writing to the cache. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2585142665 From liach at openjdk.org Wed Dec 3 14:11:22 2025 From: liach at openjdk.org (Chen Liang) Date: Wed, 3 Dec 2025 14:11:22 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: <4gKSL8hFAE2qSuTmhJa6JMfoB6JfUnK9fzwHAnH2Zzg=.9fc69461-bbe7-4242-b3b1-b4b004f35ce0@github.com> References: <4gKSL8hFAE2qSuTmhJa6JMfoB6JfUnK9fzwHAnH2Zzg=.9fc69461-bbe7-4242-b3b1-b4b004f35ce0@github.com> Message-ID: On Wed, 3 Dec 2025 10:31:07 GMT, Per Minborg wrote: >> MethodHandle is immutable and can be safely published. So this is ok. > > I meant that even though objects are immutable, plain semantics might not always suffice. > > Reference: https://shipilev.net/blog/2014/safe-public-construction/ MethodHandle is safe. All fields in the MethodHandle hierarchy are either lazy/stable or final. You can refer to the `invokers` field in `MethodType`, and the `MethodHandle` array in `Invokers` for precedents.
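To make the final-field argument concrete, here is a minimal sketch of the idiom being discussed. It is not code from `java.lang.invoke`; `RacyPairCache`, `Entry`, and `compute` are hypothetical names chosen for illustration. It assumes only the usual final-field guarantee: a reader that observes a non-null reference to an object whose fields are all final also observes those fields fully initialized, even when the reference is published through a plain field.

```java
import java.util.function.Function;

// Hypothetical illustration only; not the VarHandle/AccessDescriptor code.
public final class RacyPairCache<K, V> {
    // Record components are final fields, so a thread that sees a non-null
    // Entry also sees its key and value fully initialized.
    private record Entry<K, V>(K key, V value) {}

    // Plain (non-volatile) field. Key and value always travel together in one
    // immutable Entry, so a reader can never pair a key with the wrong value.
    private Entry<K, V> last;

    private final Function<K, V> compute;

    public RacyPairCache(Function<K, V> compute) {
        this.compute = compute;
    }

    public V get(K key) {
        Entry<K, V> e = last;             // read the racy field exactly once
        if (e != null && e.key() == key) {
            return e.value();             // hit: value was computed for this key
        }
        V v = compute.apply(key);         // miss: recompute, duplicates are harmless
        last = new Entry<>(key, v);       // publish the pair with a single write
        return v;
    }
}
```

The important detail is the single read into a local: reading the field twice could observe two different entries. Whether a JIT thread reading such a field at compile time is covered by the same reasoning is the separate question raised earlier in this thread.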
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2585263336 From liach at openjdk.org Wed Dec 3 14:11:23 2025 From: liach at openjdk.org (Chen Liang) Date: Wed, 3 Dec 2025 14:11:23 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: <4gKSL8hFAE2qSuTmhJa6JMfoB6JfUnK9fzwHAnH2Zzg=.9fc69461-bbe7-4242-b3b1-b4b004f35ce0@github.com> Message-ID: <4F_HqL-oY7z2ENI9yIAS7VS3NDjEljsqx4E2zK5HxJ0=.8f5ab39a-66e1-4ea3-ace2-226d6bd39d77@github.com> On Wed, 3 Dec 2025 14:06:00 GMT, Chen Liang wrote: >> I meant that even though objects are immutable, plain semantics might not always suffice. >> >> Reference: https://shipilev.net/blog/2014/safe-public-construction/ > > MethodHandle is safe. All fields in the MethodHandle hierarchy are either lazy/stable or final. You can refer to the `invokers` field in `MethodType`, and the `MethodHandle` array in `Invokers` for precedents. In extreme cases where a barrier is needed, java.lang.invoke already issues the necessary barriers, most notably the storeStoreFence, such as https://github.com/openjdk/jdk/blob/135661b4389663b8c2e348d9e61e72cc628636bb/src/java.base/share/classes/java/lang/invoke/CallSite.java#L138 or https://github.com/openjdk/jdk/blob/135661b4389663b8c2e348d9e61e72cc628636bb/src/java.base/share/classes/java/lang/ClassValue.java#L411-L417 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2585270552 From vpaprotski at openjdk.org Wed Dec 3 14:57:24 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Wed, 3 Dec 2025 14:57:24 GMT Subject: RFR: 8372703: Test compiler/arguments/TestCodeEntryAlignment.java failed: assert(allocates2(pc)) failed: not in CodeBuffer memory [v2] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 15:40:51 GMT, Volodymyr Paprotski wrote: >> Requires a Broadwell machine, but was able to reproduce with an emulator: >> >> >>
~/sde-external-9.58.0-2025-06-16-lin/sde64 -follow-subprocess -bdw -- ./build/linux-x86_64-server-fastdebug/images/jdk/bin/java -XX:-UseMulAddIntrinsic -XX:+UseDilithiumIntrinsics -XX:+UnlockExperimentalVMOptions -XX:CodeCacheSegmentSize=1024 -XX:CodeEntryAlignment=1024 -cp build/linux-x86_64-server-fastdebug/support/test/lib/test-lib.jar test/hotspot/jtreg/compiler/arguments/TestCodeEntryAlignment.java run > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > comment from Manuel Thanks for the approvals! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28588#issuecomment-3607240643 From vpaprotski at openjdk.org Wed Dec 3 14:57:25 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Wed, 3 Dec 2025 14:57:25 GMT Subject: Integrated: 8372703: Test compiler/arguments/TestCodeEntryAlignment.java failed: assert(allocates2(pc)) failed: not in CodeBuffer memory In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 21:16:18 GMT, Volodymyr Paprotski wrote: > Requires a Broadwell machine, but was able to reproduce with an emulator: > > > ~/sde-external-9.58.0-2025-06-16-lin/sde64 -follow-subprocess -bdw -- ./build/linux-x86_64-server-fastdebug/images/jdk/bin/java -XX:-UseMulAddIntrinsic -XX:+UseDilithiumIntrinsics -XX:+UnlockExperimentalVMOptions -XX:CodeCacheSegmentSize=1024 -XX:CodeEntryAlignment=1024 -cp build/linux-x86_64-server-fastdebug/support/test/lib/test-lib.jar test/hotspot/jtreg/compiler/arguments/TestCodeEntryAlignment.java run This pull request has now been integrated.
Changeset: 829b8581 Author: Volodymyr Paprotski URL: https://git.openjdk.org/jdk/commit/829b85813a3810eeecf6ce4b30b5c3d1fc34ad23 Stats: 3 lines in 2 files changed: 0 ins; 2 del; 1 mod 8372703: Test compiler/arguments/TestCodeEntryAlignment.java failed: assert(allocates2(pc)) failed: not in CodeBuffer memory Reviewed-by: mhaessig, dfenacci, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/28588 From liach at openjdk.org Wed Dec 3 15:54:30 2025 From: liach at openjdk.org (Chen Liang) Date: Wed, 3 Dec 2025 15:54:30 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v5] In-Reply-To: References: Message-ID: > Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) Chen Liang has updated the pull request incrementally with one additional commit since the last revision: Fix problem identified by Jorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28585/files - new: https://git.openjdk.org/jdk/pull/28585/files/d49ad129..89e21b4b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=03-04 Stats: 40 lines in 3 files changed: 25 ins; 2 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/28585.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28585/head:pull/28585 PR: https://git.openjdk.org/jdk/pull/28585 From dlunden at openjdk.org Wed Dec 3 16:11:47 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Wed, 3 Dec 2025 16:11:47 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v8] In-Reply-To: References:
<8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: <2z3W0Nhjk7JjhJSw8nTHZJhY6xd8j62vBHp0HkMNQJQ=.9a1f6cf7-ad8d-4599-a873-9b7729c7109b@github.com> On Mon, 1 Dec 2025 13:39:09 GMT, Jatin Bhateja wrote: >> This patch fixes a crash seen while querying the bottom type of the MachTempNode corresponding to the [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of the blend pattern during late scheduling. Here, MaxVectorSize is constrained to 8 bytes, so during C2 type system initialization [TypeVect::VECTX](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719), which is guarded by the target-supported vector size, remains uninitialized. >> >> It's better to reject matching of VectorBlend in such a scenario. >> >> All existing VectorAPI jtreg tests are passing with -XX:UseAVX=0 and -XX:MaxVectorSize=8 >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Review comments resolutions Looks good @jatin-bhateja! One minor style consistency suggestion.
test/hotspot/jtreg/compiler/vectorapi/TestABSMaskedMaxByteVector.java line 48: > 46: @Test > 47: @IR(failOn = {IRNode.ABS_VB}, applyIfAnd={"MaxVectorSize", " <= 8 ", "UseAVX", "0"}, applyIfPlatform={"x64", "true"}, applyIfCPUFeature={"sse4.1", "true"}) > 48: @IR(counts = {IRNode.ABS_VB, "1"}, applyIf={"MaxVectorSize", " > 8 "}, applyIfPlatform={"x64", "true"}, applyIfCPUFeature={"sse4.1", "true"}) Suggestion: @IR(failOn = {IRNode.ABS_VB}, applyIfAnd = {"MaxVectorSize", " <= 8 ", "UseAVX", "0"}, applyIfPlatform = {"x64", "true"}, applyIfCPUFeature = {"sse4.1", "true"}) @IR(counts = {IRNode.ABS_VB, "1"}, applyIf = {"MaxVectorSize", " > 8 "}, applyIfPlatform = {"x64", "true"}, applyIfCPUFeature = {"sse4.1", "true"}) ------------- Marked as reviewed by dlunden (Committer). PR Review: https://git.openjdk.org/jdk/pull/28533#pullrequestreview-3535790377 PR Review Comment: https://git.openjdk.org/jdk/pull/28533#discussion_r2585713416 From liach at openjdk.org Wed Dec 3 16:43:24 2025 From: liach at openjdk.org (Chen Liang) Date: Wed, 3 Dec 2025 16:43:24 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v6] In-Reply-To: References: Message-ID: > Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. 
Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) Chen Liang has updated the pull request incrementally with two additional commits since the last revision: - Test from Jorn - Copyright years ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28585/files - new: https://git.openjdk.org/jdk/pull/28585/files/89e21b4b..ff7b3629 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=04-05 Stats: 107 lines in 3 files changed: 105 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28585.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28585/head:pull/28585 PR: https://git.openjdk.org/jdk/pull/28585 From jbhateja at openjdk.org Wed Dec 3 18:30:50 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 3 Dec 2025 18:30:50 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v9] In-Reply-To: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: > This patch fixes a crash seen while querying the bottom type of the MachTempNode corresponding to the [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of the blend pattern during late scheduling. Here, MaxVectorSize is constrained to 8 bytes, so during C2 type system initialization [TypeVect::VECTX](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719), which is guarded by the target-supported vector size, remains uninitialized. > > It's better to reject matching of VectorBlend in such a scenario. > > All existing VectorAPI jtreg tests are passing with -XX:UseAVX=0 and -XX:MaxVectorSize=8 > > Kindly review and share your feedback.
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/vectorapi/TestABSMaskedMaxByteVector.java Co-authored-by: Daniel Lundén ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28533/files - new: https://git.openjdk.org/jdk/pull/28533/files/ef84ffa7..e92cd467 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28533&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28533&range=07-08 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28533.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28533/head:pull/28533 PR: https://git.openjdk.org/jdk/pull/28533 From vlivanov at openjdk.org Wed Dec 3 19:39:56 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 3 Dec 2025 19:39:56 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v6] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 16:43:24 GMT, Chen Liang wrote: >> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) > > Chen Liang has updated the pull request incrementally with two additional commits since the last revision: > > - Test from Jorn > - Copyright years src/java.base/share/classes/java/lang/invoke/VarHandle.java line 2036: > 2034: // from two writes (they must not be tearable) > 2035: private record Adaption(VarHandle vh, MethodHandle mh) {} > 2036: private @Stable Adaption adaption; Is a soft reference needed here? The situation looks similar to `MH.asTypeSoftCache`. It can keep some classes referred by `vh` alive for unnecessarily long.
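For readers unfamiliar with the soft-cache shape being asked about, a minimal sketch follows. It is an assumption-laden illustration, not the code of `MH.asTypeSoftCache` or of this PR; `SoftPairCache`, `Entry`, and `compute` are hypothetical names. The cached pair is held only through a `SoftReference`, so the collector may drop it under memory pressure instead of the cache keeping the key (and whatever it references) strongly reachable. The field is declared `volatile` here purely to keep the sketch easy to reason about.

```java
import java.lang.ref.SoftReference;
import java.util.function.Function;

// Hypothetical illustration only; not the java.lang.invoke implementation.
public final class SoftPairCache<K, V> {
    // Immutable pair, so key and value are always replaced together.
    private record Entry<K, V>(K key, V value) {}

    // The only path from the cache to the entry is a SoftReference, so the
    // garbage collector is allowed to clear it when memory is tight.
    private volatile SoftReference<Entry<K, V>> last;

    private final Function<K, V> compute;

    public SoftPairCache(Function<K, V> compute) {
        this.compute = compute;
    }

    public V get(K key) {
        SoftReference<Entry<K, V>> ref = last;              // single read of the field
        Entry<K, V> e = (ref == null) ? null : ref.get();   // null if never set or cleared
        if (e != null && e.key() == key) {
            return e.value();
        }
        V v = compute.apply(key);                            // miss or cleared: recompute
        last = new SoftReference<>(new Entry<>(key, v));
        return v;
    }
}
```

The cost of this shape is one extra indirection on every read. Whether the JIT can constant-fold through that indirection is a separate question that this sketch does not try to answer.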
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2586374014 From kvn at openjdk.org Wed Dec 3 19:53:25 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 3 Dec 2025 19:53:25 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v8] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 10:31:22 GMT, Aleksey Shipilev wrote: >> See the bug for discussion what issues current machinery has. >> >> This PR executes the plan outlined in the bug: >> 1. Common the receiver type profiling code in interpreter and C1 >> 2. Rewrite receiver type profiling code to only do atomic receiver slot installations >> 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed >> >> This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: > > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - More comments > - Tighten up the comments > - Simplify third case: no need to loop, just restart the search > - Actually have a second "fast" case: receiver is not found in the table, and the table is full > - Pushing/popping for rare CAS path is counter-productive > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - Tighten up some more > - Offset is always rscratch1, no need to save it > - Grossly simplify register shuffling > - ... and 11 more: https://git.openjdk.org/jdk/compare/7278d2e8...3c5019d9 This looks good. Thank you for cleaning up code and detailed comments. I submitted our testing. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3608573720 From vlivanov at openjdk.org Wed Dec 3 21:52:29 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 3 Dec 2025 21:52:29 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v3] In-Reply-To: References: Message-ID: > Even though `instanceof` check (and reflective `Class.isInstance` call) narrows operand's type, sharpened type information is not explicitly materialized in the IR. > > There's a `SubTypeCheck` node present, but it is not a substitute for a `CheckCastPP` node with a proper type. > > The difference can be illustrated with the following simple cases: > > class A { void m() {} } > class B extends A { void m() {} } > > void testInstanceOf(A obj) { > if (obj instanceof B) { > obj.m(); > } > } > > InstanceOf::testInstanceOf (12 bytes) > @ 8 InstanceOf$A::m (0 bytes) failed to inline: virtual call > > vs > > void testInstanceOfCast(A obj) { > if (obj instanceof B) { > B b = (B)obj; > b.m(); > } > } > > InstanceOf::testInstanceOfCast (17 bytes) > @ 13 InstanceOf$B::m (1 bytes) inline (hot) > > > Proposed fix annotates operands of subtype checks with proper type information which reflects the effects of subtype check. Not-yet-canonicalized IR shape poses some challenges, but I decided to match it early so information is available right away, rather than waiting for IGVN pass and delay inlining to post-parse phase. > > FTR it is not a complete fix. It works for trivial cases, but for more complex conditions the IR shape becomes too complex during parsing (as illustrated by some test cases). I experimented with annotating subtype checks after initial parsing pass is over, but the crucial simplification step happens as part of split-if transformation which happens when no more inlining is possible. So, the only possible benefit (without forcing split-if optimization earlier) is virtual-to-direct call strength reduction. 
I plan to explore it separately. > > Testing: hs-tier1 - hs-tier5 Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: Unify Compile::should_delay_inlining ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28517/files - new: https://git.openjdk.org/jdk/pull/28517/files/0a5e78c6..c58c63cc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28517&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28517&range=01-02 Stats: 12 lines in 4 files changed: 2 ins; 7 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28517.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28517/head:pull/28517 PR: https://git.openjdk.org/jdk/pull/28517 From vlivanov at openjdk.org Wed Dec 3 21:58:24 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 3 Dec 2025 21:58:24 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v4] In-Reply-To: References: Message-ID: > Even though `instanceof` check (and reflective `Class.isInstance` call) narrows operand's type, sharpened type information is not explicitly materialized in the IR. > > There's a `SubTypeCheck` node present, but it is not a substitute for a `CheckCastPP` node with a proper type. > > The difference can be illustrated with the following simple cases: > > class A { void m() {} } > class B extends A { void m() {} } > > void testInstanceOf(A obj) { > if (obj instanceof B) { > obj.m(); > } > } > > InstanceOf::testInstanceOf (12 bytes) > @ 8 InstanceOf$A::m (0 bytes) failed to inline: virtual call > > vs > > void testInstanceOfCast(A obj) { > if (obj instanceof B) { > B b = (B)obj; > b.m(); > } > } > > InstanceOf::testInstanceOfCast (17 bytes) > @ 13 InstanceOf$B::m (1 bytes) inline (hot) > > > Proposed fix annotates operands of subtype checks with proper type information which reflects the effects of subtype check. 
Not-yet-canonicalized IR shape poses some challenges, but I decided to match it early so information is available right away, rather than waiting for IGVN pass and delay inlining to post-parse phase. > > FTR it is not a complete fix. It works for trivial cases, but for more complex conditions the IR shape becomes too complex during parsing (as illustrated by some test cases). I experimented with annotating subtype checks after initial parsing pass is over, but the crucial simplification step happens as part of split-if transformation which happens when no more inlining is possible. So, the only possible benefit (without forcing split-if optimization earlier) is virtual-to-direct call strength reduction. I plan to explore it separately. > > Testing: hs-tier1 - hs-tier5 Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge branch 'master' into c2.instanceof - Unify Compile::should_delay_inlining - Test fix - bugid - C2: Materialize type information from instanceof checks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28517/files - new: https://git.openjdk.org/jdk/pull/28517/files/c58c63cc..58a7d521 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28517&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28517&range=02-03 Stats: 98149 lines in 1639 files changed: 63706 ins; 23937 del; 10506 mod Patch: https://git.openjdk.org/jdk/pull/28517.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28517/head:pull/28517 PR: https://git.openjdk.org/jdk/pull/28517 From liach at openjdk.org Wed Dec 3 23:44:01 2025 From: liach at openjdk.org (Chen Liang) Date: Wed, 3 Dec 2025 23:44:01 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v6] In-Reply-To: References: Message-ID: On Wed, 3 Dec 
2025 19:37:25 GMT, Vladimir Ivanov wrote: >> Chen Liang has updated the pull request incrementally with two additional commits since the last revision: >> >> - Test from Jorn >> - Copyright years > > src/java.base/share/classes/java/lang/invoke/VarHandle.java line 2036: > >> 2034: // from two writes (they must not be tearable) >> 2035: private record Adaption(VarHandle vh, MethodHandle mh) {} >> 2036: private @Stable Adaption adaption; > > Is a soft reference needed here? The situation looks similar to `MH.asTypeSoftCache`. It can keep some classes referred by `vh` alive for unnecessarily long. I don't think we can use a SoftReference here if we need to achieve constant folding. Looking at inline_reference_get0, I think we might introduce another field property to trust a reference (potentially in an array) if both that reference and the referent within the reference are non-null. I think that belongs to a separate RFE. What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2586946357 From sviswanathan at openjdk.org Thu Dec 4 00:15:18 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 4 Dec 2025 00:15:18 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v9] In-Reply-To: References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: On Wed, 3 Dec 2025 18:30:50 GMT, Jatin Bhateja wrote: >> This patch fixes a crash seen while querying the bottom type of the MachTempNode corresponding to the [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of the blend pattern during late scheduling. Here, MaxVectorSize is constrained to 8 bytes, so during C2 type system initialization [TypeVect::VECTX](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719), which is guarded by the target-supported vector size, remains uninitialized.
>> >> It's better to reject matching of VectorBlend in such a scenario. >> >> All existing VectorAPI jtreg tests are passing with -XX:UseAVX=0 and -XX:MaxVectorSize=8 >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/vectorapi/TestABSMaskedMaxByteVector.java > > Co-authored-by: Daniel Lundén Marked as reviewed by sviswanathan (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28533#pullrequestreview-3537389980 From liach at openjdk.org Thu Dec 4 01:48:31 2025 From: liach at openjdk.org (Chen Liang) Date: Thu, 4 Dec 2025 01:48:31 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v7] In-Reply-To: References: Message-ID: <7ayMTZ4nXMyB1SXNRcYGjdxidNHDcAUNv_8fQZDUaPI=.a558d3a2-1d3e-4b45-8ba7-393c55a52785@github.com> > Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH.
Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) Chen Liang has updated the pull request incrementally with one additional commit since the last revision: Revert void special case removal due to C2 shortage causing TestZGCBarrierElision::testAtomicThenAtomicAnotherField failure ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28585/files - new: https://git.openjdk.org/jdk/pull/28585/files/ff7b3629..8200fb28 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=05-06 Stats: 23 lines in 1 file changed: 20 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28585.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28585/head:pull/28585 PR: https://git.openjdk.org/jdk/pull/28585 From xgong at openjdk.org Thu Dec 4 01:49:37 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 4 Dec 2025 01:49:37 GMT Subject: RFR: 8371603: C2: assert(_inputs.at(alias_idx) == nullptr || _inputs.at(alias_idx) == load->in(1)) failed Message-ID: **Problem:** This issue occurs on a 256-bit SVE machine, caused by the following problematic pattern in `LoadVectorNode::Ideal()`: Node* LoadVectorNode::Ideal(PhaseGVN* phase, bool can_reshape) { const TypeVect* vt = vect_type(); if (Matcher::vector_needs_partial_operations(this, vt)) { return VectorNode::try_to_gen_masked_vector(phase, this, vt); } return LoadNode::Ideal(phase, can_reshape); } The condition `Matcher::vector_needs_partial_operations(this, vt)` returns true for `LoadVectorNode` with 256-bit vector size even when the vector size equals the maximum vector size on SVE. In such cases, when `VectorNode::try_to_gen_masked_vector()` returns `nullptr`, the method exits early without calling `LoadNode::Ideal()`. This results in missing crucial optimizations that would normally be applied by the superclass. 
This code was introduced by https://bugs.openjdk.org/browse/JDK-8286941 to generate vector masks for partial vector operations, but it failed to ensure that the superclass `Ideal()` method is always invoked when no transformation is applied. **Solution:** This patch addresses the issue through two changes: 1. Refine `Matcher::vector_needs_partial_operations()` to return true only when the vector node genuinely represents a partial vector operation that requires masking. 2. Modify `VectorNode::try_to_gen_masked_vector()` to never return `nullptr`, ensuring the superclass `Ideal()` method is always invoked when no transformation is applied. **Testing:** - Verified on different SVE platforms with different vector sizes (128|256|512 bits). - Verified on X86 platforms with different avx options (-XX:UseAVX=1|2|3). - Added two new IR tests to verify 1) previously missing optimizations for `LoadVector/StoreVector` are now applied, and 2) that mask and the correct IR patterns are generated for partial vector operations. 
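The failure mode described under **Problem** can be modelled outside HotSpot. The sketch below is a simplified Java analogue and not the C2 code: `IdealFallthroughSketch`, `Load`, `LoadVector`, `tryMasked`, and the string results are hypothetical stand-ins, and `null` plays the role of `nullptr` ("no transformation applied"). Note also that the fixed variant here uses a plain null check to fall through to the superclass, which is one generic remedy and an assumption of this sketch; the patch itself takes the two steps listed under **Solution**.

```java
// Hypothetical model of the early-return pattern; not HotSpot code.
public final class IdealFallthroughSketch {

    public static class Load {
        // Stands in for the generic superclass optimization.
        public String ideal() {
            return "generic-load-optimization";
        }
    }

    public static class LoadVector extends Load {
        private final boolean needsPartialOps;      // result of the (too broad) predicate
        private final boolean maskedFormAvailable;  // whether masking actually applies

        public LoadVector(boolean needsPartialOps, boolean maskedFormAvailable) {
            this.needsPartialOps = needsPartialOps;
            this.maskedFormAvailable = maskedFormAvailable;
        }

        // Stand-in for the mask-generating helper; null means "nothing to do".
        String tryMasked() {
            return maskedFormAvailable ? "masked-vector-load" : null;
        }

        // Problematic shape: when the predicate is true but the helper returns
        // null, the method exits without ever consulting the superclass.
        public String idealBuggy() {
            if (needsPartialOps) {
                return tryMasked();
            }
            return super.ideal();
        }

        // One possible remedy: return early only if a transformation happened,
        // otherwise fall through to the superclass.
        @Override
        public String ideal() {
            if (needsPartialOps) {
                String masked = tryMasked();
                if (masked != null) {
                    return masked;
                }
            }
            return super.ideal();
        }
    }
}
```

With the predicate true and no masked form available, `idealBuggy()` yields `null` while `ideal()` still reaches the superclass result, which mirrors the missed optimizations described above.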
------------- Commit messages: - 8371603: C2: assert(_inputs.at(alias_idx) == nullptr || _inputs.at(alias_idx) == load->in(1)) failed Changes: https://git.openjdk.org/jdk/pull/28651/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28651&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371603 Stats: 619 lines in 8 files changed: 577 ins; 15 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/28651.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28651/head:pull/28651 PR: https://git.openjdk.org/jdk/pull/28651 From xgong at openjdk.org Thu Dec 4 01:49:37 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 4 Dec 2025 01:49:37 GMT Subject: RFR: 8371603: C2: assert(_inputs.at(alias_idx) == nullptr || _inputs.at(alias_idx) == load->in(1)) failed In-Reply-To: References: Message-ID: <6hTYNHBCAdNtrpIHUMQFQtGF3pgL_zEHllk3pa8VO5w=.633da968-84cf-4312-83ca-250941aaab5f@github.com> On Thu, 4 Dec 2025 01:41:19 GMT, Xiaohong Gong wrote: > **Problem:** > > This issue occurs on a 256-bit SVE machine, caused by the following problematic pattern in `LoadVectorNode::Ideal()`: > > > Node* LoadVectorNode::Ideal(PhaseGVN* phase, bool can_reshape) { > const TypeVect* vt = vect_type(); > if (Matcher::vector_needs_partial_operations(this, vt)) { > return VectorNode::try_to_gen_masked_vector(phase, this, vt); > } > return LoadNode::Ideal(phase, can_reshape); > } > > > The condition `Matcher::vector_needs_partial_operations(this, vt)` returns true for `LoadVectorNode` with 256-bit vector size even when the vector size equals the maximum vector size on SVE. In such cases, when `VectorNode::try_to_gen_masked_vector()` returns `nullptr`, the method exits early without calling `LoadNode::Ideal()`. This results in missing crucial optimizations that would normally be applied by the superclass. 
> > This code was introduced by https://bugs.openjdk.org/browse/JDK-8286941 to generate vector masks for partial vector operations, but it failed to ensure that the superclass `Ideal()` method is always invoked when no transformation is applied. > > **Solution:** > > This patch addresses the issue through two changes: > > 1. Refine `Matcher::vector_needs_partial_operations()` to return true only when the vector node genuinely represents a partial vector operation that requires masking. > 2. Modify `VectorNode::try_to_gen_masked_vector()` to never return `nullptr`, ensuring the superclass `Ideal()` method is always invoked when no transformation is applied. > > **Testing:** > > - Verified on different SVE platforms with different vector sizes (128|256|512 bits). > - Verified on X86 platforms with different avx options (-XX:UseAVX=1|2|3). > - Added two new IR tests to verify 1) previously missing optimizations for `LoadVector/StoreVector` are now applied, and 2) that mask and the correct IR patterns are generated for partial vector operations. Hi @eme64, this is the fix for the crash reported on the AWS machine. Could you please help take a look? Thanks a lot! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28651#issuecomment-3609606962 From vlivanov at openjdk.org Thu Dec 4 01:58:56 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 4 Dec 2025 01:58:56 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v6] In-Reply-To: References: Message-ID: <8tu3HIArCw2cdoYR2SjI0b-TWYQxQLKkjQgucJEj8D4=.10946ec2-4958-48df-add4-b29d11c09448@github.com> On Wed, 3 Dec 2025 23:41:01 GMT, Chen Liang wrote: >> src/java.base/share/classes/java/lang/invoke/VarHandle.java line 2036: >> >>> 2034: // from two writes (they must not be tearable) >>> 2035: private record Adaption(VarHandle vh, MethodHandle mh) {} >>> 2036: private @Stable Adaption adaption; >> >> Is a soft reference needed here?
The situation looks similar to `MH.asTypeSoftCache`. It can keep some classes referred by `vh` alive for unnecessarily long. > > I don't think we can use a SoftReference here if we need to achieve constant folding. > > Looking at inline_reference_get0, I think we might introduce another field property to trust a reference (potentially in an array) if both that reference and the referent within the reference is non-null. I think that belongs to a separate RFE. What do you think? Then it makes sense to limit the caching to safe cases only for now. Otherwise, it would functionally regress due to a possible memory leak. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2587156042 From dlong at openjdk.org Thu Dec 4 03:24:59 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 4 Dec 2025 03:24:59 GMT Subject: RFR: 8350208: CTW: GraphKit::add_safepoint_edges asserts "not enough operands for reexecution" In-Reply-To: References: Message-ID: <29opt6wmJqMaeTB-QbJWTMCndjSQH0ZhbTaD-ir6X4A=.b728d464-b7f1-44c5-a76c-c84f84f150f5@github.com> On Tue, 2 Dec 2025 10:30:46 GMT, Quan Anh Mai wrote: > Hi, > > This PR fixes the issue of the compiler crashing with "not enough operands for reexecution". The issue here is that during `Parse::catch_inline_exceptions`, the old stack is gone, and we cannot reexecute the current bytecode anymore. However, there are some places where we try to insert safepoints into the graph, such as if the handler is a backward jump, or if one of the exceptions in the handlers is not loaded. Since the `_reexecute` state of the current jvms is "undefined", it is inferred automatically that it should reexecute for some bytecodes such as `putfield`. The solution then is to explicitly set `_reexecute` to false. 
> > I can manage to write a unit test for the case of a backward handler, for the other cases, since the exceptions that can be thrown for a bytecode that is inferred to reexecute are `NullPointerException`, `ArrayIndexOutOfBoundsException`, and `ArrayStoreException`. I find it hard to construct such a test in which one of them is not loaded. > > Please kindly review, thanks a lot. It seems to be very difficult to force the back-edge safepoint to deoptimize. I tried creating a thread that calls System.gc(), but so far no crash. Still, I think the state is incorrect if reexecute=false. Setting reexecute to false means it will skip the current instruction. To correctly handle a deoptimization on the backwards branch, the debug state, bci, and exception location should match. I think we have 3 choices to prepare for maybe_add_safepoint(): 1. preserve stack inputs, use original bci, do not push exception oop, let interpreter reexecute and throw the exception (reexecute=true) This might be as simple as reversing the order of calls to push_ex_oop and maybe_add_safepoint. 2. trim stack, push exception object, use bci of exception handler (reexecute=true) This would require temporarily changing the bci for the maybe_add_safepoint call. 3. trim stack, throw exception (move to Thread) (reexecute=true) This requires extra unconditional overhead even though safepoint rarely happens. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28597#issuecomment-3609883724 From duke at openjdk.org Thu Dec 4 05:04:32 2025 From: duke at openjdk.org (Harshit470250) Date: Thu, 4 Dec 2025 05:04:32 GMT Subject: RFR: 8370920: [s390] C2: add instruction size in s390.ad file [v9] In-Reply-To: <6L13GD9fUG60AH8_WoSTY-o0TW6p3iXG2TI2o6oQltE=.41cc9b1a-65cf-49ed-9cb7-37014cd681c6@github.com> References: <6L13GD9fUG60AH8_WoSTY-o0TW6p3iXG2TI2o6oQltE=.41cc9b1a-65cf-49ed-9cb7-37014cd681c6@github.com> Message-ID: > This pr adds the size of the match rule nodes. 
> > There were a lot of nodes for which the size was variable; for those nodes I have taken the maximum possible size. Harshit470250 has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision: - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge remote-tracking branch 'origin/master' - remove whitespace - ... and 9 more: https://git.openjdk.org/jdk/compare/39a4f5df...05c649cb ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28054/files - new: https://git.openjdk.org/jdk/pull/28054/files/077d0258..05c649cb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28054&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28054&range=07-08 Stats: 29485 lines in 697 files changed: 16799 ins; 8755 del; 3931 mod Patch: https://git.openjdk.org/jdk/pull/28054.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28054/head:pull/28054 PR: https://git.openjdk.org/jdk/pull/28054 From fyang at openjdk.org Thu Dec 4 05:30:01 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 4 Dec 2025 05:30:01 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v30] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 03:44:27 GMT, Anjian Wen wrote: >> Hi everyone, please help review this patch which implements the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ passed. > > Anjian Wen has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 31 commits: > > - Merge branch 'openjdk:master' into aes_ctr > - modify label L_EXIT to L_exit > - add more comments for key value 52 > - update some comments, names and Pseudocode > - modify stub_id name > - Merge branch 'openjdk:master' into aes_ctr > - modify format > - add more comments > - modify parm to unsigned as aarch64 and x86 > - clean comments and format > - ... and 21 more: https://git.openjdk.org/jdk/compare/530493fe...98d802d5 Still good to me. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25281#pullrequestreview-3538186330 From haosun at openjdk.org Thu Dec 4 05:34:54 2025 From: haosun at openjdk.org (Hao Sun) Date: Thu, 4 Dec 2025 05:34:54 GMT Subject: RFR: 8372941: Rework compiler/intrinsics/sha tests to use intrinsic availability In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 13:11:27 GMT, Ramkumar Sunderbabu wrote: > Predicate probes of the following algos are changed to rely on intrinsics availability in the platform as opposed to hardware support availability. > MD5 > SHA1 > SHA256 > SHA3 > > Testing: > All flag combinations from CI > hotspot tiers 1 to 5 > PS: only for tier testings, mac-aarch was skipped due to resource constraints Thanks for your work. I suppose the **os.arch** requires condition can be removed in the following cases: diff --git a/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseMD5IntrinsicsOptionOnSupportedCPU.java b/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseMD5IntrinsicsOptionOnSupportedCPU.java index eeff351f737..3561be3b33b 100644 --- a/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseMD5IntrinsicsOptionOnSupportedCPU.java +++ b/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseMD5IntrinsicsOptionOnSupportedCPU.java @@ -25,10 +25,8 @@ * @test * @bug 8035968 * @summary Verify UseMD5Intrinsics option processing on supported CPU.
- * ( Disable this test on riscv, because on riscv UseMD5Intrinsics depends on !AvoidUnalignedAccesses. ) * @library /test/lib / * @requires vm.flagless - * @requires os.arch != "riscv64" * * @build jdk.test.whitebox.WhiteBox * @run driver jdk.test.lib.helpers.ClassFileInstaller jdk.test.whitebox.WhiteBox diff --git a/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA1IntrinsicsOptionOnSupportedCPU.java b/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA1IntrinsicsOptionOnSupportedCPU.java index 1ce2c4b1f87..71ed3b3cac9 100644 --- a/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA1IntrinsicsOptionOnSupportedCPU.java +++ b/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA1IntrinsicsOptionOnSupportedCPU.java @@ -25,10 +25,8 @@ * @test * @bug 8035968 * @summary Verify UseSHA1Intrinsics option processing on supported CPU. - * ( Disable this test on riscv, because on riscv UseSHA1Intrinsics depends on !AvoidUnalignedAccesses. ) * @library /test/lib / * @requires vm.flagless - * @requires os.arch != "riscv64" * * @build jdk.test.whitebox.WhiteBox * @run driver jdk.test.lib.helpers.ClassFileInstaller jdk.test.whitebox.WhiteBox diff --git a/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java b/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java index d3c0a4a8da7..41a2ec277a2 100644 --- a/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java +++ b/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java @@ -28,10 +28,8 @@ * @summary Verify UseSHA3Intrinsics option processing on supported CPU. * @library /test/lib / * @requires vm.flagless - * @requires os.arch == "aarch64" & os.family == "mac" + * @requires os.arch == "aarch64" * @comment sha3 is only implemented on AArch64 for now. 
- * UseSHA3Intrinsics is only auto-enabled on Apple silicon, because it - * may introduce performance regression on others. See JDK-8297092. * * @build jdk.test.whitebox.WhiteBox * @run driver jdk.test.lib.helpers.ClassFileInstaller jdk.test.whitebox.WhiteBox I checked on an Nvidia Grace machine with the above patch. `TestUseSHA3IntrinsicsOptionOnSupportedCPU.java` can pass. If this patch is fine to you, we'd better run the tests on riscv64 for safety. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28634#issuecomment-3610316715 From fyang at openjdk.org Thu Dec 4 05:52:55 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 4 Dec 2025 05:52:55 GMT Subject: RFR: 8372941: Rework compiler/intrinsics/sha tests to use intrinsic availability In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 05:32:31 GMT, Hao Sun wrote: >> Predicate probes of the following algos are changed to rely on intrinsics availability in the platform as opposed to hardware support availability. >> MD5 >> SHA1 >> SHA256 >> SHA3 >> >> Testing: >> All flag combinations from CI >> hotspot tiers 1 to 5 >> PS: only for tier testings, mac-aarch was skipped due to resource constraints > > Thanks for your work. > > I suppose the **os.arch** requires condition can be removed in the following cases: > > > diff --git a/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseMD5IntrinsicsOptionOnSupportedCPU.java b/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseMD5IntrinsicsOptionOnSupportedCPU.java > index eeff351f737..3561be3b33b 100644 > --- a/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseMD5IntrinsicsOptionOnSupportedCPU.java > +++ b/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseMD5IntrinsicsOptionOnSupportedCPU.java > @@ -25,10 +25,8 @@ > * @test > * @bug 8035968 > * @summary Verify UseMD5Intrinsics option processing on supported CPU. > - * ( Disable this test on riscv, because on riscv UseMD5Intrinsics depends on !AvoidUnalignedAccesses.
) > * @library /test/lib / > * @requires vm.flagless > - * @requires os.arch != "riscv64" > * > * @build jdk.test.whitebox.WhiteBox > * @run driver jdk.test.lib.helpers.ClassFileInstaller jdk.test.whitebox.WhiteBox > diff --git a/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA1IntrinsicsOptionOnSupportedCPU.java b/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA1IntrinsicsOptionOnSupportedCPU.java > index 1ce2c4b1f87..71ed3b3cac9 100644 > --- a/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA1IntrinsicsOptionOnSupportedCPU.java > +++ b/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA1IntrinsicsOptionOnSupportedCPU.java > @@ -25,10 +25,8 @@ > * @test > * @bug 8035968 > * @summary Verify UseSHA1Intrinsics option processing on supported CPU. > - * ( Disable this test on riscv, because on riscv UseSHA1Intrinsics depends on !AvoidUnalignedAccesses. ) > * @library /test/lib / > * @requires vm.flagless > - * @requires os.arch != "riscv64" > * > * @build jdk.test.whitebox.WhiteBox > * @run driver jdk.test.lib.helpers.ClassFileInstaller jdk.test.whitebox.WhiteBox > diff --git a/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java b/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java > index d3c0a4a8da7..41a2ec277a2 100644 > --- a/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java > +++ b/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java > @@ -28,10 +28,8 @@ > * @summary Verify UseSHA3Intrinsics option processing on supported CPU. > * @library /test/lib / > * @requires vm.flagless > - * @requires os.arch == "aarch64" &... @shqking : I did a quick try on riscv64 and I see your add-on fix works as well. Good cleanup! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28634#issuecomment-3610395107 From jbhateja at openjdk.org Thu Dec 4 07:18:59 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 4 Dec 2025 07:18:59 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v8] In-Reply-To: References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: On Wed, 3 Dec 2025 06:30:06 GMT, Emanuel Peter wrote: >> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> Review comments resolutions > > Oh, a second review would be required though! @eme64, it seems that due to some hiccups the PR is not marked ready for integration; kindly re-approve. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28533#issuecomment-3610638541 From epeter at openjdk.org Thu Dec 4 08:26:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Dec 2025 08:26:03 GMT Subject: RFR: 8371603: C2: assert(_inputs.at(alias_idx) == nullptr || _inputs.at(alias_idx) == load->in(1)) failed In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 01:41:19 GMT, Xiaohong Gong wrote: > **Problem:** > > This issue occurs on a 256-bit SVE machine, caused by the following problematic pattern in `LoadVectorNode::Ideal()`: > > > Node* LoadVectorNode::Ideal(PhaseGVN* phase, bool can_reshape) { > const TypeVect* vt = vect_type(); > if (Matcher::vector_needs_partial_operations(this, vt)) { > return VectorNode::try_to_gen_masked_vector(phase, this, vt); > } > return LoadNode::Ideal(phase, can_reshape); > } > > > The condition `Matcher::vector_needs_partial_operations(this, vt)` returns true for `LoadVectorNode` with 256-bit vector size even when the vector size equals the maximum vector size on
SVE. In such cases, when `VectorNode::try_to_gen_masked_vector()` returns `nullptr`, the method exits early without calling `LoadNode::Ideal()`. This results in missing crucial optimizations that would normally be applied by the superclass. > > This code was introduced by https://bugs.openjdk.org/browse/JDK-8286941 to generate vector masks for partial vector operations, but it failed to ensure that the superclass `Ideal()` method is always invoked when no transformation is applied. > > **Solution:** > > This patch addresses the issue through two changes: > > 1. Refine `Matcher::vector_needs_partial_operations()` to return true only when the vector node genuinely represents a partial vector operation that requires masking. > 2. Modify `VectorNode::try_to_gen_masked_vector()` to never return `nullptr`, ensuring the superclass `Ideal()` method is always invoked when no transformation is applied. > > **Testing:** > > - Verified on different SVE platforms with different vector sizes (128|256|512 bits). > - Verified on X86 platforms with different avx options (-XX:UseAVX=1|2|3). > - Added two new IR tests to verify 1) previously missing optimizations for `LoadVector/StoreVector` are now applied, and 2) that mask and the correct IR patterns are generated for partial vector operations. @XiaohongGong Thanks for taking this over from me and fixing this so quickly, much appreciated! Thanks for adding my regression tests and for the attribution :) I only have a minor comment below. Otherwise, the code looks good to me. But since I'm not an SVE specialist, it would be good if someone with deeper knowledge would do a deeper review of the specific SVE parts. Once an SVE specialist gives the approval for the PR, I'll run some internal testing and approve from my side :) Ah, and one more thing: you should change the PR title to be more descriptive of the issue. The assert that was hit is only a far removed symptom.
I would suggest: `C2 SVE: missing Ideal optimizations for load and store vectors` src/hotspot/share/opto/vectornode.cpp line 1118: > 1116: if (Matcher::vector_needs_partial_operations(this, vt)) { > 1117: return VectorNode::gen_masked_vector(phase, this, vt); > 1118: } I think it would still be good practice to expect that a `nullptr` could come from `gen_masked_vector`, and then continue with optimizations below, rather than just returning the `nullptr`. Because: who knows what someone in the future might do inside `gen_masked_vector`, maybe they'll find some edge case and just return `nullptr` again, and then we are back to similar issues. ------------- PR Review: https://git.openjdk.org/jdk/pull/28651#pullrequestreview-3538639579 PR Review Comment: https://git.openjdk.org/jdk/pull/28651#discussion_r2587987772 From epeter at openjdk.org Thu Dec 4 08:30:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Dec 2025 08:30:58 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v9] In-Reply-To: References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: On Wed, 3 Dec 2025 18:30:50 GMT, Jatin Bhateja wrote: >> This bug patch fixes a crash seen while querying the bottom type of MachTempNode corresponding to [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of blend pattern during late scheduling. Here, MaxVectorSize is constrained to 8 bytes thus during C2 type system initialization, [TypeVect::VECTX ](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719) guarded by target supported vector size remains uninitialized. >> >> It's better to reject matching of VectorBlend in such a scenario. >> >> All existing VectorAPI jtreg tests are passing with -XX:UseAVX=0 and -XX:MaxVectorSize=8 >> >> Kindly review and share your feedback.
>> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/vectorapi/TestABSMaskedMaxByteVector.java > > Co-authored-by: Daniel Lundén Marked as reviewed by epeter (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28533#pullrequestreview-3538701546 From roland at openjdk.org Thu Dec 4 08:53:32 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 4 Dec 2025 08:53:32 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v8] In-Reply-To: References: Message-ID: > In test cases, `mh` is initially not constant so the method handle > invoke can't be inlined. It is later found to be constant, so it can > be turned into a direct call by > `Compile::process_late_inline_calls_no_inline()`. In the meantime, the > `CallNode` for the mh invoke is cloned (by loop switching). In the > process, only a shallow copy of the `JVMState` for the call is > made. The initial `CallNode` is the first to be processed by > `Compile::process_late_inline_calls_no_inline()` and that causes that > `CallNode` to become dead. The cloned `CallNode` is then > processed. The `JVMState` for that one references the initial > `CallNode` in its caller's `JVMState`. Because that node is dead, that > causes a crash. The fix I propose is to make a deep copy of the > `JVMState` when a `CallNode` is cloned, if a `CallGenerator` is > assigned to the node. > > The other failure I see with these tests is: > > > # Internal Error (/home/roland/jdk-jdk/src/hotspot/share/opto/compile.hpp:1091), pid=3319164, tid=3319186 > # assert(_number_of_mh_late_inlines > 0) failed: _number_of_mh_late_inlines < 0 ! > > > because even though the `CallNode` is cloned, there's still only one > late inline recorded.
The fix here is to increment > `_number_of_mh_late_inlines` when the node is cloned. > > This was reported by the netty developers. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/compile.hpp Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28088/files - new: https://git.openjdk.org/jdk/pull/28088/files/64b11e6e..124b1f69 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28088&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28088&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28088.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28088/head:pull/28088 PR: https://git.openjdk.org/jdk/pull/28088 From xgong at openjdk.org Thu Dec 4 10:19:16 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 4 Dec 2025 10:19:16 GMT Subject: RFR: 8371603: C2: assert(_inputs.at(alias_idx) == nullptr || _inputs.at(alias_idx) == load->in(1)) failed In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 08:08:14 GMT, Emanuel Peter wrote: >> **Problem:** >> >> This issue occurs on a 256-bit SVE machine, caused by the following problematic pattern in `LoadVectorNode::Ideal()`: >> >> >> Node* LoadVectorNode::Ideal(PhaseGVN* phase, bool can_reshape) { >> const TypeVect* vt = vect_type(); >> if (Matcher::vector_needs_partial_operations(this, vt)) { >> return VectorNode::try_to_gen_masked_vector(phase, this, vt); >> } >> return LoadNode::Ideal(phase, can_reshape); >> } >> >> >> The condition `Matcher::vector_needs_partial_operations(this, vt)` returns true for `LoadVectorNode` with 256-bit vector size even when the vector size equals the maximum vector size on SVE. In such cases, when `VectorNode::try_to_gen_masked_vector()` returns `nullptr`, the method exits early without calling `LoadNode::Ideal()`. 
This results in missing crucial optimizations that would normally be applied by the superclass. >> >> This code was introduced by https://bugs.openjdk.org/browse/JDK-8286941 to generate vector masks for partial vector operations, but it failed to ensure that the superclass `Ideal()` method is always invoked when no transformation is applied. >> >> **Solution:** >> >> This patch addresses the issue through two changes: >> >> 1. Refine `Matcher::vector_needs_partial_operations()` to return true only when the vector node genuinely represents a partial vector operation that requires masking. >> 2. Modify `VectorNode::try_to_gen_masked_vector()` to never return `nullptr`, ensuring the superclass `Ideal()` method is always invoked when no transformation is applied. >> >> **Testing:** >> >> - Verified on different SVE platforms with different vector sizes (128|256|512 bits). >> - Verified on X86 platforms with different avx options (-XX:UseAVX=1|2|3). >> - Added two new IR tests to verify 1) previously missing optimizations for `LoadVector/StoreVector` are now applied, and 2) that mask and the correct IR patterns are generated for partial vector operations. > > src/hotspot/share/opto/vectornode.cpp line 1118: > >> 1116: if (Matcher::vector_needs_partial_operations(this, vt)) { >> 1117: return VectorNode::gen_masked_vector(phase, this, vt); >> 1118: } > > I think it would still be good practice to expect that a `nullptr` could come from `gen_masked_vector`, and then continue with optimizations below, rather than just returning the `nullptr`. > > Because: who knows what someone in the future might do inside `gen_masked_vector`, maybe they'll find some edge case and just return `nullptr` again, and then we are back to similar issues. I see your concern. Make sense to me. Thanks! I'd like keep current implementation of `Matcher::vector_needs_partial_operations` and `gen_masked_vector` because as we discussed in the previous PR that this sounds more reasonable. 
I will update the caller code here to check `nullptr` in addition although it won't generate a `nullptr` now. Code may look like: if (Matcher::vector_needs_partial_operations(this, vt)) { Node* n = VectorNode::gen_masked_vector(phase, this, vt); if (n != nullptr) { return n; } } return ... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28651#discussion_r2588404849 From jbhateja at openjdk.org Thu Dec 4 10:20:26 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 4 Dec 2025 10:20:26 GMT Subject: Integrated: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 In-Reply-To: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: On Thu, 27 Nov 2025 12:56:08 GMT, Jatin Bhateja wrote: > This bug patch fixes a crash seen while querying the bottom type of MachTempNode corresponding to [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of blend pattern during late scheduling. Here, MaxVectorSize is constrained to 8 bytes thus during C2 type system initialization, [TypeVect::VECTX ](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719) guarded by target supported vector size remains uninitialized. > > It's better to reject matching of VectorBlend in such a scenario. > > All existing VectorAPI jtreg tests are passing with -XX:UseAVX=0 and -XX:MaxVectorSize=8 > > Kindly review and share your feedback. > > Best Regards, > Jatin This pull request has now been integrated.
Changeset: 91c5bd55 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/91c5bd550a36e10e8b39d1b322fd433ee8df14f5 Stats: 59 lines in 2 files changed: 59 ins; 0 del; 0 mod 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 Reviewed-by: epeter, sviswanathan, dlunden ------------- PR: https://git.openjdk.org/jdk/pull/28533 From xgong at openjdk.org Thu Dec 4 10:23:07 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 4 Dec 2025 10:23:07 GMT Subject: RFR: 8371603: C2: assert(_inputs.at(alias_idx) == nullptr || _inputs.at(alias_idx) == load->in(1)) failed In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 08:23:31 GMT, Emanuel Peter wrote: > Ah, and one more thing: you should change the PR title to be more descriptive of the issue. The assert that was hit is only a far removed symptom. I would suggest: > > `C2 SVE: missing Ideal optimizations for load and store vectors` The new title looks good to me. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28651#issuecomment-3611341132 From xgong at openjdk.org Thu Dec 4 10:28:00 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 4 Dec 2025 10:28:00 GMT Subject: RFR: 8371603: C2: assert(_inputs.at(alias_idx) == nullptr || _inputs.at(alias_idx) == load->in(1)) failed In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 10:16:32 GMT, Xiaohong Gong wrote: >> src/hotspot/share/opto/vectornode.cpp line 1118: >> >>> 1116: if (Matcher::vector_needs_partial_operations(this, vt)) { >>> 1117: return VectorNode::gen_masked_vector(phase, this, vt); >>> 1118: } >> >> I think it would still be good practice to expect that a `nullptr` could come from `gen_masked_vector`, and then continue with optimizations below, rather than just returning the `nullptr`. >> >> Because: who knows what someone in the future might do inside `gen_masked_vector`, maybe they'll find some edge case and just return `nullptr` again, and then we are back to similar issues. 
> > I see your concern. Make sense to me. Thanks! I'd like keep current implementation of `Matcher::vector_needs_partial_operations` and `gen_masked_vector` because as we discussed in the previous PR that this sounds more reasonable. > > I will update the caller code here to check `nullptr` in addition although it won't generate a `nullptr` now. Code may look like: > > if (Matcher::vector_needs_partial_operations(this, vt)) { > Node* n = VectorNode::gen_masked_vector(phase, this, vt); > if (n != nullptr) { > return n; > } > } > return ... Is it better that we add an assertion of non `nullptr` value before returning in `gen_masked_vector` , consider this might make the caller code clean? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28651#discussion_r2588435959 From epeter at openjdk.org Thu Dec 4 12:13:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Dec 2025 12:13:00 GMT Subject: RFR: 8371603: C2: assert(_inputs.at(alias_idx) == nullptr || _inputs.at(alias_idx) == load->in(1)) failed In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 10:25:03 GMT, Xiaohong Gong wrote: >> I see your concern. Make sense to me. Thanks! I'd like keep current implementation of `Matcher::vector_needs_partial_operations` and `gen_masked_vector` because as we discussed in the previous PR that this sounds more reasonable. >> >> I will update the caller code here to check `nullptr` in addition although it won't generate a `nullptr` now. Code may look like: >> >> if (Matcher::vector_needs_partial_operations(this, vt)) { >> Node* n = VectorNode::gen_masked_vector(phase, this, vt); >> if (n != nullptr) { >> return n; >> } >> } >> return ... > > Is it better that we add an assertion of non `nullptr` value before returning in `gen_masked_vector` , consider this might make the caller code clean? If you do the assert inside the method, then later someone may just do `return nullptr` somewhere, and your assert won't catch it, right? 
I would just do this: if (Matcher::vector_needs_partial_operations(this, vt)) { Node* n = VectorNode::gen_masked_vector(phase, this, vt); if (n != nullptr) { return n; } } Or you could even combine the methods `vector_needs_partial_operations` and `gen_masked_vector` into some `Ideal_partial_operations`: Node* progress = VectorNode::Ideal_partial_operations(phase, vt, this); if (progress != nullptr) { return progress; } That would remove the most clutter from the caller method. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28651#discussion_r2588819337 From roland at openjdk.org Thu Dec 4 12:55:06 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 4 Dec 2025 12:55:06 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v6] In-Reply-To: <-kd-AfwkJebk8njImn0KeKvUCQnwoiqLr96cKCovlFc=.30649d16-8dee-4c9d-b1eb-ac9d7e9df86a@github.com> References: <7nY7QRkkFjOtOuBXID1I4GluA0vnFRLy_UnRICfVkR4=.99ec7fe1-af27-4ab7-ac63-27aa12bec4ef@github.com> <-kd-AfwkJebk8njImn0KeKvUCQnwoiqLr96cKCovlFc=.30649d16-8dee-4c9d-b1eb-ac9d7e9df86a@github.com> Message-ID: <8ZViC6KgwXNMreHupSs6CDUMYRhFOm0bZrkSqB4Jj0A=.ad3d56e8-4e0a-4e8d-b40c-2a8bd4627ca7@github.com> On Tue, 25 Nov 2025 22:45:53 GMT, Vladimir Ivanov wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8370939 >> - review >> - Merge branch 'master' into JDK-8370939 >> - review >> - more >> - more >> - more >> - more >> - test >> - fix > > Sure, I'm fine either way. There are known cases when `dec_number_of_mh_late_inlines()` call is missing, so the patch as it is now looks fine as well considering we'll investigate the effects on `inline_string_calls()` call. 
@iwanowww @TobiHartmann thanks for the reviews and testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28088#issuecomment-3612111820 From roland at openjdk.org Thu Dec 4 13:04:59 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 4 Dec 2025 13:04:59 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v7] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 07:21:14 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8370939 >> - Merge branch 'master' into JDK-8370939 >> - review >> - Merge branch 'master' into JDK-8370939 >> - review >> - more >> - more >> - more >> - more >> - test >> - ... and 1 more: https://git.openjdk.org/jdk/compare/854b6c58...64b11e6e > > All testing passed. @TobiHartmann @iwanowww since I included Tobias' suggestion, I need one of you to approve the change again. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28088#issuecomment-3612151058 From dbriemann at openjdk.org Thu Dec 4 13:06:09 2025 From: dbriemann at openjdk.org (David Briemann) Date: Thu, 4 Dec 2025 13:06:09 GMT Subject: RFR: 8372589: VM crashes on init when NonNMethodCodeHeapSize is set too small and UseTransparentHugePages is enabled Message-ID: Aligning upwards instead of downwards not only solves the crash in large huge page scenarios but also ensures that the cache sizes are at least as big as they were set.
------------- Commit messages: - 8372589: VM crashes on init when NonNMethodCodeHeapSize is set too small and UseTransparentHugePages is enabled Changes: https://git.openjdk.org/jdk/pull/28658/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28658&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372589 Stats: 10 lines in 1 file changed: 0 ins; 6 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28658.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28658/head:pull/28658 PR: https://git.openjdk.org/jdk/pull/28658 From mdoerr at openjdk.org Thu Dec 4 14:57:39 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 4 Dec 2025 14:57:39 GMT Subject: RFR: 8372589: VM crashes on init when NonNMethodCodeHeapSize is set too small and UseTransparentHugePages is enabled In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 12:59:03 GMT, David Briemann wrote: > Aligning upwards instead of downwards not only solves the crash in large huge page scenarios but also ensures that the cache sizes are at least as big as they were set. This makes sense. Please make sure to commit it when JDK27 is started in head and after a 2nd review. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28658#pullrequestreview-3540432282 From thartmann at openjdk.org Thu Dec 4 15:07:40 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 4 Dec 2025 15:07:40 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v8] In-Reply-To: References: Message-ID: <5KzZFCSXDZaqFIZEeirQNFrkDUDERHXCein3swQsSqc=.452f96e2-787b-48a2-a87a-4869f26f9075@github.com> On Thu, 4 Dec 2025 08:53:32 GMT, Roland Westrelin wrote: >> In test cases, `mh` is initially not constant so the method handle >> invoke can't be inlined. It is later found to be constant, so it can >> be turned into a direct call by >> `Compile::process_late_inline_calls_no_inline()`.
In the meantime, the >> `CallNode` for the mh invoke is cloned (by loop switching). In the >> process, only a shallow copy of the `JVMState` for the call is >> made. The initial `CallNode` is the first to be processed by >> `Compile::process_late_inline_calls_no_inline()` and that causes that >> `CallNode` to become dead. The cloned `CallNode` is then >> processed. The `JVMState` for that one references the initial >> `CallNode` in its caller's `JVMState`. Because that node is dead, that >> causes a crash. The fix I propose is to make a deep copy of the >> `JVMState` when a `CallNode` is cloned, if a `CallGenerator` is >> assigned to the node. >> >> The other failure I see with these tests is: >> >> >> # Internal Error (/home/roland/jdk-jdk/src/hotspot/share/opto/compile.hpp:1091), pid=3319164, tid=3319186 >> # assert(_number_of_mh_late_inlines > 0) failed: _number_of_mh_late_inlines < 0 ! >> >> >> because even though the `CallNode` is cloned, there's still only one >> late inline recorded. The fix here is to increment >> `_number_of_mh_late_inlines` when the node is cloned. >> >> This was reported by the netty developers. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/compile.hpp > > Co-authored-by: Tobias Hartmann Still good. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28088#pullrequestreview-3540481333 From roland at openjdk.org Thu Dec 4 15:28:55 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 4 Dec 2025 15:28:55 GMT Subject: Integrated: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() In-Reply-To: References: Message-ID: On Fri, 31 Oct 2025 16:39:07 GMT, Roland Westrelin wrote: > In test cases, `mh` is initially not constant so the method handle > invoke can't be inlined. 
It is later found to be constant, so it can > be turned into a direct call by > `Compile::process_late_inline_calls_no_inline()`. In the meantime, the > `CallNode` for the mh invoke is cloned (by loop switching). In the > process, only a shallow copy of the `JVMState` for the call is > made. The initial `CallNode` is the first to be processed by > `Compile::process_late_inline_calls_no_inline()` and that causes that > `CallNode` to become dead. The cloned `CallNode` is then > processed. The `JVMState` for that one references the initial > `CallNode` in its caller's `JVMState`. Because that node is dead, that > causes a crash. The fix I propose is to make a deep copy of the > `JVMState` when a `CallNode` is cloned, if a `CallGenerator` is > assigned to the node. > > The other failure I see with these tests is: > > > # Internal Error (/home/roland/jdk-jdk/src/hotspot/share/opto/compile.hpp:1091), pid=3319164, tid=3319186 > # assert(_number_of_mh_late_inlines > 0) failed: _number_of_mh_late_inlines < 0 ! > > > because even though the `CallNode` is cloned, there's still only one > late inline recorded. The fix here is to increment > `_number_of_mh_late_inlines` when the node is cloned. > > This was reported by the netty developers. This pull request has now been integrated. 
Changeset: 27351401 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/2735140147b159d3a3238804f221db4f835ef744 Stats: 125 lines in 6 files changed: 113 ins; 3 del; 9 mod 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() Reviewed-by: thartmann, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/28088 From vlivanov at openjdk.org Thu Dec 4 19:17:29 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 4 Dec 2025 19:17:29 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v8] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 10:31:22 GMT, Aleksey Shipilev wrote: >> See the bug for discussion what issues current machinery has. >> >> This PR executes the plan outlined in the bug: >> 1. Common the receiver type profiling code in interpreter and C1 >> 2. Rewrite receiver type profiling code to only do atomic receiver slot installations >> 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed >> >> This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 21 commits:
> 
>  - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls
>  - More comments
>  - Tighten up the comments
>  - Simplify third case: no need to loop, just restart the search
>  - Actually have a second "fast" case: receiver is not found in the table, and the table is full
>  - Pushing/popping for rare CAS path is counter-productive
>  - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls
>  - Tighten up some more
>  - Offset is always rscratch1, no need to save it
>  - Grossly simplify register shuffling
>  - ... and 11 more: https://git.openjdk.org/jdk/compare/7278d2e8...3c5019d9

Overall, looks good to me. Nice work, Aleksey!

I'm curious how performance-sensitive that part of code is. Does it make sense to try to further optimize it? For example:
  - 2 slots is the most common case; any benefits from optimizing specifically for it (e.g., unroll the loops)?
  - fast path can be further optimized for no nulls case by offloading more work on found_null slow path [1]

[1]

    // Fastest: receiver is already installed
    int i = 0;
    for (; i < receiver_count(); i++) {
      if (receiver(i) == recv) goto found_recv(i);
      if (receiver(i) == null) goto found_null(i);
    }
    goto polymorphic

    // Slow: try to install receiver
    found_null(i):
      // Finish the search
      for (int j = i ; j < receiver_count(); j++) {
        if (receiver(j) == recv) goto found_recv(j);
      }
      CAS(&receiver(i), null, recv);
      goto restart
    ...
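
For readers who want to experiment with the control flow in [1], here is a minimal, self-contained C++ sketch of it. It rests on several assumptions: `Klass`, `ReceiverTable`, `kRows` and `profile_receiver` are hypothetical stand-ins invented for illustration, not HotSpot's data structures, and the real code under review is hand-written x86 assembly in the interpreter and C1. The sketch also bumps counters with relaxed atomics only to stay free of data races in C++; the PR explicitly does not make counter updates atomic.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a receiver klass; not HotSpot's Klass.
struct Klass {};

// Hypothetical receiver table with two rows; all names are illustrative.
struct ReceiverTable {
  static constexpr std::size_t kRows = 2;
  std::atomic<const Klass*> receiver[kRows];
  std::atomic<long> count[kRows];
  std::atomic<long> polymorphic{0};

  ReceiverTable() {
    for (std::size_t i = 0; i < kRows; i++) {
      receiver[i].store(nullptr, std::memory_order_relaxed);
      count[i].store(0, std::memory_order_relaxed);
    }
  }
};

// Returns the row whose counter was bumped, or -1 for the polymorphic case.
int profile_receiver(ReceiverTable& t, const Klass* recv) {
  for (;;) {  // plays the role of the "restart" label in the pseudocode
    // Fastest: receiver is already installed.
    std::size_t i = 0;
    bool found_null = false;
    for (; i < ReceiverTable::kRows; i++) {
      const Klass* k = t.receiver[i].load(std::memory_order_acquire);
      if (k == recv) {
        t.count[i].fetch_add(1, std::memory_order_relaxed);
        return static_cast<int>(i);
      }
      if (k == nullptr) {
        found_null = true;
        break;
      }
    }
    if (!found_null) {
      // Table is full and the receiver is not in it.
      t.polymorphic.fetch_add(1, std::memory_order_relaxed);
      return -1;
    }
    // Slow: finish the search first, so a receiver already present in a
    // later row is found rather than installed again.
    for (std::size_t j = i; j < ReceiverTable::kRows; j++) {
      if (t.receiver[j].load(std::memory_order_acquire) == recv) {
        t.count[j].fetch_add(1, std::memory_order_relaxed);
        return static_cast<int>(j);
      }
    }
    // Try to claim the empty row. Whether the CAS wins or loses, restart
    // the search; the row is non-null afterwards, so the loop makes progress.
    const Klass* expected = nullptr;
    t.receiver[i].compare_exchange_strong(expected, recv,
                                          std::memory_order_acq_rel,
                                          std::memory_order_acquire);
  }
}
```

In this sketch a row never goes back to null once claimed, which is what bounds the number of restarts by the number of rows. Whether that also holds for the real profile data is an assumption of the sketch.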
------------- PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3613949570 From kvn at openjdk.org Thu Dec 4 21:49:06 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 4 Dec 2025 21:49:06 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v8] In-Reply-To: References: Message-ID: <463CgW4WDJnmLWha1DJLYIWw_UEh4ba9vdeQq80QfJM=.08d3125b-f5cd-4d29-8e0d-921448c792f2@github.com> On Thu, 4 Dec 2025 19:14:43 GMT, Vladimir Ivanov wrote: > 2 slots is the most common case; any benefits from optimizing specifically for it (e.g., unroll the loops)? Yes, since `row_limit()` is statically known and does not change, we can have two versions of code based on its value: - `<= 2` slots: fully unrolled (far fewer instructions) - `> 2` slots: current proposed code ------------- PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3614447709 From kvn at openjdk.org Thu Dec 4 21:56:46 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 4 Dec 2025 21:56:46 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v8] In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 19:14:43 GMT, Vladimir Ivanov wrote: > fast path can be further optimized for no nulls case by offloading more work on found_null slow path [1] I don't think we need to optimize the `> 2` slots case. Such a setting is not the current default. Also, based on @shipilev's comments, 2 separate loops are more or less optimal. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3614475586 From duke at openjdk.org Fri Dec 5 02:42:25 2025 From: duke at openjdk.org (duke) Date: Fri, 5 Dec 2025 02:42:25 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v30] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 03:44:27 GMT, Anjian Wen wrote: >> Hi everyone, please help review this patch which implements the _counterMode_AESCrypt with Zvkned.
On my QEMU, with the Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ passed. > > Anjian Wen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 31 commits: > > - Merge branch 'openjdk:master' into aes_ctr > - modify label L_EXIT to L_exit > - add more comments for key value 52 > - update some comments, names and Pseudocode > - modify stub_id name > - Merge branch 'openjdk:master' into aes_ctr > - modify format > - add more comments > - modify parm to unsigned as aarch64 and x86 > - clean comments and format > - ... and 21 more: https://git.openjdk.org/jdk/compare/530493fe...98d802d5 @Anjian-Wen Your change (at version 98d802d5da10c1fd9397bb539d9bf80a9fabd8f9) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25281#issuecomment-3615078771 From wenanjian at openjdk.org Fri Dec 5 02:54:10 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Fri, 5 Dec 2025 02:54:10 GMT Subject: Integrated: 8365732: RISC-V: implement AES CTR intrinsics In-Reply-To: References: Message-ID: On Sat, 17 May 2025 03:13:46 GMT, Anjian Wen wrote: > Hi everyone, please help review this patch which implements the _counterMode_AESCrypt with Zvkned. On my QEMU, with the Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ passed. This pull request has now been integrated.
Changeset: 7e91d34f Author: Anjian Wen Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/7e91d34f3e83b4c39d6ce5de34373d7d74d54512 Stats: 239 lines in 2 files changed: 230 ins; 1 del; 8 mod 8365732: RISC-V: implement AES CTR intrinsics Reviewed-by: fyang, mli ------------- PR: https://git.openjdk.org/jdk/pull/25281 From wenanjian at openjdk.org Fri Dec 5 03:24:31 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Fri, 5 Dec 2025 03:24:31 GMT Subject: RFR: 8371968: RISC-V: implement AES CBC intrinsics [v3] In-Reply-To: References: Message-ID: > Support AES CBC intrinsic on RISCV, Already passed the tests in > test/hotspot/jtreg/compiler/codegen/aes/ > test/jdk/com/sun/crypto Anjian Wen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: - Modify some assert - RISC-V: implement AES CBC intrinsics ------------- Changes: https://git.openjdk.org/jdk/pull/28320/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28320&range=02 Stats: 228 lines in 1 file changed: 227 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28320.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28320/head:pull/28320 PR: https://git.openjdk.org/jdk/pull/28320 From jkarthikeyan at openjdk.org Fri Dec 5 06:04:03 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 5 Dec 2025 06:04:03 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v3] In-Reply-To: References: Message-ID: > Hi all, > This is a quick patch for the assert failure in superword truncation with CastII. I've added a check for all constraint cast nodes, and attached a reduced version of the fuzzer test. Thanks! Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - Remove CompLevel.C2 from test - Merge branch 'master' into jdk-8365570 - Update comment for constraint casts - Fix truncation assert for constraint casts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26827/files - new: https://git.openjdk.org/jdk/pull/26827/files/d6c81a9d..f433930e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26827&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26827&range=01-02 Stats: 600645 lines in 6681 files changed: 411649 ins; 119944 del; 69052 mod Patch: https://git.openjdk.org/jdk/pull/26827.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26827/head:pull/26827 PR: https://git.openjdk.org/jdk/pull/26827 From jkarthikeyan at openjdk.org Fri Dec 5 06:04:04 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 5 Dec 2025 06:04:04 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v2] In-Reply-To: <1z430wmE_HRTJqmLIC15VMUktLyUEE7qjkppr1GniAI=.e560a4e9-59f0-4013-ad65-5d7261cdbf0e@github.com> References: <1z430wmE_HRTJqmLIC15VMUktLyUEE7qjkppr1GniAI=.e560a4e9-59f0-4013-ad65-5d7261cdbf0e@github.com> Message-ID: On Mon, 22 Sep 2025 06:57:24 GMT, Christian Hagedorn wrote: >> Thanks for the comment! I used `CompLevel.C2` here to simulate an -Xcomp environment, since unfortunately I couldn't replicate the crash without it with the IR framework. I'll do some investigation to find a way to ensure that it won't fail without C2. > > When you specify `@Warmup(0)`, the IR framework should directly compile it at the highest level which should be C2 if you are not running with a client build. So, I would have expected that it makes no difference. Can you double-check if you can reproduce it with `CompLevel.C2` but not without? After taking a closer look, I think you're correct- I can reproduce the crash using just `@Warmup(0)` and `@Test`. 
I think I used both while debugging and didn't test whether it worked without `CompLevel.C2`. I've removed it in the latest commit. However, I noticed that after that I merged from master neither the test nor the reproducer failed compilation before the fix is added. I think another commit must have changed the generated graph so that it no longer tries to vectorize the `CastII`, leading to the crash not being triggered. I looked at the JBS entry and saw that there wasn't another reproducer for this, so I was a bit unsure on what to do. Should this patch be merged with the current test? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26827#discussion_r2591510915 From kvn at openjdk.org Fri Dec 5 06:12:11 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 5 Dec 2025 06:12:11 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v8] In-Reply-To: References: Message-ID: <1oFMd7KL-qhMxhHPi0mTspnW8oNMF8ZVGucT6IJXwv4=.d81f2eca-9ddc-4d63-8a20-40c2192e1004@github.com> On Tue, 2 Dec 2025 10:31:22 GMT, Aleksey Shipilev wrote: >> See the bug for discussion what issues current machinery has. >> >> This PR executes the plan outlined in the bug: >> 1. Common the receiver type profiling code in interpreter and C1 >> 2. Rewrite receiver type profiling code to only do atomic receiver slot installations >> 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed >> >> This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 21 commits: > > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - More comments > - Tighten up the comments > - Simplify third case: no need to loop, just restart the search > - Actually have a second "fast" case: receiver is not found in the table, and the table is full > - Pushing/popping for rare CAS path is counter-productive > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - Tighten up some more > - Offset is always rscratch1, no need to save it > - Grossly simplify register shuffling > - ... and 11 more: https://git.openjdk.org/jdk/compare/7278d2e8...3c5019d9 My testing of version 07 passed clean ------------- PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3615451277 From duke at openjdk.org Fri Dec 5 07:25:06 2025 From: duke at openjdk.org (duke) Date: Fri, 5 Dec 2025 07:25:06 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v17] In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 09:45:04 GMT, Qizheng Xing wrote: >> The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. >> >> This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. 
For example, the following implementation runs ~115% faster on x86-64 with this patch:
>> 
>>     public static int numberOfNibbles(int i) {
>>       int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i);
>>       return Math.max((mag + 3) / 4, 1);
>>     }
>> 
>> Testing: tier1, IR test
> 
> Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Make code more compact

@MaxXSoft Your change (at version 092d968d2fb54aaa59f9a28b907be5e0ddf3606c) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25928#issuecomment-3615611622

From xgong at openjdk.org Fri Dec 5 07:50:07 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Fri, 5 Dec 2025 07:50:07 GMT
Subject: RFR: 8371603: C2: Missing Ideal optimizations for load and store vectors on SVE
In-Reply-To: 
References: 
Message-ID: 

On Thu, 4 Dec 2025 12:09:51 GMT, Emanuel Peter wrote:

>> Would it be better to add an assertion of a non-`nullptr` value before returning in `gen_masked_vector`, considering this might make the caller code cleaner?
> 
> If you do the assert inside the method, then later someone may just do `return nullptr` somewhere, and your assert won't catch it, right?
> 
> I would just do this:
> 
>     if (Matcher::vector_needs_partial_operations(this, vt)) {
>       Node* n = VectorNode::gen_masked_vector(phase, this, vt);
>       if (n != nullptr) { return n; }
>     }
> 
> Or you could even combine the methods `vector_needs_partial_operations` and `gen_masked_vector` into some `Ideal_partial_operations`:
> 
>     Node* progress = VectorNode::Ideal_partial_operations(phase, vt, this);
>     if (progress != nullptr) { return progress; }
> 
> That would remove the most clutter from the caller method.

Sounds good to me. I will change the code by combining the methods into a function. Thanks!
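
The shape of the combined method agreed on above can be illustrated outside HotSpot with a small, self-contained C++ sketch. Everything in it is an assumption made for illustration: the `Node` struct, its `partial` and `masked` flags, and the free functions are hypothetical stand-ins, not the real `Matcher`, `VectorNode` or `PhaseGVN` APIs. The point is only the contract: one call that returns either a replacement node or `nullptr` for "no progress", so the caller needs a single check.

```cpp
#include <cassert>

// Hypothetical stand-in for an IR node; not HotSpot's Node.
struct Node {
  bool partial;  // the vector is narrower than the hardware register
  bool masked;   // the operation already carries a governing mask
};

// Stand-in for Matcher::vector_needs_partial_operations (assumed semantics).
bool needs_partial_operations(const Node* n) {
  return n->partial && !n->masked;
}

// Stand-in for VectorNode::gen_masked_vector; writes the masked replacement
// into caller-provided storage and returns it. Like the real method, a
// future version of this could return nullptr.
Node* gen_masked_vector(const Node* n, Node* storage) {
  *storage = Node{n->partial, true};
  return storage;
}

// Combined helper in the spirit of the proposed Ideal_partial_operations:
// returns the new node, or nullptr if there is nothing to do.
Node* ideal_partial_operations(const Node* n, Node* storage) {
  if (!needs_partial_operations(n)) {
    return nullptr;
  }
  return gen_masked_vector(n, storage);
}

// Caller side: one call, one nullptr check, then the remaining idealizations.
Node* ideal(const Node* n, Node* storage) {
  Node* progress = ideal_partial_operations(n, storage);
  if (progress != nullptr) {
    return progress;
  }
  return nullptr;  // placeholder for the other Ideal transformations
}
```

With this contract the `nullptr` check lives in the caller, which addresses the concern that an assert inside the generator could be bypassed by a later `return nullptr`.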
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28651#discussion_r2591722969 From erfang at openjdk.org Fri Dec 5 08:13:19 2025 From: erfang at openjdk.org (Eric Fang) Date: Fri, 5 Dec 2025 08:13:19 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v3] In-Reply-To: References: Message-ID: > `VectorMaskCastNode` is used to cast a vector mask from one type to another type. The cast may be generated by calling the vector API `cast` or generated by the compiler. For example, some vector mask operations like `trueCount` require the input mask to be integer types, so for floating point type masks, the compiler will cast the mask to the corresponding integer type mask automatically before doing the mask operation. This kind of cast is very common. > > If the vector element size is not changed, the `VectorMaskCastNode` doesn't generate code; otherwise code will be generated to extend or narrow the mask. This IR node is not free whether or not it generates code, because it may block some optimizations. For example: > 1. `(VectorStoreMask (VectorMaskCast (VectorLoadMask x)))` The middle `VectorMaskCast` prevents the following optimization: `(VectorStoreMask (VectorLoadMask x)) => (x)` > 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`. > > In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. But if the input value is changed, we can't eliminate the cast. > > The general idea of this PR is to introduce an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`, like the existing `Node::uncast(bool)` function. The function returns the first node that is not a `VectorMaskCastNode`.
> > The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode` and this does not affect the correctness of the optimization. Then this function can be called to eliminate the `VectorMaskCastNode` chain. > > Current optimizations related to `VectorMaskCastNode` include: > 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760. > 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242. > > This PR does the following optimizations: > 1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` as `(VectorMaskCast (VectorMaskCast? ... (VectorMaskCast x))) => (x)`. Because as long as types of the head and tail `VectorMaskCastNode` are consistent, the optimization is correct. > 2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`. Since the value before and after the pattern is a boolean vector, it remains unchanged as long as th... Eric Fang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Refine the test code and comments - Merge branch 'master' into JDK-8370863-mask-cast-opt - Don't read and write the same memory in the JMH benchmarks - Merge branch 'master' into JDK-8370863-mask-cast-opt - 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns `VectorMaskCastNode` is used to cast a vector mask from one type to another type. The cast may be generated by calling the vector API `cast` or generated by the compiler. 
For example, some vector mask operations like `trueCount` require the input mask to be integer types, so for floating point type masks, the compiler will cast the mask to the corresponding integer type mask automatically before doing the mask operation. This kind of cast is very common.

If the vector element size is not changed, the `VectorMaskCastNode` doesn't generate code; otherwise code will be generated to extend or narrow the mask. This IR node is not free whether or not it generates code, because it may block some optimizations. For example:
1. `(VectorStoreMask (VectorMaskCast (VectorLoadMask x)))` The middle `VectorMaskCast` prevents the following optimization: `(VectorStoreMask (VectorLoadMask x)) => (x)`
2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`.

In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. But if the input value is changed, we can't eliminate the cast.

The general idea of this PR is to introduce an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`, like the existing `Node::uncast(bool)` function. The function returns the first node that is not a `VectorMaskCastNode`.

The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode` and this does not affect the correctness of the optimization. Then this function can be called to eliminate the `VectorMaskCastNode` chain.

Current optimizations related to `VectorMaskCastNode` include:
1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760.
2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242.

This PR does the following optimizations:
1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` as `(VectorMaskCast (VectorMaskCast? ... (VectorMaskCast x))) => (x)`. Because as long as types of the head and tail `VectorMaskCastNode` are consistent, the optimization is correct.
2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`. Since the value before and after the pattern is a boolean vector, it remains unchanged as long as the vector length remains the same, and this is guaranteed at the API level.

I conducted some simple research on different mask generation methods and mask operations, and obtained the following table, which includes some potential optimization opportunities that may use this `uncast_mask` function.

```
mask_gen\op   toLong  anyTrue  allTrue  trueCount  firstTrue  lastTrue
compare       N/A     N/A      N/A      N/A        N/A        N/A
maskAll       TBI     TBI      TBI      TBI        TBI        TBI
fromLong      TBI     TBI      N/A      TBI        TBI        TBI

mask_gen\op   and     or       xor      andNot     not        laneIsSet
compare       N/A     N/A      N/A      N/A        TBI        N/A
maskAll       TBI     TBI      TBI      TBI        TBI        TBI
fromLong      N/A     N/A      N/A      N/A        TBI        TBI
```

`TBI` indicates that there may be potential optimizations here that require further investigation.
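
To make the `uncast_mask` idea concrete, here is a minimal, self-contained C++ sketch on a toy IR. It is not the code from the PR: the `Op` enum, the single-input `Node` struct with a `length` field, and `identity_store_mask` are hypothetical names chosen for illustration, and the explicit lane-count check is an assumption standing in for the guarantee the description attributes to the API level.

```cpp
#include <cassert>

// Toy opcodes; a stand-in for the real C2 node classes.
enum class Op { VectorMaskCast, VectorLoadMask, VectorStoreMask, Other };

// Toy single-input IR node; not HotSpot's Node.
struct Node {
  Op op;
  Node* in;    // the only input, or nullptr
  int length;  // number of vector lanes
};

// Skips a chain of consecutive VectorMaskCast nodes and returns the first
// node that is not a cast (or nullptr if the chain ends without one).
Node* uncast_mask(Node* n) {
  while (n != nullptr && n->op == Op::VectorMaskCast) {
    n = n->in;
  }
  return n;
}

// (VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => x
// Returns the replacement, or the node itself if the pattern does not match.
Node* identity_store_mask(Node* store) {
  if (store->op != Op::VectorStoreMask) {
    return store;
  }
  Node* in = uncast_mask(store->in);
  if (in != nullptr && in->op == Op::VectorLoadMask &&
      in->length == store->length) {
    return in->in;  // the original boolean vector x
  }
  return store;
}
```

The same helper could serve the other patterns listed above, for example looking through casts between `VectorMaskToLong` and `VectorLongToMask`; that extension is left out of the sketch.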
Benchmarks:

On an Nvidia Grace machine with 128-bit SVE2:
```
Benchmark                        Unit    Before  Error  After   Error  Uplift
microMaskLoadCastStoreByte64     ops/us  59.23   0.21   148.12  0.07   2.50
microMaskLoadCastStoreDouble128  ops/us  2.43    0.00   38.31   0.01   15.73
microMaskLoadCastStoreFloat128   ops/us  6.19    0.00   75.67   0.11   12.22
microMaskLoadCastStoreInt128     ops/us  6.19    0.00   75.67   0.03   12.22
microMaskLoadCastStoreLong128    ops/us  2.43    0.00   38.32   0.01   15.74
microMaskLoadCastStoreShort64    ops/us  28.89   0.02   75.60   0.09   2.62
```

On an Nvidia Grace machine with 128-bit NEON:
```
Benchmark                        Unit    Before  Error  After   Error  Uplift
microMaskLoadCastStoreByte64     ops/us  75.75   0.19   149.74  0.08   1.98
microMaskLoadCastStoreDouble128  ops/us  8.71    0.03   38.71   0.05   4.44
microMaskLoadCastStoreFloat128   ops/us  24.05   0.03   76.49   0.05   3.18
microMaskLoadCastStoreInt128     ops/us  24.06   0.02   76.51   0.05   3.18
microMaskLoadCastStoreLong128    ops/us  8.72    0.01   38.71   0.02   4.44
microMaskLoadCastStoreShort64    ops/us  24.64   0.01   76.43   0.06   3.10
```

On an AMD EPYC 9124 16-Core Processor with AVX3:
```
Benchmark                        Unit    Before  Error  After   Error  Uplift
microMaskLoadCastStoreByte64     ops/us  82.13   0.31   115.14  0.08   1.40
microMaskLoadCastStoreDouble128  ops/us  0.32    0.00   0.32    0.00   1.01
microMaskLoadCastStoreFloat128   ops/us  42.18   0.05   57.56   0.07   1.36
microMaskLoadCastStoreInt128     ops/us  42.19   0.01   57.53   0.08   1.36
microMaskLoadCastStoreLong128    ops/us  0.30    0.01   0.32    0.00   1.05
microMaskLoadCastStoreShort64    ops/us  42.18   0.05   57.59   0.01   1.37
```

On an AMD EPYC 9124 16-Core Processor with AVX2:
```
Benchmark                        Unit    Before  Error  After   Error  Uplift
microMaskLoadCastStoreByte64     ops/us  73.53   0.20   114.98  0.03   1.56
microMaskLoadCastStoreDouble128  ops/us  0.29    0.01   0.30    0.00   1.00
microMaskLoadCastStoreFloat128   ops/us  30.78   0.14   57.50   0.01   1.87
microMaskLoadCastStoreInt128     ops/us  30.65   0.26   57.50   0.01   1.88
microMaskLoadCastStoreLong128    ops/us  0.30    0.00   0.30    0.00   0.99
microMaskLoadCastStoreShort64    ops/us  24.92   0.00   57.49   0.01   2.31
```

On an AMD EPYC 9124 16-Core Processor with AVX1:
```
Benchmark                        Unit    Before  Error  After   Error  Uplift
microMaskLoadCastStoreByte64     ops/us  79.68   0.01   248.49  0.91   3.12
microMaskLoadCastStoreDouble128  ops/us  0.28    0.00   0.28    0.00   1.00
microMaskLoadCastStoreFloat128   ops/us  31.11   0.04   95.48   2.27   3.07
microMaskLoadCastStoreInt128     ops/us  31.10   0.03   99.94   1.87   3.21
microMaskLoadCastStoreLong128    ops/us  0.28    0.00   0.28    0.00   0.99
microMaskLoadCastStoreShort64    ops/us  31.11   0.02   94.97   2.30   3.05
```

This PR was tested on 128-bit, 256-bit, and 512-bit (QEMU) aarch64 environments, and two 512-bit x64 machines with various configurations, including sve2, sve1, neon, avx3, avx2, avx1, sse4 and sse3; all tests passed.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28313/files
  - new: https://git.openjdk.org/jdk/pull/28313/files/3b0ff7d6..c04039ce

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28313&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28313&range=01-02

  Stats: 64625 lines in 1066 files changed: 42561 ins; 15516 del; 6548 mod
  Patch: https://git.openjdk.org/jdk/pull/28313.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28313/head:pull/28313

PR: https://git.openjdk.org/jdk/pull/28313

From erfang at openjdk.org Fri Dec 5 08:13:20 2025
From: erfang at openjdk.org (Eric Fang)
Date: Fri, 5 Dec 2025 08:13:20 GMT
Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v3]
In-Reply-To: 
References: 
Message-ID: 

On Fri, 5 Dec 2025 08:10:32 GMT, Eric Fang wrote:

>> `VectorMaskCastNode` is used to cast a vector mask from one type to another type. The cast may be generated by calling the vector API `cast` or generated by the compiler. For example, some vector mask operations like `trueCount` require the input mask to be integer types, so for floating point type masks, the compiler will cast the mask to the corresponding integer type mask automatically before doing the mask operation. This kind of cast is very common.
>> >> If the vector element size is not changed, the `VectorMaskCastNode` does not generate code; otherwise code will be generated to extend or narrow the mask. This IR node is not free whether or not it generates code, because it may block some optimizations. For example: >> 1. `(VectorStoreMask (VectorMaskCast (VectorLoadMask x)))` The middle `VectorMaskCast` prevents the following optimization: `(VectorStoreMask (VectorLoadMask x)) => (x)` >> 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`. >> >> In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. But if the input value is changed, we can't eliminate the cast. >> >> The general idea of this PR is to introduce an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`, like the existing `Node::uncast(bool)` function. The function returns the first node that is not a `VectorMaskCastNode`. >> >> The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode` and this does not affect the correctness of the optimization. Then this function can be called to eliminate the `VectorMaskCastNode` chain. >> >> Current optimizations related to `VectorMaskCastNode` include: >> 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760. >> 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242. >> >> This PR does the following optimizations: >> 1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` as `(VectorMaskCast (VectorMaskCast? ... (VectorMaskCast x))) => (x)`. The optimization is correct as long as the types of the head and tail `VectorMaskCastNode` are consistent. >> 2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`.
Since the value before and after the pattern is a boolean vect... > > Eric Fang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Refine the test code and comments > - Merge branch 'master' into JDK-8370863-mask-cast-opt > - Don't read and write the same memory in the JMH benchmarks > - Merge branch 'master' into JDK-8370863-mask-cast-opt > - 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns > > `VectorMaskCastNode` is used to cast a vector mask from one type to > another type. The cast may be generated by calling the vector API `cast` > or generated by the compiler. For example, some vector mask operations > like `trueCount` require the input mask to be integer types, so for > floating point type masks, the compiler will cast the mask to the > corresponding integer type mask automatically before doing the mask > operation. This kind of cast is very common. > > If the vector element size is not changed, the `VectorMaskCastNode` > don't generate code, otherwise code will be generated to extend or narrow > the mask. This IR node is not free no matter it generates code or not > because it may block some optimizations. For example: > 1. `(VectorStoremask (VectorMaskCast (VectorLoadMask x)))` > The middle `VectorMaskCast` prevented the following optimization: > `(VectorStoremask (VectorLoadMask x)) => (x)` > 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which > blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`. > > In these IR patterns, the value of the input `x` is not changed, so we > can safely do the optimization. But if the input value is changed, we > can't eliminate the cast. 
> > The general idea of this PR is introducing an `uncast_mask` helper > function, which can be used to uncast a chain of `VectorMaskCastNode`, > like the existing `Node::uncast(bool)` function. The function returns > the first non `VectorMaskCastNode`. > > The intended use case is when the IR pattern to be optimized may > contain one or more consecutive `VectorMaskCastNode` and this does not > affect the correctness of the optimization. Then this function can be > called to eliminate the `VectorMaskCastNode` chain. > > Current optimizations related to `VectorMaskCastNode` include: > 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760. > 2. `(XorV... Thanks for your review! @galderz ------------- PR Review: https://git.openjdk.org/jdk/pull/28313#pullrequestreview-3537647873 From erfang at openjdk.org Fri Dec 5 08:13:24 2025 From: erfang at openjdk.org (Eric Fang) Date: Fri, 5 Dec 2025 08:13:24 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v2] In-Reply-To: References: <4vSKAtr0tUG0V193gIvnEFdHm18ZhqflVAwk-09IVQ0=.081806f5-6303-4b4f-975d-7c85427ccae5@github.com> Message-ID: On Fri, 28 Nov 2025 09:09:28 GMT, Galder Zamarreño wrote: >> Eric Fang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Don't read and write the same memory in the JMH benchmarks >> - Merge branch 'master' into JDK-8370863-mask-cast-opt >> - 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns >> >> `VectorMaskCastNode` is used to cast a vector mask from one type to >> another type. The cast may be generated by calling the vector API `cast` >> or generated by the compiler.
For example, some vector mask operations >> like `trueCount` require the input mask to be integer types, so for >> floating point type masks, the compiler will cast the mask to the >> corresponding integer type mask automatically before doing the mask >> operation. This kind of cast is very common. >> >> If the vector element size is not changed, the `VectorMaskCastNode` >> don't generate code, otherwise code will be generated to extend or narrow >> the mask. This IR node is not free no matter it generates code or not >> because it may block some optimizations. For example: >> 1. `(VectorStoremask (VectorMaskCast (VectorLoadMask x)))` >> The middle `VectorMaskCast` prevented the following optimization: >> `(VectorStoremask (VectorLoadMask x)) => (x)` >> 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which >> blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`. >> >> In these IR patterns, the value of the input `x` is not changed, so we >> can safely do the optimization. But if the input value is changed, we >> can't eliminate the cast. >> >> The general idea of this PR is introducing an `uncast_mask` helper >> function, which can be used to uncast a chain of `VectorMaskCastNode`, >> like the existing `Node::uncast(bool)` function. The funtion returns >> the first non `VectorMaskCastNode`. >> >> The intended use case is when the IR pattern to be optimized may >> contain one or more consecutive `VectorMaskCastNode` and this does not >> affect the correctness of the optimization. Then this function can be >> called to eliminate the `VectorMaskCastNode` chain. >> >> Current optimizations related to `VectorMaskCastNode` include: >> 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760. >> 2. `(XorV (VectorMa... > > src/hotspot/share/opto/vectornode.cpp line 1056: > >> 1054: // x remains to be a bool vector with no changes. >> 1055: // This function can be used to eliminate the VectorMaskCast in such patterns. 
>> 1056: Node* VectorNode::uncast_mask(Node* n) { > > Could this be a static method instead? Yeah it's already a static method. See https://github.com/openjdk/jdk/pull/28313/files#diff-ba9e2d10a50a01316946660ec9f68321eb864fd9c815616c10abbec39360efe5R141 Or do you mean a static method limited to this file? If so, I'd prefer not to, since it may be used in other places. Thanks~ > test/hotspot/jtreg/compiler/vectorapi/VectorMaskCastIdentityTest.java line 57: > >> 55: applyIfCPUFeatureAnd = {"asimd", "true", "sve", "false"}) >> 56: public static int testTwoCastToDifferentType() { >> 57: // The types before and after the two casts are not the same, so the cast cannot be eliminated. > > Outdated comment. Also please expand assertion comments Done, thanks! > test/hotspot/jtreg/compiler/vectorapi/VectorMaskCastIdentityTest.java line 79: > >> 77: applyIfCPUFeatureAnd = {"avx2", "true", "avx512", "false"}) >> 78: public static int testTwoCastToDifferentType2() { >> 79: // The types before and after the two casts are not the same, so the cast cannot be eliminated. > > Could you expand the documentation on the IR assertions? It's not immediately clear why with AVX-512 the cast remains but with AVX-2 it's removed. Also, this comment is outdated. This is because the following optimization on AVX2 affects this optimization: `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => x` On AVX2, `trueCount()` requires converting the mask to a **boolean vector** first via `VectorStoreMask`. So `VectorStoreMask` can apply the above optimization, which eliminates all `VectorMaskCast` nodes as a side effect. On AVX-512, masks use dedicated mask registers (k registers), so `VectorStoreMask` is not generated for `trueCount()` and the `VectorMaskCast` nodes remain.
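The rewrite `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => x` can be sketched outside HotSpot with a toy node class. Everything here is hypothetical: `MaskCastSketch`, its `Node` class, `uncastMask` and `identity` are illustration names, not HotSpot's `Node` or `VectorNode::uncast_mask`, and the sketch ignores the type and vector-length checks the real transformation would need.

```java
// Toy model only: opcode names mirror the IR patterns in this thread,
// but none of this is HotSpot code.
final class MaskCastSketch {
    static final class Node {
        final String op;   // opcode name, e.g. "VectorMaskCast"
        final Node in;     // single input, or null for a leaf
        Node(String op, Node in) { this.op = op; this.in = in; }
    }

    // Walk past consecutive VectorMaskCast nodes and return the first non-cast node.
    static Node uncastMask(Node n) {
        while (n != null && "VectorMaskCast".equals(n.op)) {
            n = n.in;
        }
        return n;
    }

    // (VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => x; otherwise return n unchanged.
    static Node identity(Node n) {
        if ("VectorStoreMask".equals(n.op)) {
            Node m = uncastMask(n.in);
            if (m != null && "VectorLoadMask".equals(m.op)) {
                return m.in;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        Node x = new Node("LoadVector", null);
        Node chain = new Node("VectorStoreMask",
                new Node("VectorMaskCast",
                        new Node("VectorMaskCast",
                                new Node("VectorLoadMask", x))));
        // The whole chain folds back to the boolean vector x.
        System.out.println(identity(chain) == x);
    }
}
```

In this model a root that is not `VectorStoreMask` is left alone, which loosely mirrors the AVX-512 case above where no `VectorStoreMask` is generated for `trueCount()` and the casts therefore stay.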
I reorganised this file, please take another look, thanks~ > test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java line 240: > >> 238: >> 239: @Test >> 240: @IR(counts = { IRNode.VECTOR_LONG_TO_MASK, "= 0", > > Could you add some assertion comments here as well to understand what causes the differences with different architectures? Done > test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java line 260: > >> 258: >> 259: @Test >> 260: @IR(counts = { IRNode.VECTOR_LONG_TO_MASK, "= 0", > > Same here Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2587209533 PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2587250313 PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2587250610 PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2587250972 PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2587251084 From epeter at openjdk.org Fri Dec 5 08:16:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Dec 2025 08:16:05 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v9] In-Reply-To: References: <9xCpJGY6CFKPAt4VtDY23_Tr3SE9tUebdMF3pAYWhFA=.281e0b84-bfad-466b-b290-918cf1fa83d1@github.com> Message-ID: On Wed, 29 Oct 2025 09:41:18 GMT, Qizheng Xing wrote: >> @MaxXSoft Feel free to just ping me again when you want another review :) >> FYI: I'll be on a longer vacation starting in about a week, so don't expect me to respond then. > > @eme64 Thank you for the review! > > @merykitty @jatin-bhateja Do you have any other suggestions regarding the latest changes in this patch? @MaxXSoft I think we should first merge and test this PR again. It is 4 weeks old now, so there is a risk that something would break if we integrated now. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25928#issuecomment-3615747403 From epeter at openjdk.org Fri Dec 5 08:25:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Dec 2025 08:25:57 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v2] In-Reply-To: References: <1z430wmE_HRTJqmLIC15VMUktLyUEE7qjkppr1GniAI=.e560a4e9-59f0-4013-ad65-5d7261cdbf0e@github.com> Message-ID: <8jnY6pqofieRIfV5fCqFxvHZMF3nAZbh7yAD7C_G5FU=.a12c98f6-c715-43f5-9528-62fcfdfc6e59@github.com> On Fri, 5 Dec 2025 06:00:22 GMT, Jasmine Karthikeyan wrote: >> When you specify `@Warmup(0)`, the IR framework should directly compile it at the highest level which should be C2 if you are not running with a client build. So, I would have expected that it makes no difference. Can you double-check if you can reproduce it with `CompLevel.C2` but not without? > > After taking a closer look, I think you're correct- I can reproduce the crash using just `@Warmup(0)` and `@Test`. I think I used both while debugging and didn't test whether it worked without `CompLevel.C2`. I've removed it in the latest commit. > However, I noticed that after that I merged from master neither the test nor the reproducer failed compilation before the fix is added. I think another commit must have changed the generated graph so that it no longer tries to vectorize the `CastII`, leading to the crash not being triggered. I looked at the JBS entry and saw that there wasn't another reproducer for this, so I was a bit unsure on what to do. Should this patch be merged with the current test? @jaskarth Thanks for looking into it! I would still add the fix, just in case. And I think the test as well, even if it does not reproduce any more. I was wondering: before the merge, when the test still reproduced: If you removed the `@Warmup(0)` and `CompLevel.C2`, and instead just do `framework.addFlags` with `-Xcomp`, would that reproduce too? 
If so, you could have a framework run with and one without Xcomp, the one with Xcomp also should have a compileonly. What do you think? Or we just push the patch as is, to be sure this is done and integrated. What do you think @chhagedorn ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26827#discussion_r2591811631 From qxing at openjdk.org Fri Dec 5 08:57:06 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Fri, 5 Dec 2025 08:57:06 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v18] In-Reply-To: References: Message-ID: > The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. > > This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch: > > > public static int numberOfNibbles(int i) { > int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); > return Math.max((mag + 3) / 4, 1); > } > > > Testing: tier1, IR test Qizheng Xing has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 24 commits: - Merge branch 'master' into enhance-clz-type - Make code more compact - Fix include order - Merge branch 'master' into enhance-clz-type - Merge branch 'master' into enhance-clz-type - Fix constant fold - Remove redundant import - Add random range tests - Add more comments to IR test - Add more constant folding tests for CLZ/CTZ - ... 
and 14 more: https://git.openjdk.org/jdk/compare/674cc3ee...f0687754 ------------- Changes: https://git.openjdk.org/jdk/pull/25928/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=17 Stats: 801 lines in 4 files changed: 735 ins; 54 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/25928.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25928/head:pull/25928 PR: https://git.openjdk.org/jdk/pull/25928 From qxing at openjdk.org Fri Dec 5 09:05:03 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Fri, 5 Dec 2025 09:05:03 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v9] In-Reply-To: References: <9xCpJGY6CFKPAt4VtDY23_Tr3SE9tUebdMF3pAYWhFA=.281e0b84-bfad-466b-b290-918cf1fa83d1@github.com> Message-ID: On Fri, 5 Dec 2025 08:13:17 GMT, Emanuel Peter wrote: >> @eme64 Thank you for the review! >> >> @merykitty @jatin-bhateja Do you have any other suggestions regarding the latest changes in this patch? > > @MaxXSoft I think we should first merge and test this PR again. It is 4 weeks old now, so there is a risk that something would break if we integrated now. @eme64 Sorry for the delay in integrating this PR. Since it modifies HotSpot, I wasn't quite sure whether two or more reviewer approvals were required for the latest commit before integration, so I've been waiting for reviews from other reviewers. I've now merged the latest master branch, which may require you to run some Oracle tests again. Thanks. BTW, I'd like to confirm: if you approve the current post-merge changes, do I still need to wait for other reviewers? 
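For readers following the 8360192 thread, here is a small plain-Java sketch of the range facts under discussion. `numberOfNibbles` is copied from the PR description. The helpers `clzLo`, `clzHi` and `boundHolds` are hypothetical names that model the bound quoted in review, `TypeInt::make(clz(~t._bits._zeros), clz(t._bits._ones), t._widen)`, under the assumption that `_zeros` marks bits known to be 0 and `_ones` marks bits known to be 1. It says nothing about how C2 implements the type computation.

```java
import java.util.Random;

// Plain-Java model of the CLZ result range; not C2 code.
final class ClzRangeSketch {
    // From the PR description: CLZ is in [0, 32], so mag is in [0, 32].
    static int numberOfNibbles(int i) {
        int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i);
        return Math.max((mag + 3) / 4, 1);
    }

    // Lower bound: the largest possible value has every bit set that is not known to be 0.
    static int clzLo(int zeros) { return Integer.numberOfLeadingZeros(~zeros); }

    // Upper bound: the smallest possible value has only the known-1 bits set.
    static int clzHi(int ones) { return Integer.numberOfLeadingZeros(ones); }

    // Sample values consistent with the known bits and check clzLo <= clz(v) <= clzHi.
    // Assumes zeros and ones are disjoint masks.
    static boolean boundHolds(int zeros, int ones, long seed, int samples) {
        Random rnd = new Random(seed);
        int lo = clzLo(zeros);
        int hi = clzHi(ones);
        for (int k = 0; k < samples; k++) {
            int v = (rnd.nextInt() & ~zeros) | ones;
            int clz = Integer.numberOfLeadingZeros(v);
            if (clz < lo || clz > hi) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(numberOfNibbles(0x10));
        System.out.println(boundHolds(0xFFFF0000, 0x00000100, 42L, 10_000));
    }
}
```

With the upper 16 bits known to be zero and bit 8 known to be one, the sketch narrows the leading-zero count to the interval [16, 23] instead of the full [0, 32].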
------------- PR Comment: https://git.openjdk.org/jdk/pull/25928#issuecomment-3615898684 From epeter at openjdk.org Fri Dec 5 09:13:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Dec 2025 09:13:03 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise In-Reply-To: <1yjEG7xjZcmvAECD2ovS0pW8IwA30p9BzCr0Krgy4ks=.3224b13e-a32f-468d-a6e3-3fb5a1c35c04@github.com> References: <1yjEG7xjZcmvAECD2ovS0pW8IwA30p9BzCr0Krgy4ks=.3224b13e-a32f-468d-a6e3-3fb5a1c35c04@github.com> Message-ID: On Mon, 23 Jun 2025 07:41:01 GMT, Quan Anh Mai wrote: >>> A stricter bound would be `TypeInt::make(~t._bits._zeros, t._bits._ones, t._widen)` >> >> @merykitty Thanks for your review, did you mean `TypeInt::make(clz(~t._bits._zeros), clz(t._bits._ones), t._widen)`? > > @MaxXSoft Yes you are right, my mistake You do need 2 reviewers. I see that @merykitty has reviewed this a while ago, but a re-approval from him would be good since it is so long ago now. I'll run some testing now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25928#issuecomment-3615927849 From erfang at openjdk.org Fri Dec 5 09:14:41 2025 From: erfang at openjdk.org (Eric Fang) Date: Fri, 5 Dec 2025 09:14:41 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v4] In-Reply-To: References: Message-ID: > `VectorMaskCastNode` is used to cast a vector mask from one type to another type. The cast may be generated by calling the vector API `cast` or generated by the compiler. For example, some vector mask operations like `trueCount` require the input mask to be integer types, so for floating point type masks, the compiler will cast the mask to the corresponding integer type mask automatically before doing the mask operation. This kind of cast is very common. > > If the vector element size is not changed, the `VectorMaskCastNode` don't generate code, otherwise code will be generated to extend or narrow the mask. 
This IR node is not free no matter it generates code or not because it may block some optimizations. For example: > 1. `(VectorStoremask (VectorMaskCast (VectorLoadMask x)))` The middle `VectorMaskCast` prevented the following optimization: `(VectorStoremask (VectorLoadMask x)) => (x)` > 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`. > > In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. But if the input value is changed, we can't eliminate the cast. > > The general idea of this PR is introducing an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`, like the existing `Node::uncast(bool)` function. The funtion returns the first non `VectorMaskCastNode`. > > The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode` and this does not affect the correctness of the optimization. Then this function can be called to eliminate the `VectorMaskCastNode` chain. > > Current optimizations related to `VectorMaskCastNode` include: > 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760. > 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242. > > This PR does the following optimizations: > 1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` as `(VectorMaskCast (VectorMaskCast? ... (VectorMaskCast x))) => (x)`. Because as long as types of the head and tail `VectorMaskCastNode` are consistent, the optimization is correct. > 2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`. Since the value before and after the pattern is a boolean vector, it remains unchanged as long as th... 
Eric Fang has updated the pull request incrementally with one additional commit since the last revision: Add MaxVectorSize IR test condition for VectorStoreMaskIdentityTest.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28313/files - new: https://git.openjdk.org/jdk/pull/28313/files/c04039ce..aa9a08a9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28313&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28313&range=02-03 Stats: 18 lines in 1 file changed: 6 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/28313.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28313/head:pull/28313 PR: https://git.openjdk.org/jdk/pull/28313 From xgong at openjdk.org Fri Dec 5 09:37:22 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 5 Dec 2025 09:37:22 GMT Subject: RFR: 8371603: C2: Missing Ideal optimizations for load and store vectors on SVE [v2] In-Reply-To: References: Message-ID: > **Problem:** > > This issue occurs on a 256-bit SVE machine, caused by the following problematic pattern in `LoadVectorNode::Ideal()`: > > > Node* LoadVectorNode::Ideal(PhaseGVN* phase, bool can_reshape) { > const TypeVect* vt = vect_type(); > if (Matcher::vector_needs_partial_operations(this, vt)) { > return VectorNode::try_to_gen_masked_vector(phase, this, vt); > } > return LoadNode::Ideal(phase, can_reshape); > } > > > The condition `Matcher::vector_needs_partial_operations(this, vt)` returns true for `LoadVectorNode` with 256-bit vector size even when the vector size equals the maximum vector size on SVE. In such cases, when `VectorNode::try_to_gen_masked_vector()` returns `nullptr`, the method exits early without calling `LoadNode::Ideal()`. This results in missing crucial optimizations that would normally be applied by the superclass. 
> > This code was introduced by https://bugs.openjdk.org/browse/JDK-8286941 to generate vector masks for partial vector operations, but it failed to ensure that the superclass `Ideal()` method is always invoked when no transformation is applied. > > **Solution:** > > This patch addresses the issue through two changes: > > 1. Refine `Matcher::vector_needs_partial_operations()` to return true only when the vector node genuinely represents a partial vector operation that requires masking. > 2. Modify `VectorNode::try_to_gen_masked_vector()` to never return `nullptr`, ensuring the superclass `Ideal()` method is always invoked when no transformation is applied. > > **Testing:** > > - Verified on different SVE platforms with different vector sizes (128|256|512 bits). > - Verified on X86 platforms with different avx options (-XX:UseAVX=1|2|3). > - Added two new IR tests to verify 1) previously missing optimizations for `LoadVector/StoreVector` are now applied, and 2) that mask and the correct IR patterns are generated for partial vector operations. 
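The control-flow problem described under **Problem** can be modeled in a few lines of Java. This is only a sketch: `LoadModel`, `LoadVectorModel` and both `ideal*` methods are hypothetical stand-ins, with `null` modeling "no transformation". The second method shows one generic way to keep the superclass reachable; it is not the approach the patch takes, which refines the predicate and changes what `try_to_gen_masked_vector()` returns.

```java
// Toy model of the early-return problem; not HotSpot code.
final class IdealFallThroughSketch {
    // Stand-in for LoadNode: records whether its ideal() was reached.
    static class LoadModel {
        boolean superIdealRan = false;

        String ideal() {
            superIdealRan = true;
            return null; // null models "no transformation applied"
        }
    }

    static final class LoadVectorModel extends LoadModel {
        final boolean needsPartialOps; // models Matcher::vector_needs_partial_operations
        final boolean canGenMasked;    // whether a masked node can actually be generated

        LoadVectorModel(boolean needsPartialOps, boolean canGenMasked) {
            this.needsPartialOps = needsPartialOps;
            this.canGenMasked = canGenMasked;
        }

        String tryToGenMaskedVector() {
            return canGenMasked ? "LoadVectorMasked" : null;
        }

        // Shape of the problematic code: returns early even when nothing was transformed.
        String idealEarlyReturn() {
            if (needsPartialOps) {
                return tryToGenMaskedVector();
            }
            return super.ideal();
        }

        // Illustrative alternative: fall through to the superclass when nothing changed.
        String idealWithFallThrough() {
            if (needsPartialOps) {
                String masked = tryToGenMaskedVector();
                if (masked != null) {
                    return masked;
                }
            }
            return super.ideal();
        }
    }

    public static void main(String[] args) {
        LoadVectorModel early = new LoadVectorModel(true, false);
        early.idealEarlyReturn();
        LoadVectorModel fixed = new LoadVectorModel(true, false);
        fixed.idealWithFallThrough();
        System.out.println(early.superIdealRan + " " + fixed.superIdealRan);
    }
}
```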
Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: Combine the condition check and IR transformation to a method ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28651/files - new: https://git.openjdk.org/jdk/pull/28651/files/ba7592cb..6206e8c0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28651&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28651&range=00-01 Stats: 36 lines in 2 files changed: 9 ins; 8 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/28651.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28651/head:pull/28651 PR: https://git.openjdk.org/jdk/pull/28651 From xgong at openjdk.org Fri Dec 5 09:44:56 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 5 Dec 2025 09:44:56 GMT Subject: RFR: 8371603: C2: Missing Ideal optimizations for load and store vectors on SVE [v2] In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 08:23:31 GMT, Emanuel Peter wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: >> >> Combine the condition check and IR transformation to a method > > @XiaohongGong Thanks for taking this over from me and fixing this so quickly, much appreciated! > > Thanks for adding my regression tests and for the attribution :) > > I only have a minor comment below. Otherwise, the code looks good to me. But since I'm not an SVE specialist, it would be good if someone with deeper knowledge would do a deeper review of the specific SVE parts. > > Once an SVE specialist gives the approval for the PR, I'll run some internal testing and approve from my side :) > > Ah, and one more thing: you should change the PR title to be more descriptive of the issue. The assert that was hit is only a far removed symptom. I would suggest: > > `C2 SVE: missing Ideal optimizations for load and store vectors` Hi @eme64, I've updated the patch based on your suggestion.
Would you mind taking another look? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28651#issuecomment-3616043996 From xgong at openjdk.org Fri Dec 5 09:44:58 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 5 Dec 2025 09:44:58 GMT Subject: RFR: 8371603: C2: Missing Ideal optimizations for load and store vectors on SVE [v2] In-Reply-To: References: Message-ID: <6ZWIQIawQ21qf2vHUGMeT2WENuzq5S0zbR0Mj3UehCU=.8a458bfa-6388-4690-aa97-d18e3a9303a5@github.com> On Fri, 5 Dec 2025 09:37:22 GMT, Xiaohong Gong wrote: >> **Problem:** >> >> This issue occurs on a 256-bit SVE machine, caused by the following problematic pattern in `LoadVectorNode::Ideal()`: >> >> >> Node* LoadVectorNode::Ideal(PhaseGVN* phase, bool can_reshape) { >> const TypeVect* vt = vect_type(); >> if (Matcher::vector_needs_partial_operations(this, vt)) { >> return VectorNode::try_to_gen_masked_vector(phase, this, vt); >> } >> return LoadNode::Ideal(phase, can_reshape); >> } >> >> >> The condition `Matcher::vector_needs_partial_operations(this, vt)` returns true for `LoadVectorNode` with 256-bit vector size even when the vector size equals the maximum vector size on SVE. In such cases, when `VectorNode::try_to_gen_masked_vector()` returns `nullptr`, the method exits early without calling `LoadNode::Ideal()`. This results in missing crucial optimizations that would normally be applied by the superclass. >> >> This code was introduced by https://bugs.openjdk.org/browse/JDK-8286941 to generate vector masks for partial vector operations, but it failed to ensure that the superclass `Ideal()` method is always invoked when no transformation is applied. >> >> **Solution:** >> >> This patch addresses the issue through two changes: >> >> 1. Refine `Matcher::vector_needs_partial_operations()` to return true only when the vector node genuinely represents a partial vector operation that requires masking. >> 2. 
Modify `VectorNode::try_to_gen_masked_vector()` to never return `nullptr`, ensuring the superclass `Ideal()` method is always invoked when no transformation is applied. >> >> **Testing:** >> >> - Verified on different SVE platforms with different vector sizes (128|256|512 bits). >> - Verified on X86 platforms with different avx options (-XX:UseAVX=1|2|3). >> - Added two new IR tests to verify 1) previously missing optimizations for `LoadVector/StoreVector` are now applied, and 2) that mask and the correct IR patterns are generated for partial vector operations. > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Combine the condition check and IR transformation to a method Hi @theRealAph, would you mind taking a look at this patch, especially the AArch64 part? Thanks a lot! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28651#issuecomment-3616050133 From aseoane at openjdk.org Fri Dec 5 10:08:33 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Fri, 5 Dec 2025 10:08:33 GMT Subject: RFR: 8280283: Dead compiler code found during the JDK-8272058 code review [v2] In-Reply-To: References: Message-ID: > This PR removes some dead code that was found during review for [JDK-8272058](https://bugs.openjdk.org/browse/JDK-8272058). > > `target_addr_for_insn_or_null` is never run with a `ldrw` to `zr` (i.e. a safepoint poll). This is just a remnant from global safepointing, before we moved to using thread-local handshakes. No safepoint polling code reaches this function. More information can be read in the [original code review](https://github.com/openjdk/jdk18/pull/51#discussion_r774922087). Additionally, I have run tiers 1-6 to make sure this path is not exercised. > > This changeset also cleans up the unused `is_nop` function, following the comments in the issue. Other dead code mentioned there has long since disappeared.
> > **Testing:** passes tiers 1-4 Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: Delete more unused code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28473/files - new: https://git.openjdk.org/jdk/pull/28473/files/749eda78..696cdd01 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28473&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28473&range=00-01 Stats: 10 lines in 2 files changed: 0 ins; 7 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28473.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28473/head:pull/28473 PR: https://git.openjdk.org/jdk/pull/28473 From aseoane at openjdk.org Fri Dec 5 10:08:34 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Fri, 5 Dec 2025 10:08:34 GMT Subject: RFR: 8280283: Dead compiler code found during the JDK-8272058 code review In-Reply-To: <2DyhWZxKPAXQbCsjHhoSUQZ80Em0931LE2LRjLNRdHA=.cc61d9bd-fc90-40ea-88e9-ac76c21b5756@github.com> References: <2DyhWZxKPAXQbCsjHhoSUQZ80Em0931LE2LRjLNRdHA=.cc61d9bd-fc90-40ea-88e9-ac76c21b5756@github.com> Message-ID: <6W3ic0sK8eq9-M4mqtG7IWMV4aMiVBn3bHcPFBstgko=.2eb68b79-c03d-4779-b9a9-93f9c5ddfd21@github.com> On Tue, 2 Dec 2025 14:59:50 GMT, Boris Ulasevich wrote: >> This PR removes some dead code that was found during review for [JDK-8272058](https://bugs.openjdk.org/browse/JDK-8272058). >> >> `target_addr_for_insn_or_null` is never run with a `ldrw` to `zr` (i.e. a safepoint poll). This is just a remnant from global safepointing, before we moved to using thread-local handshakes. No safepoint polling code reaches this function. More information can be read in the [original code review](https://github.com/openjdk/jdk18/pull/51#discussion_r774922087). Additionally, I have run tiers 1-6 to make sure this path did not exercise. >> >> This changeset also cleans up the unused `is_nop` function, following the comments in the issue. 
Other dead code mentioned there has long since disappeared. >> >> **Testing:** passes tiers 1-4 > > Nice cleanup. Cleaning up dead code always helps reduce technical debt. > Are you sure there isn't more to clean up? Have you tried building with GCC's -Wunused options to catch additional unused symbols? Thanks @bulasevich! I built the relevant files with `-Wunused` and a few more dead lines came up. I've addressed them with my last commit. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28473#issuecomment-3616158434 From bkilambi at openjdk.org Fri Dec 5 10:55:01 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 5 Dec 2025 10:55:01 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v5] In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 11:34:11 GMT, Jatin Bhateja wrote: >> Add a new Float16Vector type and corresponding concrete vector classes, in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. >> - Add necessary inline expander support. >> - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. >> - Use existing Float16 vector IR and backend support. >> - Extended the existing VectorAPI JTREG test suite for the newly added Float16Vector operations. >> >> The idea here is to first be on par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reduction, etc). >> >> The following are the performance numbers for some of the selected Float16Vector benchmarking kernels compared to equivalent auto-vectorized Float16OperationsBenchmark kernels. >> >> [image not included in the archive] >> >> Initial RFP[1] was floated on the panama-dev mailing list. >> >> Kindly review the draft PR and share your feedback.
>> >> Best Regards, >> Jatin >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Cleanups src/hotspot/share/opto/vectorIntrinsics.cpp line 341: > 339: laneType == nullptr || !laneType->is_con() || > 340: vector_klass == nullptr || vector_klass->const_oop() == nullptr || > 341: laneType == nullptr || !laneType->is_con() || is this repeating the same condition on line 339? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28002#discussion_r2592254926 From jvernee at openjdk.org Fri Dec 5 11:10:27 2025 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 5 Dec 2025 11:10:27 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v7] In-Reply-To: <7ayMTZ4nXMyB1SXNRcYGjdxidNHDcAUNv_8fQZDUaPI=.a558d3a2-1d3e-4b45-8ba7-393c55a52785@github.com> References: <7ayMTZ4nXMyB1SXNRcYGjdxidNHDcAUNv_8fQZDUaPI=.a558d3a2-1d3e-4b45-8ba7-393c55a52785@github.com> Message-ID: On Thu, 4 Dec 2025 01:48:31 GMT, Chen Liang wrote: >> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Revert void special case removal due to C2 shortage causing TestZGCBarrierElision::testAtomicThenAtomicAnotherField failure Latest version looks much better to me (as mentioned offline). What was the issue with the failing test around the removal of the _V guard template? 
Also, looks like the new IR test is failing in GHA.

test/hotspot/jtreg/compiler/c2/irTests/constantFold/VarHandleMismatchedTypeFold.java line 48:

> 46:     public static void main(String[] args) {
> 47:         TestFramework.runWithFlags(
> 48:             "-XX:+UnlockExperimentalVMOptions"

Why is this flag needed?

-------------

PR Review: https://git.openjdk.org/jdk/pull/28585#pullrequestreview-3544230655
PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2592287594

From roland at openjdk.org Fri Dec 5 11:55:34 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Fri, 5 Dec 2025 11:55:34 GMT
Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis
Message-ID:

The crash occurs because verification code expects the inner and outer loop of a loop strip mining nest to have the same number of phis but, in this case, the inner loop has one more memory phi than the outer loop.

1) After `OuterStripMinedLoopNode::adjust_strip_mined_loop`, inner and outer loops have the same number of phis, as expected.

309 MergeMem === _ 1 306 1 1 284 [[ 429 ]] { - - N284:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=205 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49)
248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 ]]
429 Phi === 248 309 205 [[ 93 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49)
430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49)
249 CountedLoop === 249 248 197 [[ 249 119 96 93 94 ]] inner stride: 1 strip mined !orig=[223],[91] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49)
93 Phi === 249 429 205 [[ 117 97 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49)
94 Phi === 249 430 121 [[ 97 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49)

2) Then `PhiNode::Ideal` runs for 429 and pushes the `MergeMem` 309 through the outer loop phi:

248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 444 446 ]]
430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49)
444 Phi === 248 306 121 [[ 445 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49)
446 Phi === 248 284 170 [[ 445 ]] #memory Memory: @instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow], name=detailMessage, idx=5; !orig=444,429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49)
445 MergeMem === _ 1 444 1 1 446 [[ 93 ]] { - - N446:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=[429],93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49)
249 CountedLoop === 249 248 197 [[ 249 119 96 93 94 ]] inner stride: 1 strip mined !orig=[223],[91] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49)
93 Phi === 249 445 205 [[ 117 97 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49)
94 Phi === 249 430 121 [[ 97 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49)

3) `PhiNode::Identity` runs for 430 and finds that it can be replaced by 429: the non bottom memory phi 430 can be replaced by the bottom memory 429 that has the same inputs.

248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 446 444 ]]
446 Phi === 248 284 170 [[ 445 ]] #memory Memory: @instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow], name=detailMessage, idx=5; !orig=444,[429],93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49)
444 Phi === 248 306 121 [[ 445 94 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=[429],93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49)
445 MergeMem === _ 1 444 1 1 446 [[ 93 ]] { - - N446:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=[429],93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49)
249 CountedLoop === 249 248 197 [[ 249 119 96 93 94 ]] inner stride: 1 strip mined !orig=[223],[91] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49)
93 Phi === 249 445 205 [[ 117 97 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49)
94 Phi === 249 444 121 [[ 97 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49)

4) `PhiNode::Ideal` runs for 93 and pushes the `MergeMem` through that `Phi`:

248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 446 444 ]]
446 Phi === 248 284 170 [[ 453 ]] #memory Memory: @instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow], name=detailMessage, idx=5; !orig=444,[429],[93] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49)
444 Phi === 248 306 121 [[ 451 94 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=[429],[93] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49)
249 CountedLoop === 249 248 197 [[ 249 119 96 453 94 451 ]] inner stride: 1 strip mined !orig=[223],[91] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49)
453 Phi === 249 446 170 [[ 452 ]] #memory Memory: @instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow], name=detailMessage, idx=5; !orig=451,[93] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49)
94 Phi === 249 444 121 [[ 97 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49)
451 Phi === 249 444 121 [[ 452 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=[93] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49)

Now, `PhiNode::Identity` for 94 could replace it with the bottom memory phi with same inputs 451. But it doesn't run. It last ran between 3) and 4) and there's no reason for igvn to execute it again because 4) doesn't cause 94 to change in any way.

The fix I propose is to mirror the transformation from `PhiNode::Identity` in `PhiNode::Ideal` so the end result doesn't depend on what phi is modified and processed by igvn last.
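
The ordering problem in steps 3) and 4) can be illustrated with a toy worklist in Java. This is only a sketch under stated assumptions: it is not C2 code, the class `PhiWorklistSketch` and its methods are hypothetical, and "mirroring" is modeled simply as letting the step that creates the bottom phi also remove non-bottom phis with the same inputs. The node numbers are borrowed from the dumps above purely for readability.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these are NOT C2 classes, and all names are hypothetical.
final class PhiWorklistSketch {

    // A toy memory phi: a memory slice ("bot" stands for bottom memory) and its inputs.
    static final class Phi {
        final String slice;
        final List<Integer> inputs;

        Phi(String slice, List<Integer> inputs) {
            this.slice = slice;
            this.inputs = inputs;
        }

        boolean isBottom() {
            return slice.equals("bot");
        }
    }

    // Toy Identity rule: a non-bottom phi is redundant if a bottom phi with the same inputs exists.
    static boolean hasBottomTwin(Phi phi, Map<Integer, Phi> graph) {
        if (phi.isBottom()) {
            return false;
        }
        for (Phi other : graph.values()) {
            if (other.isBottom() && other.inputs.equals(phi.inputs)) {
                return true;
            }
        }
        return false;
    }

    // Replays the processing order on the toy graph and returns how many phis remain.
    static int phisLeft(boolean mirrorInIdeal) {
        Map<Integer, Phi> graph = new LinkedHashMap<>();
        graph.put(94, new Phi("l", List.of(249, 444, 121)));

        // 94 is processed first, then 93. Nothing re-enqueues 94 afterwards.
        Deque<Integer> worklist = new ArrayDeque<>(List.of(94, 93));
        while (!worklist.isEmpty()) {
            int id = worklist.poll();
            if (id == 94) {
                // Identity on 94: no bottom twin exists yet, so 94 stays.
                if (hasBottomTwin(graph.get(94), graph)) {
                    graph.remove(94);
                }
            } else {
                // "Ideal" on 93 creates the bottom phi 451 with the same inputs as 94.
                Phi created = new Phi("bot", List.of(249, 444, 121));
                graph.put(451, created);
                if (mirrorInIdeal) {
                    // Mirrored rule: the creating step removes non-bottom twins itself.
                    graph.values().removeIf(p -> !p.isBottom() && p.inputs.equals(created.inputs));
                }
            }
        }
        return graph.size();
    }

    public static void main(String[] args) {
        System.out.println("without mirroring: " + phisLeft(false) + " phis");
        System.out.println("with mirroring: " + phisLeft(true) + " phis");
    }
}
```

Without mirroring, the toy graph keeps both phis because the rule for 94 ran before its twin existed and nothing re-enqueues it. With mirroring, the result no longer depends on which node the worklist processed last.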

-------------

Commit messages:
 - more
 - test
 - more
 - fix

Changes: https://git.openjdk.org/jdk/pull/28677/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28677&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8370200
  Stats: 124 lines in 5 files changed: 102 ins; 16 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/28677.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28677/head:pull/28677

PR: https://git.openjdk.org/jdk/pull/28677

From galder at openjdk.org Fri Dec 5 13:38:24 2025
From: galder at openjdk.org (Galder Zamarreño)
Date: Fri, 5 Dec 2025 13:38:24 GMT
Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v10]
In-Reply-To: <1C0ByMoDpDlOmbDQVgBTQg7yKI0UaLtX92Xmf0bta4E=.0c060c5a-d60d-4cbe-84c5-03884116ef34@github.com>
References: <1C0ByMoDpDlOmbDQVgBTQg7yKI0UaLtX92Xmf0bta4E=.0c060c5a-d60d-4cbe-84c5-03884116ef34@github.com>
Message-ID:

On Thu, 12 Jun 2025 09:12:03 GMT, Emanuel Peter wrote:

>> **Past Work**
>> With https://github.com/openjdk/jdk/pull/11775 / [JDK-8298952](https://bugs.openjdk.org/browse/JDK-8298952) we added `Node::Value` verification.
>>
>> **This PR**
>> I'm now adding verification for `Ideal` and `Identity`. I'm adding two bits to the flag `VerifyIterativeGVN`.
>>
>> I found many many node types that hit my verification assert, i.e. that could still be optimized after IGVN is over, just because these nodes were not put on the worklist any more.
>>
>> My approach was to aggressively bail-out for all nodes that had an issue. This way, we can address one by one in follow-up RFEs. For many, I did some initial assessment, and left some comments about what issues I encountered.
>>
>> **Future Work:**
>> In many cases, the issue is just a missing notification when inputs of inputs are changed. These would be good starter tasks. But there are probably also more complicated cases.
And there are surely cases where verification will be impossible, because it is possible that the Ideal / Identity optimizations traverse longer paths, and we cannot expect that notification makes it down that path. For those cases, we will have to leave the exception and document it well.
>>
>> I filed:
>> [JDK-8359103](https://bugs.openjdk.org/browse/JDK-8359103) C2 VerifyIterativeGVN: Umbrella for extending Ideal and Identity verification (JDK-8347273)
>> (We can file subtasks for the nodes we want to fix. I don't want to file them all now, but we should file them as we are investigating, so that there is no duplicate work.)
>>
>> Testing passed tier1-3, with extra timeout factor 20.
>
> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 79 additional commits since the last revision:
>
> - Merge branch 'master' into JDK-8347273-verify-IGVN-Ideal-Identity
> - update comments for Christian
> - Apply suggestions from code review
>
>   Co-authored-by: Christian Hagedorn
> - reorder flags for Christian
> - max_modes
> - use stringStream instead of ttyLocker
> - assert(false) for Christian
> - rename for Christian
> - Update src/hotspot/share/opto/phaseX.cpp
>
>   Co-authored-by: Manuel Hässig
> - review suggestions, and handled a few more edge cases
> - ... and 69 more: https://git.openjdk.org/jdk/compare/44e9f72c...d9546d87

src/hotspot/share/opto/phaseX.cpp line 1966:

> 1964:   //
> 1965:   // Found with:
> 1966:   //   compiler/codegen/TestBooleanVect.java

@eme64 Did you really encounter issues for these min/max codes with `TestBooleanVect`? Or is the test name incorrect here?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2592718199 From roland at openjdk.org Fri Dec 5 13:48:50 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Dec 2025 13:48:50 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v5] In-Reply-To: References: Message-ID: > The test case has an out of loop `Store` with an `AddP` address > expression that has other uses and is in the loop body. Schematically, > only showing the address subgraph and the bases for the `AddP`s: > > > Store#195 -> AddP#133 -> AddP#134 -> CastPP#110 > -> CastPP#110 > > > Both `AddP`s have the same base, a `CastPP` that's also in the loop > body. > > That loop is a counted loop and only has 3 iterations so is fully > unrolled. First, one iteration is peeled: > > > /-> CastPP#110 > Store#195 -> Phi#360 -> AddP#133 -> AddP#134 -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > The `AddP`s and `CastPP` are cloned (because in the loop body). As > part of peeling, `PhaseIdealLoop::peeled_dom_test_elim()` is > called. It finds the test that guards `CastPP#283` in the peeled > iteration dominates and replaces the test that guards `CastPP#110` > (the test in the peeled iteration is the clone of the test in the > loop). That causes `CastPP#110`'s control to be updated to that of the > test in the peeled iteration and to be yanked from the loop. So now > `CastPP#283` and `CastPP#110` have the same inputs. > > Next unrolling happens: > > > /-> CastPP#110 > /-> AddP#400 -> AddP#401 -> CastPP#110 > Store#195 -> Phi#360 -> Phi#477 -> AddP#133 -> AddP#134 -> CastPP#110 > \ -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > `AddP`s are cloned once more but not the `CastPP`s because they are > both in the peeled iteration now. A new `Phi` is added. > > Next igvn runs. It's going to push the `AddP`s through the `Phi`s. 
> > Through `Phi#477`: > > > > /-> CastPP#110 > Store#195 -> Phi#360 -> AddP#510 -> Phi#509 -> AddP#401 -> CastPP#110 > \ -> AddP#134 -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > Through `Phi#360`: > > > /-> AddP#134 -> CastPP#110 > /-> Phi#509 -> AddP#401 -> CastPP#110 > Store#195 -> AddP#516 -> Phi#515 -> AddP#278 -> CastPP#283 > -> Phi#514 -> CastPP#283 > -> CastP#110 > > > Then `Phi#514` which has 2 `CastPP`s as input with identical inputs is > transformed into anot... Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25386/files - new: https://git.openjdk.org/jdk/pull/25386/files/15c17bb1..20154a12 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25386&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25386&range=03-04 Stats: 34 lines in 3 files changed: 17 ins; 6 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/25386.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25386/head:pull/25386 PR: https://git.openjdk.org/jdk/pull/25386 From roland at openjdk.org Fri Dec 5 13:48:51 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Dec 2025 13:48:51 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v4] In-Reply-To: <9mnRXpB16Y6Mw0TSGFJz-69m24lzCNPMC_B1_YseD4M=.be94bbba-88ce-4958-a8bd-89862d7ec2e7@github.com> References: <9mnRXpB16Y6Mw0TSGFJz-69m24lzCNPMC_B1_YseD4M=.be94bbba-88ce-4958-a8bd-89862d7ec2e7@github.com> Message-ID: On Wed, 3 Dec 2025 05:43:30 GMT, Emanuel Peter wrote: > I think I'm on board with the solution now. It is probably best to do it during IGVN. I have a few more suggestions below :) Updated change should address your comments. 

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25386#issuecomment-3616982320

From roland at openjdk.org Fri Dec 5 13:52:12 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Fri, 5 Dec 2025 13:52:12 GMT
Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v9]
In-Reply-To:
References:
Message-ID:

> This is a variant of 8332827. In 8332827, an array access becomes
> dependent on a range check `CastII` for another array access. When,
> after loop opts are over, that RC `CastII` was removed, the array
> access could float and an out of bound access happened. With the fix
> for 8332827, RC `CastII`s are no longer removed.
>
> With this one what happens is that some transformations applied after
> loop opts are over widen the type of the RC `CastII`. As a result, the
> type of the RC `CastII` is no longer narrower than that of its input,
> the `CastII` is removed and the dependency is lost.
>
> There are 2 transformations that cause this to happen:
>
> - after loop opts are over, the type of the `CastII` nodes are widen
> so nodes that have the same inputs but a slightly different type can
> common.
>
> - When pushing a `CastII` through an `Add`, if of the type both inputs
> of the `Add`s are non constant, then we end up widening the type
> (the resulting `Add` has a type that's wider than that of the
> initial `CastII`).
>
> There are already 3 types of `Cast` nodes depending on the
> optimizations that are allowed. Either the `Cast` is floating
> (`depends_only_test()` returns `true`) or pinned. Either the `Cast`
> can be removed if it no longer narrows the type of its input or
> not. We already have variants of the `CastII`:
>
> - if the Cast can float and be removed when it doesn't narrow the type
> of its input.
>
> - if the Cast is pinned and be removed when it doesn't narrow the type
> of its input.
>
> - if the Cast is pinned and can't be removed when it doesn't narrow
> the type of its input.
>
> What we need here, I think, is the 4th combination:
>
> - if the Cast can float and can't be removed when it doesn't narrow
> the type of its input.
>
> Anyway, things are becoming confusing with all these different
> variants named in ways that don't always help figure out what
> constraints one of them operate under. So I refactored this and that's
> the biggest part of this change. The fix consists in marking `Cast`
> nodes when their type is widen in a way that prevents them from being
> optimized out.
>
> Tobias ran performance testing with a slightly different version of
> this change and there was no regression.

Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:

  Update src/hotspot/share/opto/castnode.hpp

  Co-authored-by: Emanuel Peter

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24575/files
  - new: https://git.openjdk.org/jdk/pull/24575/files/93b8b0c5..cab44429

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24575&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24575&range=07-08

  Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/24575.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24575/head:pull/24575

PR: https://git.openjdk.org/jdk/pull/24575

From galder at openjdk.org Fri Dec 5 14:02:06 2025
From: galder at openjdk.org (Galder Zamarreño)
Date: Fri, 5 Dec 2025 14:02:06 GMT
Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v10]
In-Reply-To:
References: <1C0ByMoDpDlOmbDQVgBTQg7yKI0UaLtX92Xmf0bta4E=.0c060c5a-d60d-4cbe-84c5-03884116ef34@github.com>
Message-ID:

On Fri, 5 Dec 2025 13:35:22 GMT, Galder Zamarreño wrote:

>> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 79 additional commits since the last revision:
>>
>> - Merge branch 'master' into JDK-8347273-verify-IGVN-Ideal-Identity
>> - update comments for Christian
>> - Apply suggestions from code review
>>
>>   Co-authored-by: Christian Hagedorn
>> - reorder flags for Christian
>> - max_modes
>> - use stringStream instead of ttyLocker
>> - assert(false) for Christian
>> - rename for Christian
>> - Update src/hotspot/share/opto/phaseX.cpp
>>
>>   Co-authored-by: Manuel Hässig
>> - review suggestions, and handled a few more edge cases
>> - ... and 69 more: https://git.openjdk.org/jdk/compare/968ce906...d9546d87
>
> src/hotspot/share/opto/phaseX.cpp line 1966:
>
>> 1964:   //
>> 1965:   // Found with:
>> 1966:   //   compiler/codegen/TestBooleanVect.java
>
> @eme64 Did you really encounter issues for these min/max codes with `TestBooleanVect`? Or is the test name incorrect here?

Seems correct. I removed all the cases and indeed `TestBooleanVect` fails. All good :)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2592791745

From roland at openjdk.org Fri Dec 5 14:05:06 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Fri, 5 Dec 2025 14:05:06 GMT
Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v10]
In-Reply-To:
References:
Message-ID:

> This is a variant of 8332827. In 8332827, an array access becomes
> dependent on a range check `CastII` for another array access. When,
> after loop opts are over, that RC `CastII` was removed, the array
> access could float and an out of bound access happened. With the fix
> for 8332827, RC `CastII`s are no longer removed.
>
> With this one what happens is that some transformations applied after
> loop opts are over widen the type of the RC `CastII`.
As a result, the > type of the RC `CastII` is no longer narrower than that of its input, > the `CastII` is removed and the dependency is lost. > > There are 2 transformations that cause this to happen: > > - after loop opts are over, the type of the `CastII` nodes are widen > so nodes that have the same inputs but a slightly different type can > common. > > - When pushing a `CastII` through an `Add`, if of the type both inputs > of the `Add`s are non constant, then we end up widening the type > (the resulting `Add` has a type that's wider than that of the > initial `CastII`). > > There are already 3 types of `Cast` nodes depending on the > optimizations that are allowed. Either the `Cast` is floating > (`depends_only_test()` returns `true`) or pinned. Either the `Cast` > can be removed if it no longer narrows the type of its input or > not. We already have variants of the `CastII`: > > - if the Cast can float and be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can't be removed when it doesn't narrow > the type of its input. > > What we need here, I think, is the 4th combination: > > - if the Cast can float and can't be removed when it doesn't narrow > the type of its input. > > Anyway, things are becoming confusing with all these different > variants named in ways that don't always help figure out what > constraints one of them operate under. So I refactored this and that's > the biggest part of this change. The fix consists in marking `Cast` > nodes when their type is widen in a way that prevents them from being > optimized out. > > Tobias ran performance testing with a slightly different version of > this change and there was no regression. 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24575/files - new: https://git.openjdk.org/jdk/pull/24575/files/cab44429..4a877c43 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24575&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24575&range=08-09 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24575.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24575/head:pull/24575 PR: https://git.openjdk.org/jdk/pull/24575 From roland at openjdk.org Fri Dec 5 14:05:09 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Dec 2025 14:05:09 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: Message-ID: <5DHx3WmMb1UtSeyiEiYCiisVgRFggPFfxBggpgtuD6M=.d72a9c07-9624-47ea-9398-a0d1dee69755@github.com> On Tue, 2 Dec 2025 17:32:09 GMT, Quan Anh Mai wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: >> >> - Merge branch 'master' into JDK-8354282 >> - whitespace >> - review >> - review >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java >> >> Co-authored-by: Christian Hagedorn >> - review >> - review >> - ... 
and 7 more: https://git.openjdk.org/jdk/compare/ef5e744a...93b8b0c5 > > src/hotspot/share/opto/castnode.hpp line 105: > >> 103: // All the possible combinations of floating/narrowing with example use cases: >> 104: >> 105: // Use case example: Range Check CastII > > I believe this is incorrect, a range check should be floating non-narrowing. It is only narrowing if the length of the array is a constant. It is because this cast encodes the dependency on the condition `index u< length`. This condition cannot be expressed in terms of `Type` unless `length` is a constant. Range check `CastII` were added to protect the `ConvI2L` in the address expression on 64 bits. The problem there was, in some cases, that the `ConvI2L` would float above the range check (because `ConvI2L` has no control input) and could end up with an out of range input (which in turn would cause the `ConvI2L` to become `top` in places where it wasn't expected). So `CastII` doesn't carry the control dependency of an array access on its range check. That dependency is carried by the `MemNode` which has its control input set to the range check. What you're saying, if I understand it correctly, would be true if the `CastII` was required to prevent an array `Load` from floating. But that's not the case. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2592801401 From roland at openjdk.org Fri Dec 5 14:05:10 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Dec 2025 14:05:10 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: <0An6wz0QZZxtVg-lP4IyqWTekcYkSmvosrVWkI7cH70=.86c07374-2127-4892-a369-ceefa82dd0b7@github.com> <_rBmTvf064PXyVEAX4zqk43DNgVr0gQDPzPcdQ4XI1A=.660e7e89-0a49-47e0-9639-972cbfbac5f0@github.com> <4qc5jJ1KA09yko5rWioBGstpuuRNxOiNWXRdRdh9h_E=.17c8ace8-c672-4451-bd15-247d66d92cef@github.com> Message-ID: On Tue, 2 Dec 2025 17:41:37 GMT, Quan Anh Mai wrote: >> Ok, I now read the PR from the top, and not just recent changes. If one were to start reading from the top, it would be clear without my suggestions here. But I think it could still be good to apply something about letting the Cast float to where we would hoist the RC. > > Naming is hard, but it is worth pointing out in the comment that floating here refers to `depends_only_on_test`. In other words, a cast is considered floating if it is legal to change the control input of a cast from an `IfTrue` or `IfFalse` to an `IfTrue` and `IfFalse` that dominates the current control input, and the corresponding conditions of the `If`s are the same. In contrast, we cannot do that for a pinned cast, and if the control is folded away, the control input of the pinned cast is changed to the control predecessor of the folded node. > > It is also worth noting that we have `Node::pinned` which means the node is pinned AT the control input while pinned here means that it is pinned UNDER the control input. Very confusing! I added a mention of `depends_only_on_test`. Is that good enough? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2592784214 From epeter at openjdk.org Fri Dec 5 14:28:38 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Dec 2025 14:28:38 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v5] In-Reply-To: References: Message-ID: <2o1eUSi-ngISF33nhD0ie40H3PUeyT0rrV1DYjd7Ud4=.b05b3657-c32d-477a-8834-8747a8e98ed0@github.com> On Fri, 5 Dec 2025 13:48:50 GMT, Roland Westrelin wrote: >> The test case has an out of loop `Store` with an `AddP` address >> expression that has other uses and is in the loop body. Schematically, >> only showing the address subgraph and the bases for the `AddP`s: >> >> >> Store#195 -> AddP#133 -> AddP#134 -> CastPP#110 >> -> CastPP#110 >> >> >> Both `AddP`s have the same base, a `CastPP` that's also in the loop >> body. >> >> That loop is a counted loop and only has 3 iterations so is fully >> unrolled. First, one iteration is peeled: >> >> >> /-> CastPP#110 >> Store#195 -> Phi#360 -> AddP#133 -> AddP#134 -> CastPP#110 >> -> AddP#277 -> AddP#278 -> CastPP#283 >> -> CastPP#283 >> >> >> >> The `AddP`s and `CastPP` are cloned (because in the loop body). As >> part of peeling, `PhaseIdealLoop::peeled_dom_test_elim()` is >> called. It finds the test that guards `CastPP#283` in the peeled >> iteration dominates and replaces the test that guards `CastPP#110` >> (the test in the peeled iteration is the clone of the test in the >> loop). That causes `CastPP#110`'s control to be updated to that of the >> test in the peeled iteration and to be yanked from the loop. So now >> `CastPP#283` and `CastPP#110` have the same inputs. 
>> >> Next unrolling happens: >> >> >> /-> CastPP#110 >> /-> AddP#400 -> AddP#401 -> CastPP#110 >> Store#195 -> Phi#360 -> Phi#477 -> AddP#133 -> AddP#134 -> CastPP#110 >> \ -> CastPP#110 >> -> AddP#277 -> AddP#278 -> CastPP#283 >> -> CastPP#283 >> >> >> >> `AddP`s are cloned once more but not the `CastPP`s because they are >> both in the peeled iteration now. A new `Phi` is added. >> >> Next igvn runs. It's going to push the `AddP`s through the `Phi`s. >> >> Through `Phi#477`: >> >> >> >> /-> CastPP#110 >> Store#195 -> Phi#360 -> AddP#510 -> Phi#509 -> AddP#401 -> CastPP#110 >> \ -> AddP#134 -> CastPP#110 >> -> AddP#277 -> AddP#278 -> CastPP#283 >> -> CastPP#283 >> >> >> >> Through `Phi#360`: >> >> >> /-> AddP#134 -> CastPP#110 >> /-> Phi#509 -> AddP#401 -> CastPP#110 >> Store#195 -> AddP#516 -> Phi#515 -> AddP#278 -> CastPP#283 >> -> Phi#514 -> CastPP#283 >> ... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Excellent, this looks great! Thanks for the updates @rwestrel ! I did not run testing again now. I think we can do that when a second reviewer gives the approval :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25386#pullrequestreview-3544999922 From epeter at openjdk.org Fri Dec 5 14:28:44 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Dec 2025 14:28:44 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v2] In-Reply-To: References: Message-ID: On Thu, 2 Oct 2025 04:03:58 GMT, Dean Long wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: >> >> - test seed >> - more >> - Merge branch 'master' into JDK-8351889 >> - Merge branch 'master' into JDK-8351889 >> - more >> - test >> - fix > > What if we just relax the assert? I failed to figure out what this assert is protecting us from by looking at the code. So what happens in a product build or when this assert is commented out? @dean-long @galderz Do you want to do the second review? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25386#issuecomment-3617140509 From epeter at openjdk.org Fri Dec 5 14:32:43 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Dec 2025 14:32:43 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v18] In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 08:57:06 GMT, Qizheng Xing wrote: >> The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. >> >> This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch: >> >> >> public static int numberOfNibbles(int i) { >> int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); >> return Math.max((mag + 3) / 4, 1); >> } >> >> >> Testing: tier1, IR test > > Qizheng Xing has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 24 commits: > > - Merge branch 'master' into enhance-clz-type > - Make code more compact > - Fix include order > - Merge branch 'master' into enhance-clz-type > - Merge branch 'master' into enhance-clz-type > - Fix constant fold > - Remove redundant import > - Add random range tests > - Add more comments to IR test > - Add more constant folding tests for CLZ/CTZ > - ... and 14 more: https://git.openjdk.org/jdk/compare/674cc3ee...f0687754 Testing passed. @merykitty @jatin-bhateja Do either of you want to give a second approval? ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25928#pullrequestreview-3545020267 From roland at openjdk.org Fri Dec 5 14:52:51 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Dec 2025 14:52:51 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v13] In-Reply-To: References: <2oDqUvcW_3hJRPRri4uttpkgfeCovL4ZZkcI0R1bB1A=.173b3a58-d0f1-4b29-94d1-77b0a350c790@github.com> <2wAnS7drj_r3dqsy5CEF9vBG40KizHsQDOxMeNymwhw=.9bc29879-eead-401c-b750-814592feff63@github.com> <-1wiWF_UEvCO6xPuYvIsElBzPPQDejGahm9Xd5YszPU=.cfb41cb1-f681-4e75-8c29-2d928468f53b@github.com> Message-ID: <42lOFbyCuQt4xj-pK-ME6ScceXqTnGOY0HrWnJMK56k=.87b29936-511f-4ba4-a429-e8b9faed83a2@github.com> On Sun, 30 Nov 2025 08:03:32 GMT, Zihao Lin wrote: >> I had a closer look and I think you ran into an inconsistency. Let me see if I can get it fixed as a separate change. > > Sure, it's better to separate this into another change. I am not familiar with this part, please ping me if you have a better solution. Thanks! I filed https://bugs.openjdk.org/browse/JDK-8373143 for this but I keep finding new issues. So this one will take some time. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24258#discussion_r2592955645 From qamai at openjdk.org Fri Dec 5 15:05:53 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 5 Dec 2025 15:05:53 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v18] In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 08:57:06 GMT, Qizheng Xing wrote: >> The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. >> >> This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch: >> >> >> public static int numberOfNibbles(int i) { >> int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); >> return Math.max((mag + 3) / 4, 1); >> } >> >> >> Testing: tier1, IR test > > Qizheng Xing has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 24 commits: > > - Merge branch 'master' into enhance-clz-type > - Make code more compact > - Fix include order > - Merge branch 'master' into enhance-clz-type > - Merge branch 'master' into enhance-clz-type > - Fix constant fold > - Remove redundant import > - Add random range tests > - Add more comments to IR test > - Add more constant folding tests for CLZ/CTZ > - ... and 14 more: https://git.openjdk.org/jdk/compare/674cc3ee...f0687754 LGTM ------------- Marked as reviewed by qamai (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/25928#pullrequestreview-3545147861 From qamai at openjdk.org Fri Dec 5 15:05:55 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 5 Dec 2025 15:05:55 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v3] In-Reply-To: References: <2vGPKe7ESZqYjemMvDjFxb4QTk3VjybE0lk59Vqj1Ts=.e6a555a5-407b-4389-8db5-aa02a7de9960@github.com> Message-ID: On Tue, 19 Aug 2025 13:43:37 GMT, Emanuel Peter wrote: >>> Can you explain why you need this? Why is `count_trailing_zeros` and `count_leading_zeros` not enough, when you cast at the use-site? >> >> @eme64 The explanation of @merykitty is right, the implementation of `count_leading_zeros` and `count_trailing_zeros` reject zero as the input. >> >> Perhaps we could open another PR to add zero support for these functions, since it's less relevant to this node type change and might require other changes to the code that calls them. > > In `src/hotspot/share/utilities/count_leading_zeros.hpp`, it says that 0 behavior is undefined. Ok... but why do we do that? Is that a performance optimization ? If yes, is it really worth it? If there is no good reason not to handle 0, we should just handle it. > > We have some tests in `test/hotspot/gtest/utilities/test_count_leading_zeros.cpp`. > > It would be interesting to quickly check if any use of these methods could ever encounter zero, and then hit the assert. I would not be surprised if we found a bug here. > > I think this would be a worth while cleanup task. I would prefer if we clean things up now, and don't just let more special handling code get integrated. @eme64 It is because the intrinsics we use give unspecified results for 0, so it just propagated upward. I think it is definitely preferable to fix this. 
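For readers following the zero-input discussion above, a small Java-level sketch may help. It only restates what the Java API specifies: `Integer.numberOfLeadingZeros` and `Integer.numberOfTrailingZeros` are defined for 0 and return 32, so their results always lie in [0, 32]. The `numberOfNibbles` method is copied from the PR description; the class name `ClzRangeSketch` and the sample values are illustrative assumptions, and nothing here reflects how the HotSpot-internal `count_leading_zeros` utility treats 0.

```java
// Illustrative sketch only: the class name and sample inputs are assumptions.
public class ClzRangeSketch {

    // Copied from the PR description: number of hex digits needed to print i.
    public static int numberOfNibbles(int i) {
        // clz is in [0, 32], so mag is in [0, 32] as well.
        int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i);
        // (mag + 3) / 4 is in [0, 8]; the max() clamps the zero case to 1.
        return Math.max((mag + 3) / 4, 1);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 15, 16, -1, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int v : samples) {
            int clz = Integer.numberOfLeadingZeros(v);
            int ctz = Integer.numberOfTrailingZeros(v);
            System.out.printf("%11d clz=%2d ctz=%2d nibbles=%d%n",
                              v, clz, ctz, numberOfNibbles(v));
        }
    }
}
```

Because the count is bounded, `mag` above can never be negative, which is the kind of fact a narrower node type lets the compiler see without knowing the operand.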
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2592995218 From chagedorn at openjdk.org Fri Dec 5 16:29:35 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 5 Dec 2025 16:29:35 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v23] In-Reply-To: <8kNvPKU3I3PdOKtInEoHzV-i8T6-IETIBup-bxcr7_c=.91cc1d46-6d49-4fb7-9302-55597b7ae428@github.com> References: <6Yo2VYqBk_iaUpAGdPvyCjOyn_XW2nVPN5_w8XbXvkU=.91138210-54e3-4c28-b1d8-eb706583348e@github.com> <8kNvPKU3I3PdOKtInEoHzV-i8T6-IETIBup-bxcr7_c=.91cc1d46-6d49-4fb7-9302-55597b7ae428@github.com> Message-ID: <3vcLDogQ7FM6ga5oz_UchKRew9uy9WqknrFfJgJHxw0=.73b17591-fa84-4152-a3d2-b1685dba0fdf@github.com> On Thu, 27 Nov 2025 21:06:18 GMT, Kangcheng Xu wrote: >> Was too busy this week, will try to come back to this next week! > > @chhagedorn Thank you for reviewing. I'm glad to hear I'm making progress. Please see [my previous comment](https://github.com/openjdk/jdk/pull/24458#discussion_r2569790528) regarding iteratively uncasting `xphi()`. > >> [...] give your patch a spin in our standard testing [...] > > Yes please. I've addressed the last few suggestions and merged in master. > >> [...] run some more extended testing with your old vs. new counted loop transformation state [...] > > Good idea. I've updated the old vs. new code based on the latest patch on this PR. Please find it on the [`counted-loop-refactor-old-vs-new` branch](https://github.com/tabjy/jdk/commits/counted-loop-refactor-old-vs-new/). > > Please let me know how the testing goes. Thank you very much once again! Thanks @tabjy for the update. 
Was too busy this week with the mainline fork but I'm happy to take another look next week :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24458#issuecomment-3617590462 From rsunderbabu at openjdk.org Fri Dec 5 17:11:13 2025 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Fri, 5 Dec 2025 17:11:13 GMT Subject: RFR: 8372941: Rework compiler/intrinsics/sha tests to use intrinsic availability [v2] In-Reply-To: References: Message-ID: > Predicate probes of the following algos are changed to rely on intrinsics availability in the platform as opposed to hardware support availability. > MD5 > SHA1 > SHA256 > SHA3 > > Testing: > All flag combinations from CI > hotspot tiers 1 to 5 > PS: only for tier testings, mac-aarch was skipped due to resource constraints Ramkumar Sunderbabu has updated the pull request incrementally with one additional commit since the last revision: remove requires condition ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28634/files - new: https://git.openjdk.org/jdk/pull/28634/files/e5d1497c..654604b9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28634&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28634&range=00-01 Stats: 6 lines in 2 files changed: 0 ins; 4 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28634.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28634/head:pull/28634 PR: https://git.openjdk.org/jdk/pull/28634 From rsunderbabu at openjdk.org Fri Dec 5 17:15:43 2025 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Fri, 5 Dec 2025 17:15:43 GMT Subject: RFR: 8372941: Rework compiler/intrinsics/sha tests to use intrinsic availability [v2] In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 17:11:13 GMT, Ramkumar Sunderbabu wrote: >> Predicate probes of the following algos are changed to rely on intrinsics availability in the platform as opposed to hardware support availability. 
>> MD5 >> SHA1 >> SHA256 >> SHA3 >> >> Testing: >> All flag combinations from CI >> hotspot tiers 1 to 5 >> PS: only for tier testings, mac-aarch was skipped due to resource constraints > > Ramkumar Sunderbabu has updated the pull request incrementally with one additional commit since the last revision: > > remove requires condition I didn't modify TestUseSHA3IntrinsicsOptionOnSupportedCPU.java since it was failing once I removed the condition completely. `@requires os.arch == "aarch64" & os.family == "mac"` `----------System.err:(32/2790)---------- stdout: []; stderr: [Java HotSpot(TM) 64-Bit Server VM warning: Option NeverActAsServerClassMachine was deprecated in version 26.0 and will likely be removed in a future release. java version "26-ea" 2026-03-17 Java(TM) SE Runtime Environment (build 26-ea+26-2610) Java HotSpot(TM) 64-Bit Server VM (build 26-ea+26-2610, mixed mode, sharing) ] exitValue = 0 java.lang.AssertionError: Expected message not found: 'Intrinsics for SHA3-224, SHA3-256, SHA3-384 and SHA3-512 crypto hash functions not available on this CPU.'. 
Enabling option 'UseSHA3Intrinsics' should not be possible and should result in a warning if -XX:-UseSHA was passed to JVM at jdk.test.lib.cli.CommandLineOptionTest.verifyOutput(CommandLineOptionTest.java:159) at jdk.test.lib.cli.CommandLineOptionTest.verifyJVMStartup(CommandLineOptionTest.java:130) at jdk.test.lib.cli.CommandLineOptionTest.verifySameJVMStartup(CommandLineOptionTest.java:211) at compiler.intrinsics.sha.cli.testcases.GenericTestCaseForSupportedCPU.verifyWarnings(GenericTestCaseForSupportedCPU.java:82) at compiler.intrinsics.sha.cli.DigestOptionsBase$TestCase.test(DigestOptionsBase.java:162) at compiler.intrinsics.sha.cli.DigestOptionsBase.runTestCases(DigestOptionsBase.java:139) at jdk.test.lib.cli.CommandLineOptionTest.test(CommandLineOptionTest.java:544) at compiler.intrinsics.sha.cli.TestUseSHA3IntrinsicsOptionOnSupportedCPU.main(TestUseSHA3IntrinsicsOptionOnSupportedCPU.java:47) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) at java.base/java.lang.reflect.Method.invoke(Method.java:565) at com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138) at java.base/java.lang.Thread.run(Thread.java:1516) Caused by: java.lang.RuntimeException: 'Intrinsics for SHA3-224, SHA3-256, SHA3-384 and SHA3-512 crypto hash functions not available on this CPU.' missing from stdout/stderr at jdk.test.lib.process.OutputAnalyzer.shouldMatch(OutputAnalyzer.java:407) at jdk.test.lib.cli.CommandLineOptionTest.verifyOutput(CommandLineOptionTest.java:154) ... 11 more JavaTest Message: Test threw exception: java.lang.AssertionError: Expected message not found: 'Intrinsics for SHA3-224, SHA3-256, SHA3-384 and SHA3-512 crypto hash functions not available on this CPU.'. 
Enabling option 'UseSHA3Intrinsics' should not be possible and should result in a warning if -XX:-UseSHA was passed to JVM JavaTest Message: shutting down test STATUS:Failed.`main' threw exception: java.lang.AssertionError: Expected message not found: 'Intrinsics for SHA3-224, SHA3-256, SHA3-384 and SHA3-512 crypto hash functions not available on this CPU.'. Enabling option 'UseSHA3Intrinsics' should not be possible and should result in a warning if -XX:-UseSHA was passed to JVM ` ------------- PR Comment: https://git.openjdk.org/jdk/pull/28634#issuecomment-3617772383 From duke at openjdk.org Fri Dec 5 17:24:57 2025 From: duke at openjdk.org (ExE Boss) Date: Fri, 5 Dec 2025 17:24:57 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v4] In-Reply-To: References: <82Ddhg3yXemMeyKmZUCWZIPUVOTkdCbXiOcl8LO_Su0=.47680bc7-526d-4c15-9b84-dd9c7d27728d@github.com> Message-ID: On Mon, 1 Dec 2025 22:46:10 GMT, Vladimir Ivanov wrote: >> What I meant was where the `instanceof` is in the called method, the `testInstanceOfCondPre` all have the `instanceof` checks as part of the `if` statement. 
>> >> -------------------------------------------------------------------------------- >> >> Something like: >> >> static void testInstanceOfCondDefaultInlinePre(A a, boolean cond) { >> if (defaultInlineInstanceOfCondPre(a, cond)) { >> a.m(); >> } >> } >> static void testInstanceOfCondDefaultInlinePost(A a, boolean cond) { >> if (defaultInlineInstanceOfCondPost(a, cond)) { >> a.m(); >> } >> } >> >> static void testIsInstanceCondDefaultInlinePre(A a, boolean cond) { >> if (defaultInlineIsInstanceCondPre(a, cond)) { >> a.m(); >> } >> } >> static void testIsInstanceCondDefaultInlinePost(A a, boolean cond) { >> if (defaultInlineIsInstanceCondPost(a, cond)) { >> a.m(); >> } >> } >> >> >> -------------------------------------------------------------------------------- >> >> I suggest adding such a test because of real world code which use different internal implementation classes but expose their public API as only a single common supertype, like `java.lang.constant.ClassDesc` and its `isPrimitive()`/`isArray()`/`isClassOrInterface()` methods (which currently don't do the `instanceof` check, but they probably should so that they can be reliably inlined). > > The test is intended as a white-box test. It focuses on bytecode shapes which result in different IR representations and exercise different optimizations. From compiler perspective, there's no difference between `if (defaultInlineInstanceOfCond(a)) { ... }` and `if (a instanceof B) {...}` when inlining happens during parsing. Both test cases produce the very same IR after parsing is over. It might be useful to have these tests in case the default inlining IR changes in the future. 
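To make the two shapes under discussion concrete, here is a minimal, self-contained Java sketch. It is not the PR's test code. The classes `A` and `B`, the helper bodies, and the reading of "Pre"/"Post" as "extra condition before/after the type check" are all assumptions made for illustration; only the method names are taken from the suggestion quoted above. The test methods return `int` instead of `void` so the effect is observable.

```java
// Hypothetical sketch: helper bodies and the Pre/Post interpretation are
// assumptions, not the contents of the actual jtreg test in the PR.
public class InstanceOfInHelperSketch {
    static class A { int m() { return 1; } }
    static class B extends A { @Override int m() { return 2; } }

    // Shape 1: the type check is written directly in the `if` condition.
    static int testInstanceOfCondPre(A a, boolean cond) {
        if (cond && a instanceof B) {
            return a.m();
        }
        return 0;
    }

    // Shape 2: the type check lives in a small callee. Once the callee is
    // inlined during parsing, the caller is expected to look like shape 1.
    static boolean defaultInlineInstanceOfCondPre(A a, boolean cond) {
        return cond && a instanceof B;
    }
    static boolean defaultInlineInstanceOfCondPost(A a, boolean cond) {
        return a instanceof B && cond;
    }
    static boolean defaultInlineIsInstanceCondPre(A a, boolean cond) {
        return cond && B.class.isInstance(a);
    }

    static int testInstanceOfCondDefaultInlinePre(A a, boolean cond) {
        if (defaultInlineInstanceOfCondPre(a, cond)) {
            return a.m();
        }
        return 0;
    }
    static int testInstanceOfCondDefaultInlinePost(A a, boolean cond) {
        if (defaultInlineInstanceOfCondPost(a, cond)) {
            return a.m();
        }
        return 0;
    }
    static int testIsInstanceCondDefaultInlinePre(A a, boolean cond) {
        if (defaultInlineIsInstanceCondPre(a, cond)) {
            return a.m();
        }
        return 0;
    }
}
```

Both shapes behave identically at the Java level, including for `null` receivers, since `instanceof` and `Class.isInstance` are false for `null`. Whether they also yield identical IR depends on the helper being inlined, which is the point debated in this thread.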
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28517#discussion_r2593410927 From rsunderbabu at openjdk.org Fri Dec 5 17:35:15 2025 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Fri, 5 Dec 2025 17:35:15 GMT Subject: RFR: 8372941: Rework compiler/intrinsics/sha tests to use intrinsic availability [v3] In-Reply-To: References: Message-ID: > Predicate probes of the following algos are changed to rely on intrinsics availability in the platform as opposed to hardware support availability. > MD5 > SHA1 > SHA256 > SHA3 > > Testing: > All flag combinations from CI > hotspot tiers 1 to 5 > PS: only for tier testings, mac-aarch was skipped due to resource constraints Ramkumar Sunderbabu has updated the pull request incrementally with one additional commit since the last revision: Fix TestUseSHA3IntrinsicsOptionOnSupportedCPU ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28634/files - new: https://git.openjdk.org/jdk/pull/28634/files/654604b9..8982a058 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28634&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28634&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28634.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28634/head:pull/28634 PR: https://git.openjdk.org/jdk/pull/28634 From rsunderbabu at openjdk.org Fri Dec 5 17:35:16 2025 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Fri, 5 Dec 2025 17:35:16 GMT Subject: RFR: 8372941: Rework compiler/intrinsics/sha tests to use intrinsic availability [v2] In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 17:11:13 GMT, Ramkumar Sunderbabu wrote: >> Predicate probes of the following algos are changed to rely on intrinsics availability in the platform as opposed to hardware support availability. 
>> MD5 >> SHA1 >> SHA256 >> SHA3 >> >> Testing: >> All flag combinations from CI >> hotspot tiers 1 to 5 >> PS: only for tier testings, mac-aarch was skipped due to resource constraints > > Ramkumar Sunderbabu has updated the pull request incrementally with one additional commit since the last revision: > > remove requires condition With `@requires os.arch == "aarch64"`, TestUseSHA3IntrinsicsOptionOnSupportedCPU is working. However, I don't understand why IntrinsicPredicates.isSHA3IntrinsicAvailable() is not enough in some cases. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28634#issuecomment-3617830847 From cjplummer at openjdk.org Fri Dec 5 20:38:58 2025 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 5 Dec 2025 20:38:58 GMT Subject: RFR: 8370846: Support execution of mlvm testing with test thread factory [v2] In-Reply-To: References: Message-ID: On Tue, 28 Oct 2025 22:35:34 GMT, Leonid Mesnik wrote: >> The MainWrapper used test thread factory has generated lambda method. So the AbsentInformationException is expected. The actual source path is not checked. >> >> Tested by run mlvm tests with and without test thread factory. >> >> Also >> jdk/test/lib/thread/TestThreadFactory.java >> updated to provide TestThreadFactory. isTestThreadFactorySet() >> that could be used by tests instead of checking property "test.thread.factory" directly. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > improved comment Looks good. ------------- Marked as reviewed by cjplummer (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28028#pullrequestreview-3546371263 From lmesnik at openjdk.org Fri Dec 5 21:23:08 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 5 Dec 2025 21:23:08 GMT Subject: Integrated: 8370846: Support execution of mlvm testing with test thread factory In-Reply-To: References: Message-ID: On Tue, 28 Oct 2025 21:06:29 GMT, Leonid Mesnik wrote: > The MainWrapper used test thread factory has generated lambda method. So the AbsentInformationException is expected. The actual source path is not checked. > > Tested by run mlvm tests with and without test thread factory. > > Also > jdk/test/lib/thread/TestThreadFactory.java > updated to provide TestThreadFactory. isTestThreadFactorySet() > that could be used by tests instead of checking property "test.thread.factory" directly. This pull request has now been integrated. Changeset: 2596608b Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/2596608ba1bb1b271dfa062bf732a5095e22fffd Stats: 37 lines in 2 files changed: 32 ins; 0 del; 5 mod 8370846: Support execution of mlvm testing with test thread factory Reviewed-by: cjplummer ------------- PR: https://git.openjdk.org/jdk/pull/28028 From vlivanov at openjdk.org Sat Dec 6 01:10:57 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 6 Dec 2025 01:10:57 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v2] In-Reply-To: References: <5eysoU9a44W7_cWds1pgbO9cpxQpBbtd54cUglfEW8c=.d0307e92-d9b3-405c-b488-872243af83b1@github.com> Message-ID: <1SgxHBiy8F3cswUdwmWr_gtjxdSiZ3K-JQUZvCcT4hY=.d35cdbdb-ec84-47a7-8302-f3759c2b020f@github.com> On Wed, 3 Dec 2025 02:19:46 GMT, Dean Long wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> Test fix > > src/hotspot/share/opto/parse2.cpp line 1737: > >> 1735: (*cast_type) = tcon->isa_klassptr()->as_instance_type(); >> 1736: return true; // found >> 1737: } > > The old code 
checked klass_is_exact() for this case, but the new code does not, so was it redundant, given we have a constant? Yes, the check is redundant. Moreover, I tested the patch having the check turned into an assert. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28517#discussion_r2594314864 From vlivanov at openjdk.org Sat Dec 6 01:15:55 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 6 Dec 2025 01:15:55 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v4] In-Reply-To: <4abJMXdHzqKGqU58EXHaXO7849B0a64NoShEvU110I4=.87a93e5c-b73e-4c6e-b85b-8797eea8814d@github.com> References: <4abJMXdHzqKGqU58EXHaXO7849B0a64NoShEvU110I4=.87a93e5c-b73e-4c6e-b85b-8797eea8814d@github.com> Message-ID: <7tncm6HgyrCXyN7VAYAoo4e0igls2GofazYW-4PyzMg=.ce6e21b1-b2ac-4482-b661-b69cb3aa22f7@github.com> On Wed, 3 Dec 2025 10:40:52 GMT, Quan Anh Mai wrote: >> Yes, it's a good idea and the right direction to move. While experimenting with a different enhancement, I noticed that a subtype check leaves a null check behind irrespective of whether the check goes away or not. >> >> Unfortunately, there are some engineering considerations which complicates the change. `SubTypeCheck` is shared across all the places where subtype checks are performed, but `checkcast` and `instanceof` differ in the way `null` is handled. So, the proper way to fix it is to introduce a higher-level representation which implicitly handles nulls and then eventually lower it to `SubTypeCheck` and materialize null check if needed. > > There are multiple ways without having to have yet another higher-level representation. The first one is that since `SubTypeCheck` does not accept `null` now, we can just choose one result for `null`. Choosing the `instanceof` approach may be a little more desirable, as it removes the need to perform this complicated match, and for `checkcast` we can manually insert a `CheckCastPP` anyway. 
Another solution is to have another input to `SubTypeCheck` which gives the result when the `obj` is `null`. On a whim, I kind of like this, as we can match both the `checkcast` and the `instanceof` pattern here, it also simplifies `GraphKit::gen_checkcast`, as we do not have to worry about "the cast that always succeeds will leave behind a null check". > > Just a suggestion, though. This PR is fine as it is to me. I agree it can be implemented without introducing new fancy IR nodes. The open question to me though is whether we can live without materializing null check until `SubTypeCheck` nodes are macro expanded. Otherwise, it'll turn into a gradual lowering through different representations. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28517#discussion_r2594317538 From vlivanov at openjdk.org Sat Dec 6 01:47:56 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 6 Dec 2025 01:47:56 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v4] In-Reply-To: References: <82Ddhg3yXemMeyKmZUCWZIPUVOTkdCbXiOcl8LO_Su0=.47680bc7-526d-4c15-9b84-dd9c7d27728d@github.com> Message-ID: On Fri, 5 Dec 2025 17:22:19 GMT, ExE Boss wrote: >> The test is intended as a white-box test. It focuses on bytecode shapes which result in different IR representations and exercise different optimizations. From compiler perspective, there's no difference between `if (defaultInlineInstanceOfCond(a)) { ... }` and `if (a instanceof B) {...}` when inlining happens during parsing. Both test cases produce the very same IR after parsing is over. > > It might be useful to have these tests in case the default inlining IR changes in the future. It's intended as a unit test. It's better to catch inlining issues with targeted tests on inlining. From compiler perspective, there's no reason to cover other cases here. There are so many different scenarios how a subtype check can show up in IR. 
And different scenarios can theoretically fail due to different reasons. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28517#discussion_r2594354870 From duke at openjdk.org Sat Dec 6 09:46:14 2025 From: duke at openjdk.org (duke) Date: Sat, 6 Dec 2025 09:46:14 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v18] In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 08:57:06 GMT, Qizheng Xing wrote: >> The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. >> >> This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch: >> >> >> public static int numberOfNibbles(int i) { >> int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); >> return Math.max((mag + 3) / 4, 1); >> } >> >> >> Testing: tier1, IR test > > Qizheng Xing has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 24 commits: > > - Merge branch 'master' into enhance-clz-type > - Make code more compact > - Fix include order > - Merge branch 'master' into enhance-clz-type > - Merge branch 'master' into enhance-clz-type > - Fix constant fold > - Remove redundant import > - Add random range tests > - Add more comments to IR test > - Add more constant folding tests for CLZ/CTZ > - ... and 14 more: https://git.openjdk.org/jdk/compare/674cc3ee...f0687754 @MaxXSoft Your change (at version f0687754fca4ce08f650bb49c6e96ebb0d5b99bf) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25928#issuecomment-3619817305 From qxing at openjdk.org Sat Dec 6 09:46:05 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Sat, 6 Dec 2025 09:46:05 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise In-Reply-To: References: <1yjEG7xjZcmvAECD2ovS0pW8IwA30p9BzCr0Krgy4ks=.3224b13e-a32f-468d-a6e3-3fb5a1c35c04@github.com> Message-ID: <3TUVz1ty0yJCxN2OsLPpH4znP6QtR3hQRyd9r8-TA08=.1cfc007e-bfce-491a-8107-577ee3e7af15@github.com> On Fri, 5 Dec 2025 09:10:37 GMT, Emanuel Peter wrote: >> @MaxXSoft Yes you are right, my mistake > > You do need 2 reviewers. I see that @merykitty has reviewed this a while ago, but a re-approval from him would be good since it is so long ago now. > > I'll run some testing now. @eme64 @merykitty Thanks for the re-review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25928#issuecomment-3619814845 From zlin at openjdk.org Sat Dec 6 12:07:04 2025 From: zlin at openjdk.org (Zihao Lin) Date: Sat, 6 Dec 2025 12:07:04 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v15] In-Reply-To: References: Message-ID: > This patch removes the slice parameter from LoadNode::make > > I have done more work which removes the slice parameter from StoreNode::make. > > Mentioned in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 > > Hi team, I am new, I'd appreciate any guidance. Thanks a lot! Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: - Merge branch 'master' into 8344116 - Merge branch 'master' into 8344116 - remove adr_type from graphKit - Fix test failed - Merge branch 'openjdk:master' into 8344116 - Merge branch 'openjdk:master' into 8344116 - fix conflict - Merge master - remove C2AccessValuePtr - fix assert - ... 
and 8 more: https://git.openjdk.org/jdk/compare/b0f59f60...c526f021 ------------- Changes: https://git.openjdk.org/jdk/pull/24258/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=14 Stats: 316 lines in 22 files changed: 47 ins; 89 del; 180 mod Patch: https://git.openjdk.org/jdk/pull/24258.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24258/head:pull/24258 PR: https://git.openjdk.org/jdk/pull/24258 From jkarthikeyan at openjdk.org Sat Dec 6 20:27:04 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Sat, 6 Dec 2025 20:27:04 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v2] In-Reply-To: <8jnY6pqofieRIfV5fCqFxvHZMF3nAZbh7yAD7C_G5FU=.a12c98f6-c715-43f5-9528-62fcfdfc6e59@github.com> References: <1z430wmE_HRTJqmLIC15VMUktLyUEE7qjkppr1GniAI=.e560a4e9-59f0-4013-ad65-5d7261cdbf0e@github.com> <8jnY6pqofieRIfV5fCqFxvHZMF3nAZbh7yAD7C_G5FU=.a12c98f6-c715-43f5-9528-62fcfdfc6e59@github.com> Message-ID: On Fri, 5 Dec 2025 08:23:02 GMT, Emanuel Peter wrote: >> After taking a closer look, I think you're correct- I can reproduce the crash using just `@Warmup(0)` and `@Test`. I think I used both while debugging and didn't test whether it worked without `CompLevel.C2`. I've removed it in the latest commit. >> However, I noticed that after that I merged from master neither the test nor the reproducer failed compilation before the fix is added. I think another commit must have changed the generated graph so that it no longer tries to vectorize the `CastII`, leading to the crash not being triggered. I looked at the JBS entry and saw that there wasn't another reproducer for this, so I was a bit unsure on what to do. Should this patch be merged with the current test? > > @jaskarth Thanks for looking into it! > > I would still add the fix, just in case. And I think the test as well, even if it does not reproduce any more. 
> > I was wondering: before the merge, when the test still reproduced: > If you removed the `@Warmup(0)` and `CompLevel.C2`, and instead just do `framework.addFlags` with `-Xcomp`, would that reproduce too? If so, you could have a framework run with and one without Xcomp, the one with Xcomp also should have a compileonly. What do you think? > > Or we just push the patch as is, to be sure this is done and integrated. What do you think @chhagedorn ? Yep, I can replicate the crash on the old commit with `TestFramework.runWithFlags("-Xcomp", "-XX:CompileCommand=compileonly,*TestSubwordTruncation::*");` instead of `@Warmup(0)`. I think this would also be a good option, as it would let you get coverage with Xcomp on the other tests as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26827#discussion_r2595422874 From qamai at openjdk.org Sun Dec 7 12:08:18 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 7 Dec 2025 12:08:18 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v7] In-Reply-To: References: Message-ID: > Hi, > > This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantages of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains eight commits: - Merge branch 'master' into andorxor - Merge branch 'master' into andorxor - Merge branch 'master' into andorxor - Add assertion for the helper in CTPComparator Co-authored-by: Emanuel Peter - remove std::hash - remove unordered_map, add some comments for all_instances_size - Emanuel's reviews - Improve Value inferences of And, Or, Xor and implement gtest for general Value inferences ------------- Changes: https://git.openjdk.org/jdk/pull/27618/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27618&range=06 Stats: 964 lines in 9 files changed: 630 ins; 313 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/27618.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27618/head:pull/27618 PR: https://git.openjdk.org/jdk/pull/27618 From qamai at openjdk.org Sun Dec 7 12:12:10 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 7 Dec 2025 12:12:10 GMT Subject: RFR: 8370914: C2: Reimplement Type::join [v5] In-Reply-To: References: Message-ID: > Hi, > > Currently, `Type::join` is implemented using `Type::dual`. The idea seems to be that the dual of a join would be the meet of the duals of the operands. This helps us avoid the need to implement a separate join operation. The comments also discuss the symmetry of the join and the meet operations, which seems to refer to the various fundamental laws of set union and intersection. > > However, it requires us to find a representation of a `Type` class that is symmetric, which may not always be possible without outright dividing its value set into the normal set and the dual set, and effectively implementing join and meet separately (e.g. `TypeInt` and `TypeLong`). > > In other cases, the existence of dual types introduces additional values into the value set of a `Type` class. 
For example, a pointer can be a nullable pointer (`BotPTR`), a not-null pointer (`NotNull`), a not-null constant (`Constant`), a null constant (`Null`), an impossible value (`TopPTR`), and `AnyNull`? This is really hard to conceptualize even when we know that `AnyNull` is the dual of `NotNull`. It also does not really work, which leads to us sprinkling `above_centerline` checks all over the place. Additionally, the number of combinations in a meet increases quadratically with respect to the number of instances of a `Type`. This makes the already hard problem of meeting 2 complicated sets a nightmare to understand. > > This PR reimplements `Type::join` as a separate operation and removes most of the `dual` concept from the `Type` class hierarchy. There are a lot of benefits of this: > > - It makes the operation much more intuitive, and changes to `Type` classes can be made easier. Instead of thinking about type lattices and how the representation places the `Type` objects on the lattices, it is much easier to conceptualize a join when we think of a `Type` as a set of possible values. > - It tightens the invariants of the classes in the hierarchy. Instead of having 5 possible `ptr()` values when a `TypeInstPtr` participates in a meet/join operation, there are only 3 left (`AnyNull` is nonsensical and `TopPTR` must be an `AnyPtr`). This, in turn, reduces the number of combinations in a meet/join from 25 to 9, making it much easier to reason about. > > This PR also tries to limit the interaction between unrelated types. For example, meeting and joining of a float and an int seem to happen only when we try to do those operations on the array types of those types. Limiting these peculiar operations to the pl... Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains six commits: - Merge branch 'master' into typejoin - Merge branch 'master' into typejoin - Move dual to ASSERT only - Keep old version for verification - whitespace - Reimplement Type::join ------------- Changes: https://git.openjdk.org/jdk/pull/28051/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28051&range=04 Stats: 1885 lines in 7 files changed: 1013 ins; 479 del; 393 mod Patch: https://git.openjdk.org/jdk/pull/28051.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28051/head:pull/28051 PR: https://git.openjdk.org/jdk/pull/28051 From qamai at openjdk.org Sun Dec 7 19:23:12 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 7 Dec 2025 19:23:12 GMT Subject: RFR: 8370914: C2: Reimplement Type::join [v6] In-Reply-To: References: Message-ID: > Hi, > > Currently, `Type::join` is implemented using `Type::dual`. The idea seems to be that the dual of a join would be the meet of the duals of the operands. This helps us avoid the need to implement a separate join operation. The comments also discuss the symmetry of the join and the meet operations, which seems to refer to the various fundamental laws of set union and intersection. > > However, it requires us to find a representation of a `Type` class that is symmetric, which may not always be possible without outright dividing its value set into the normal set and the dual set, and effectively implementing join and meet separately (e.g. `TypeInt` and `TypeLong`). > > In other cases, the existence of dual types introduces additional values into the value set of a `Type` class. For example, a pointer can be a nullable pointer (`BotPTR`), a not-null pointer (`NotNull`), a not-null constant (`Constant`), a null constant (`Null`), an impossible value (`TopPTR`), and `AnyNull`? This is really hard to conceptualize even when we know that `AnyNull` is the dual of `NotNull`. 
It also does not really work, which leads to us sprinkling `above_centerline` checks all over the place. Additionally, the number of combinations in a meet increases quadratically with respect to the number of instances of a `Type`. This makes the already hard problem of meeting 2 complicated sets a nightmare to understand. > > This PR reimplements `Type::join` as a separate operation and removes most of the `dual` concept from the `Type` class hierarchy. There are a lot of benefits of this: > > - It makes the operation much more intuitive, and changes to `Type` classes can be made easier. Instead of thinking about type lattices and how the representation places the `Type` objects on the lattices, it is much easier to conceptualize a join when we think of a `Type` as a set of possible values. > - It tightens the invariants of the classes in the hierarchy. Instead of having 5 possible `ptr()` values when a `TypeInstPtr` participates in a meet/join operation, there are only 3 left (`AnyNull` is nonsensical and `TopPTR` must be an `AnyPtr`). This, in turn, reduces the number of combinations in a meet/join from 25 to 9, making it much easier to reason about. > > This PR also tries to limit the interaction between unrelated types. For example, meeting and joining of a float and an int seem to happen only when we try to do those operations on the array types of those types. Limiting these peculiar operations to the pl...
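
The "type as a set of possible values" view in the description above can be illustrated with a small sketch. This is not HotSpot code: `Range`, `join`, and `meet` are hypothetical names invented for illustration, and the model covers only closed `int` intervals. Under that assumption, a join is the intersection of two value sets and a meet is the smallest range covering both.

```java
import java.util.Optional;

public class IntRangeSketch {
    // Hypothetical stand-in for an integer type: the set of ints in [lo, hi].
    public record Range(int lo, int hi) {
        public Range {
            if (lo > hi) {
                throw new IllegalArgumentException("empty range");
            }
        }
    }

    // Join: the values allowed by both operands (set intersection).
    // An empty Optional models the "impossible" result of disjoint sets.
    public static Optional<Range> join(Range a, Range b) {
        int lo = Math.max(a.lo(), b.lo());
        int hi = Math.min(a.hi(), b.hi());
        return lo <= hi ? Optional.of(new Range(lo, hi)) : Optional.empty();
    }

    // Meet: the smallest range that contains every value of both operands.
    public static Range meet(Range a, Range b) {
        return new Range(Math.min(a.lo(), b.lo()), Math.max(a.hi(), b.hi()));
    }

    public static void main(String[] args) {
        Range a = new Range(0, 10);
        Range b = new Range(5, 20);
        System.out.println("join = " + join(a, b));
        System.out.println("meet = " + meet(a, b));
        System.out.println("disjoint join = " + join(a, new Range(11, 12)));
    }
}
```

Neither operation here needs a dual representation; each is written directly against the value sets, which is the simplification the description argues for.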
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: sort order ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28051/files - new: https://git.openjdk.org/jdk/pull/28051/files/c3b5d453..7d882903 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28051&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28051&range=04-05 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28051.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28051/head:pull/28051 PR: https://git.openjdk.org/jdk/pull/28051 From fyang at openjdk.org Mon Dec 8 01:25:06 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 8 Dec 2025 01:25:06 GMT Subject: RFR: 8371968: RISC-V: implement AES CBC intrinsics [v3] In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 03:24:31 GMT, Anjian Wen wrote: >> Support AES CBC intrinsics on RISC-V. Already passed the tests in >> test/hotspot/jtreg/compiler/codegen/aes/ >> test/jdk/com/sun/crypto > > Anjian Wen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Modify some assert > - RISC-V: implement AES CBC intrinsics Thanks for the update. Two comments remain. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2675: > 2673: __ mv(x10, input_len); > 2674: __ leave(); > 2675: __ ret(); Similar here. Consider introducing a subroutine to remove the duplicate code for the three cases. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2786: > 2784: __ mv(x10, input_len); > 2785: __ leave(); > 2786: __ ret(); Since the three cases here duplicate most of the code, it seems better to introduce a subroutine to simplify the code. I think that would be similar to what you did for the CTR AES intrinsic, where the subroutine `counterMode_AESCrypt` is called by `generate_counterMode_AESCrypt`.
------------- PR Review: https://git.openjdk.org/jdk/pull/28320#pullrequestreview-3549811847 PR Review Comment: https://git.openjdk.org/jdk/pull/28320#discussion_r2596785380 PR Review Comment: https://git.openjdk.org/jdk/pull/28320#discussion_r2596784081 From xgong at openjdk.org Mon Dec 8 02:01:06 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 8 Dec 2025 02:01:06 GMT Subject: RFR: 8371603: C2: Missing Ideal optimizations for load and store vectors on SVE [v2] In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 09:37:22 GMT, Xiaohong Gong wrote: >> **Problem:** >> >> This issue occurs on a 256-bit SVE machine, caused by the following problematic pattern in `LoadVectorNode::Ideal()`: >> >> >> Node* LoadVectorNode::Ideal(PhaseGVN* phase, bool can_reshape) { >> const TypeVect* vt = vect_type(); >> if (Matcher::vector_needs_partial_operations(this, vt)) { >> return VectorNode::try_to_gen_masked_vector(phase, this, vt); >> } >> return LoadNode::Ideal(phase, can_reshape); >> } >> >> >> The condition `Matcher::vector_needs_partial_operations(this, vt)` returns true for `LoadVectorNode` with 256-bit vector size even when the vector size equals the maximum vector size on SVE. In such cases, when `VectorNode::try_to_gen_masked_vector()` returns `nullptr`, the method exits early without calling `LoadNode::Ideal()`. This results in missing crucial optimizations that would normally be applied by the superclass. >> >> This code was introduced by https://bugs.openjdk.org/browse/JDK-8286941 to generate vector masks for partial vector operations, but it failed to ensure that the superclass `Ideal()` method is always invoked when no transformation is applied. >> >> **Solution:** >> >> This patch addresses the issue through two changes: >> >> 1. Refine `Matcher::vector_needs_partial_operations()` to return true only when the vector node genuinely represents a partial vector operation that requires masking. >> 2. 
Modify `VectorNode::try_to_gen_masked_vector()` to never return `nullptr`, ensuring the superclass `Ideal()` method is always invoked when no transformation is applied. >> >> **Testing:** >> >> - Verified on different SVE platforms with different vector sizes (128|256|512 bits). >> - Verified on X86 platforms with different avx options (-XX:UseAVX=1|2|3). >> - Added two new IR tests to verify 1) previously missing optimizations for `LoadVector/StoreVector` are now applied, and 2) that mask and the correct IR patterns are generated for partial vector operations. > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Combine the condition check and IR transformation to a method Hi @erifan @shqking , could you please help take a look at this PR? Thanks a lot! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28651#issuecomment-3624159509 From xgong at openjdk.org Mon Dec 8 02:03:25 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 8 Dec 2025 02:03:25 GMT Subject: RFR: 8372136: VectorAPI: Refactor subword gather load API java implementation In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 01:42:07 GMT, Xiaohong Gong wrote: > The current subword (`byte`/`short`) gather load API implementation is not well-suited for platforms that provide native vector instructions for these operations. As **discussed in PR [1]**, we'd like to re-implement these APIs with a **unified cross-platform** solution. > > The main idea is to re-implement the API at Java-level, by performing multiple sub-gather operations. Each sub-gather operation loads a portion of elements using a specific index vector by calling the HotSpot intrinsic API. The partial results are then merged using vector `slice` and `or` operations. This design simplifies the VM compiler intrinsic implementation and better aligns with the Vector API design principles. > > Key changes: > 1. 
Re-implement the subword gather load API at the Java level. The HotSpot intrinsic `VectorSupport.loadWithMap` is simplified by reducing the vector index parameters from four (vix1-vix4) to a single parameter. > 2. Adjust the compiler intrinsic implementation to support the new Java API, including updates to the x86 backend implementation. > > The performance impact varies across different scenarios on X86. I tested the performance with different AVX levels on an X86 machine that supports AVX512. To achieve optimal performance, I also **applied PR [2]**, which improves the performance of the **`slice()`** API on X86. The following summarizes the performance gains, where: > > - "non masked" means the gather operation is not the masked gather API. > - "masked" means the gather operation is the masked gather API. > - "1 gather cases" means the gather API is implemented with a single gather operation. E.g. Load `Short128Vector` with `MaxVectorSize=256`. > - "2 gather cases" means the gather API is implemented with 2 parts of gather operations. E.g. Load `Short256Vector` with `MaxVectorSize=256`. > - "4 gather cases" means the gather API is implemented with 4 parts of gather operations. E.g. Load `Byte256Vector` with `MaxVectorSize=256`. > - "Un-intrinsified" means the gather operation cannot be intrinsified by HotSpot. E.g. Load `Byte512Vector` with `MaxVectorSize=256`. The significant performance uplift comes from the Java-level changes, which remove the vector index generation and range checks for such cases. > > > ---------------------------------------------------------------------------- > | UseAVX=3 | UseAVX=2 | > |-----------------------------|-----------------------------| > | non maske...
ping ------------- PR Comment: https://git.openjdk.org/jdk/pull/28520#issuecomment-3624161715 From liach at openjdk.org Mon Dec 8 02:08:37 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 8 Dec 2025 02:08:37 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v8] In-Reply-To: References: Message-ID: > Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: - Bugs and verify loader leak - Try to avoid loader leak - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache - Revert void special case removal due to C2 shortage causing TestZGCBarrierElision::testAtomicThenAtomicAnotherField failure - Test from Jorn - Copyright years - Fix problem identified by Jorn - Rollback getAndAdd for now - Redundant change - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache - ... 
and 5 more: https://git.openjdk.org/jdk/compare/295928d3...eebb8ff7 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28585/files - new: https://git.openjdk.org/jdk/pull/28585/files/8200fb28..eebb8ff7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=06-07 Stats: 10547 lines in 389 files changed: 7296 ins; 2024 del; 1227 mod Patch: https://git.openjdk.org/jdk/pull/28585.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28585/head:pull/28585 PR: https://git.openjdk.org/jdk/pull/28585 From liach at openjdk.org Mon Dec 8 02:08:37 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 8 Dec 2025 02:08:37 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v6] In-Reply-To: <8tu3HIArCw2cdoYR2SjI0b-TWYQxQLKkjQgucJEj8D4=.10946ec2-4958-48df-add4-b29d11c09448@github.com> References: <8tu3HIArCw2cdoYR2SjI0b-TWYQxQLKkjQgucJEj8D4=.10946ec2-4958-48df-add4-b29d11c09448@github.com> Message-ID: On Thu, 4 Dec 2025 01:55:57 GMT, Vladimir Ivanov wrote: >> I don't think we can use a SoftReference here if we need to achieve constant folding. >> >> Looking at inline_reference_get0, I think we might introduce another field property to trust a reference (potentially in an array) if both that reference and the referent within the reference is non-null. I think that belongs to a separate RFE. What do you think? > > Then it makes sense to limit the caching to safe cases only for now. Otherwise, it would functionally regress due to a possible memory leak. I have added a primitive checking system to ensure safety for most cases and added a unit test to ensure there is no memory leak due to this cache. 
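
For readers less familiar with the case this PR targets, the following is a minimal sketch of exact versus inexact `VarHandle` access. The class, field, and method names are made up for illustration and do not come from the PR or its tests; the sketch only shows call sites whose types differ from the access mode type of the `VarHandle`, which is when an `asType`-style adaptation is needed.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class InexactVarHandleSketch {
    // Hypothetical field used only for this illustration.
    public static int counter = 41;

    static final VarHandle COUNTER;
    static {
        try {
            COUNTER = MethodHandles.lookup()
                    .findStaticVarHandle(InexactVarHandleSketch.class, "counter", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Exact access: the call-site type ()int matches the access mode type.
    public static int readExact() {
        return (int) COUNTER.get();
    }

    // Inexact access: the call site asks for ()long, so the int result
    // has to be widened by an adaptation step.
    public static long readWidened() {
        return (long) COUNTER.get();
    }

    // Inexact access: a short argument is passed where an int is expected.
    public static void writeFromShort(short value) {
        COUNTER.set(value);
    }

    public static void main(String[] args) {
        System.out.println(readExact());
        System.out.println(readWidened());
        writeFromShort((short) 7);
        System.out.println(counter);
    }
}
```

Since `COUNTER` is a `static final` constant, `readWidened` and `writeFromShort` are the kind of call sites where caching the adapted handle per site could plausibly let the compiler fold the conversion.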
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2596829153 From liach at openjdk.org Mon Dec 8 02:08:37 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 8 Dec 2025 02:08:37 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> <3OE37qXGHhLAhnRQM188hhygrLYBtI3FLBMK0tGVH30=.5d1b4406-3bb3-4788-8059-e78260b79ec1@github.com> <7WF8DlorrU_B2__G2wr43w1PZwJh8mEhD5dY10YDIOo=.ec416c38-1aff-4dd6-8792-d6a0e01f91ce@github.com> <_Z6KpxCYH2n3sHuT6-kRP4cSTAN3-s5UA0r bfrJSIgA=.e9d4089c-8329-406b-9a0a-167a24311c13@github.com> <5CADH75ZjadKttOKwsykRFUPlQKLiwCW8E5WkM_75a4=.fd992c8f-e8bc-4775-9ea3-d5212664e3df@github.com> <5QPAetQEkrBgFKtMt0i9Ku_4s2GCirMl2uqLH3j8x7g=.e5fc8964-0080-45f7-9005-31922ec06ba1@github.com> Message-ID: On Wed, 3 Dec 2025 13:34:40 GMT, Jorn Vernee wrote: >> Looking at this, I'm not sure we can assume that we only see one mode and type when the VH is constant. There seems to be a lot of non-local reasoning involved. >> >> For example, you could have a var handle invoker created with `MethodHandles::varHandleInvoker`, which is cached, so the `AccessDescriptor` can be shared among many different use sites. For an individual use-site, the receiver VH may well be a constant, but that doesn't mean that the cache isn't polluted by the var handle from another use site, as far as I can tell. >> >> The thread safety issue comes from a C2 thread racing to read the `lastAdaption` cache vs another Java thread writing to the cache. AFAICS, this race is still possible even when `vh` is a compile time constant.
> > I think even without using an invoker, you could end up in a similar situation if you have something like: > > > static Object m(VarHandle vh) { > return vh.get(); > } > > > Which is called by several different threads. At some point this method may be inlined into one of its callees, where `vh` then becomes a constant. But at the same time, other threads are still writing to the cache. This issue should have been fixed, and there's a unit test to verify. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2596829589 From liach at openjdk.org Mon Dec 8 02:08:38 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 8 Dec 2025 02:08:38 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v7] In-Reply-To: References: <7ayMTZ4nXMyB1SXNRcYGjdxidNHDcAUNv_8fQZDUaPI=.a558d3a2-1d3e-4b45-8ba7-393c55a52785@github.com> Message-ID: On Fri, 5 Dec 2025 11:03:18 GMT, Jorn Vernee wrote: >> Chen Liang has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert void special case removal due to C2 shortage causing TestZGCBarrierElision::testAtomicThenAtomicAnotherField failure > > test/hotspot/jtreg/compiler/c2/irTests/constantFold/VarHandleMismatchedTypeFold.java line 48: > >> 46: public static void main(String[] args) { >> 47: TestFramework.runWithFlags( >> 48: "-XX:+UnlockExperimentalVMOptions" > > Why is this flag needed? 
Indeed, simplified this to `TestFramework.run()` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2596827495 From erfang at openjdk.org Mon Dec 8 02:15:02 2025 From: erfang at openjdk.org (Eric Fang) Date: Mon, 8 Dec 2025 02:15:02 GMT Subject: RFR: 8371603: C2: Missing Ideal optimizations for load and store vectors on SVE [v2] In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 09:37:22 GMT, Xiaohong Gong wrote: >> **Problem:** >> >> This issue occurs on a 256-bit SVE machine, caused by the following problematic pattern in `LoadVectorNode::Ideal()`: >> >> >> Node* LoadVectorNode::Ideal(PhaseGVN* phase, bool can_reshape) { >> const TypeVect* vt = vect_type(); >> if (Matcher::vector_needs_partial_operations(this, vt)) { >> return VectorNode::try_to_gen_masked_vector(phase, this, vt); >> } >> return LoadNode::Ideal(phase, can_reshape); >> } >> >> >> The condition `Matcher::vector_needs_partial_operations(this, vt)` returns true for `LoadVectorNode` with 256-bit vector size even when the vector size equals the maximum vector size on SVE. In such cases, when `VectorNode::try_to_gen_masked_vector()` returns `nullptr`, the method exits early without calling `LoadNode::Ideal()`. This results in missing crucial optimizations that would normally be applied by the superclass. >> >> This code was introduced by https://bugs.openjdk.org/browse/JDK-8286941 to generate vector masks for partial vector operations, but it failed to ensure that the superclass `Ideal()` method is always invoked when no transformation is applied. >> >> **Solution:** >> >> This patch addresses the issue through two changes: >> >> 1. Refine `Matcher::vector_needs_partial_operations()` to return true only when the vector node genuinely represents a partial vector operation that requires masking. >> 2. Modify `VectorNode::try_to_gen_masked_vector()` to never return `nullptr`, ensuring the superclass `Ideal()` method is always invoked when no transformation is applied. 
>> >> **Testing:** >> >> - Verified on different SVE platforms with different vector sizes (128|256|512 bits). >> - Verified on X86 platforms with different avx options (-XX:UseAVX=1|2|3). >> - Added two new IR tests to verify 1) previously missing optimizations for `LoadVector/StoreVector` are now applied, and 2) that mask and the correct IR patterns are generated for partial vector operations. > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Combine the condition check and IR transformation to a method LGTM, thanks for the fix. ------------- Marked as reviewed by erfang (Author). PR Review: https://git.openjdk.org/jdk/pull/28651#pullrequestreview-3549875773 From jbhateja at openjdk.org Mon Dec 8 03:33:05 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 8 Dec 2025 03:33:05 GMT Subject: RFR: 8368977: Provide clear naming for AVX10 identifiers In-Reply-To: <6XYgqaHA0PPZzvnfysKOP5XGP7e_RMkVFt9PV2OT3Gk=.e5f33072-a91a-4e57-99f3-81cc4ae4d844@github.com> References: <6XYgqaHA0PPZzvnfysKOP5XGP7e_RMkVFt9PV2OT3Gk=.e5f33072-a91a-4e57-99f3-81cc4ae4d844@github.com> Message-ID: On Mon, 17 Nov 2025 03:46:50 GMT, Mohamed Issa wrote: > This is a simple change that renames all AVX10 identifiers to explicitly show which sub-versions are in use. Right now, AVX10.2 is the only case to worry about. Also, the new AVX10.2 floating point conversion instructions are now used whenever possible to satisfy any related bytecode cast operations. The JTREG tests listed below were used to verify correctness with the recommended JVM options mentioned in corresponding source files. Each test included runs through emulation with AVX10.2 enabled and disabled to exercise all possible paths. All modifications and tests used [OpenJDK v26-b24](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B24) as the baseline build. > > 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java` > 2. 
`jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java` > 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java` > 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java` > 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java` > 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java` > 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java` > 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java` > 9. `jtreg:test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java` > 10. `jtreg:test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java` > 11. `jtreg:test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java` > 12. `jtreg:test/jdk/jdk/incubator/vector/Double64VectorTests.java` > 13. `jtreg:test/jdk/jdk/incubator/vector/Double128VectorTests.java` > 14. `jtreg:test/jdk/jdk/incubator/vector/Double256VectorTests.java` > 15. `jtreg:test/jdk/jdk/incubator/vector/Double512VectorTests.java` > 16. `jtreg:test/jdk/jdk/incubator/vector/DoubleMaxVectorTests.java` > 17. `jtreg:test/jdk/jdk/incubator/vector/Float64VectorTests.java` > 18. `jtreg:test/jdk/jdk/incubator/vector/Float128VectorTests.java` > 19. `jtreg:test/jdk/jdk/incubator/vector/Float256VectorTests.java` > 20. `jtreg:test/jdk/jdk/incubator/vector/Float512VectorTests.java` > 21. `jtreg:test/jdk/jdk/incubator/vector/FloatMaxVectorTests.java` > 22. `jtreg:test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` > 23. `jtreg:test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java` src/hotspot/cpu/x86/templateTable_x86.cpp line 1605: > 1603: __ jcc(Assembler::notEqual, L); > 1604: __ call_VM_leaf(CAST_FROM_FN_PTR(address, SharedRuntime::f2i), 1); > 1605: } This change should be part of a separate PR.
src/hotspot/cpu/x86/templateTable_x86.cpp line 1620: > 1618: __ jcc(Assembler::notEqual, L); > 1619: __ call_VM_leaf(CAST_FROM_FN_PTR(address, SharedRuntime::f2l), 1); > 1620: } Please restrict this PR to the renaming changes only. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28344#discussion_r2596923804 PR Review Comment: https://git.openjdk.org/jdk/pull/28344#discussion_r2596925402 From erfang at openjdk.org Mon Dec 8 03:35:50 2025 From: erfang at openjdk.org (Eric Fang) Date: Mon, 8 Dec 2025 03:35:50 GMT Subject: RFR: 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations Message-ID: This patch adds intrinsic support for UMIN and UMAX reduction operations in the Vector API on AArch64, enabling direct hardware instruction mapping for better performance. Changes: -------- 1. C2 mid-end: - Added UMinReductionVNode and UMaxReductionVNode 2. AArch64 Backend: - Added uminp/umaxp/sve_uminv/sve_umaxv instructions - Updated match rules for all vector sizes and element types - Both NEON and SVE implementations are supported 3. Test: - Added UMIN_REDUCTION_V and UMAX_REDUCTION_V to IRNode.java - Added assembly tests in aarch64-asmtest.py for new instructions - Added a JTReg test file VectorUMinMaxReductionTest.java Different configurations were tested on aarch64 and x86 machines, and all tests passed.
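
As a scalar reference for what an unsigned min/max reduction computes, here is a minimal sketch in plain Java. It does not use the Vector API or any HotSpot code; the class and method names are hypothetical. The identity values used (0 for UMAX, all bits set for UMIN) are the natural choices for an unsigned fold and are an assumption of this sketch, not a quote from the patch.

```java
public class UnsignedReductionSketch {
    // Unsigned max over all lanes. The identity is 0, the smallest unsigned value.
    public static int umaxReduce(int[] lanes) {
        int acc = 0;
        for (int v : lanes) {
            if (Integer.compareUnsigned(v, acc) > 0) {
                acc = v;
            }
        }
        return acc;
    }

    // Unsigned min over all lanes. The identity is -1 (0xFFFFFFFF),
    // the largest unsigned 32-bit value.
    public static int uminReduce(int[] lanes) {
        int acc = -1;
        for (int v : lanes) {
            if (Integer.compareUnsigned(v, acc) < 0) {
                acc = v;
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] lanes = {3, -1, 7, Integer.MIN_VALUE};
        // -1 is 0xFFFFFFFF when read as unsigned, so it wins the unsigned max,
        // whereas a signed max over the same lanes would pick 7.
        System.out.println(Integer.toUnsignedString(umaxReduce(lanes)));
        System.out.println(Integer.toUnsignedString(uminReduce(lanes)));
    }
}
```

An intrinsic replaces a lane-by-lane loop of this shape with a single reduction instruction, which is consistent with the size of the uplift reported for the benchmarks.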
Test results of JMH benchmarks from the panama-vector project: --------

On a Nvidia Grace machine with 128-bit SVE:

| Benchmark | Unit | Before | Error | After | Error | Uplift |
|---|---|---|---|---|---|---|
| Byte128Vector.UMAXLanes | ops/ms | 411.60 | 42.18 | 25226.51 | 33.92 | 61.29 |
| Byte128Vector.UMAXMaskedLanes | ops/ms | 558.56 | 85.12 | 25182.90 | 28.74 | 45.09 |
| Byte128Vector.UMINLanes | ops/ms | 645.58 | 780.76 | 28396.29 | 103.11 | 43.99 |
| Byte128Vector.UMINMaskedLanes | ops/ms | 621.09 | 718.27 | 26122.62 | 42.68 | 42.06 |
| Byte64Vector.UMAXLanes | ops/ms | 296.33 | 34.44 | 14357.74 | 15.95 | 48.45 |
| Byte64Vector.UMAXMaskedLanes | ops/ms | 376.54 | 44.01 | 14269.24 | 21.41 | 37.90 |
| Byte64Vector.UMINLanes | ops/ms | 373.45 | 426.51 | 15425.36 | 66.20 | 41.31 |
| Byte64Vector.UMINMaskedLanes | ops/ms | 353.32 | 346.87 | 14201.37 | 13.79 | 40.19 |
| Int128Vector.UMAXLanes | ops/ms | 174.79 | 192.51 | 9906.07 | 286.93 | 56.67 |
| Int128Vector.UMAXMaskedLanes | ops/ms | 157.23 | 206.68 | 10246.77 | 11.44 | 65.17 |
| Int64Vector.UMAXLanes | ops/ms | 95.30 | 126.49 | 4719.30 | 98.57 | 49.52 |
| Int64Vector.UMAXMaskedLanes | ops/ms | 88.19 | 87.44 | 4693.18 | 19.76 | 53.22 |
| Long128Vector.UMAXLanes | ops/ms | 80.62 | 97.82 | 5064.01 | 35.52 | 62.82 |
| Long128Vector.UMAXMaskedLanes | ops/ms | 78.15 | 102.91 | 5028.24 | 8.74 | 64.34 |
| Long64Vector.UMAXLanes | ops/ms | 47.56 | 62.01 | 46.76 | 52.28 | 0.98 |
| Long64Vector.UMAXMaskedLanes | ops/ms | 45.44 | 46.76 | 45.79 | 42.91 | 1.01 |
| Short128Vector.UMAXLanes | ops/ms | 316.65 | 410.30 | 14814.82 | 23.65 | 46.79 |
| Short128Vector.UMAXMaskedLanes | ops/ms | 308.90 | 351.78 | 15155.26 | 31.03 | 49.06 |
| Short64Vector.UMAXLanes | ops/ms | 190.38 | 245.09 | 8022.46 | 14.30 | 42.14 |
| Short64Vector.UMAXMaskedLanes | ops/ms | 195.54 | 36.15 | 7930.28 | 11.88 | 40.56 |

On a Nvidia Grace machine with 128-bit NEON:

| Benchmark | Unit | Before | Error | After | Error | Uplift |
|---|---|---|---|---|---|---|
| Byte128Vector.UMAXLanes | ops/ms | 414.69 | 42.52 | 25257.61 | 25.91 | 60.91 |
| Byte128Vector.UMAXMaskedLanes | ops/ms | 552.00 | 56.61 | 23063.14 | 304.45 | 41.78 |
| Byte128Vector.UMINLanes | ops/ms | 634.98 | 849.04 | 28444.37 | 180.80 | 44.80 |
| Byte128Vector.UMINMaskedLanes | ops/ms | 612.88 | 735.18 | 26127.07 | 27.99 | 42.63 |
| Byte64Vector.UMAXLanes | ops/ms | 291.53 | 32.19 | 13893.62 | 28.09 | 47.66 |
| Byte64Vector.UMAXMaskedLanes | ops/ms | 363.34 | 48.17 | 13290.59 | 12.53 | 36.58 |
| Byte64Vector.UMINLanes | ops/ms | 368.70 | 433.60 | 15416.90 | 15.80 | 41.81 |
| Byte64Vector.UMINMaskedLanes | ops/ms | 350.46 | 371.05 | 14524.29 | 121.63 | 41.44 |
| Int128Vector.UMAXLanes | ops/ms | 177.67 | 201.38 | 10182.82 | 20.21 | 57.31 |
| Int128Vector.UMAXMaskedLanes | ops/ms | 155.25 | 187.88 | 9194.13 | 393.35 | 59.22 |
| Int64Vector.UMAXLanes | ops/ms | 93.93 | 115.02 | 5106.79 | 4.54 | 54.37 |
| Int64Vector.UMAXMaskedLanes | ops/ms | 87.01 | 88.50 | 4405.87 | 8.06 | 50.63 |
| Long128Vector.UMAXLanes | ops/ms | 80.32 | 98.50 | 3229.80 | 40.53 | 40.21 |
| Long128Vector.UMAXMaskedLanes | ops/ms | 77.65 | 103.25 | 3161.50 | 4.45 | 40.72 |
| Long64Vector.UMAXLanes | ops/ms | 47.72 | 65.38 | 46.41 | 50.38 | 0.97 |
| Long64Vector.UMAXMaskedLanes | ops/ms | 45.26 | 47.46 | 45.13 | 47.23 | 1.00 |
| Short128Vector.UMAXLanes | ops/ms | 316.09 | 429.34 | 14748.07 | 14.78 | 46.66 |
| Short128Vector.UMAXMaskedLanes | ops/ms | 307.70 | 342.54 | 14359.11 | 44.99 | 46.67 |
| Short64Vector.UMAXLanes | ops/ms | 187.67 | 253.01 | 8180.63 | 178.65 | 43.59 |
| Short64Vector.UMAXMaskedLanes | ops/ms | 191.10 | 33.51 | 7949.19 | 108.65 | 41.60 |

------------- Commit messages: - 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations - 8372978: [VectorAPI] Fix incorrect identity values in UMIN/UMAX reductions Changes: https://git.openjdk.org/jdk/pull/28693/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28693&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372980 Stats: 1607 lines in 49 files changed: 835 ins; 16 del; 756 mod Patch: https://git.openjdk.org/jdk/pull/28693.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28693/head:pull/28693 PR: https://git.openjdk.org/jdk/pull/28693 From haosun at openjdk.org Mon Dec 8 04:59:59 2025 From: haosun at openjdk.org (Hao Sun) Date: Mon, 8 Dec 2025 04:59:59 GMT Subject: RFR: 8371603: C2: Missing Ideal optimizations for load and store vectors on SVE [v2] In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 09:37:22 GMT, Xiaohong Gong wrote: >> **Problem:** >> >> This issue occurs on a 256-bit SVE machine, caused by the following problematic pattern in `LoadVectorNode::Ideal()`: >>
>> >> Node* LoadVectorNode::Ideal(PhaseGVN* phase, bool can_reshape) { >> const TypeVect* vt = vect_type(); >> if (Matcher::vector_needs_partial_operations(this, vt)) { >> return VectorNode::try_to_gen_masked_vector(phase, this, vt); >> } >> return LoadNode::Ideal(phase, can_reshape); >> } >> >> >> The condition `Matcher::vector_needs_partial_operations(this, vt)` returns true for `LoadVectorNode` with 256-bit vector size even when the vector size equals the maximum vector size on SVE. In such cases, when `VectorNode::try_to_gen_masked_vector()` returns `nullptr`, the method exits early without calling `LoadNode::Ideal()`. This results in missing crucial optimizations that would normally be applied by the superclass. >> >> This code was introduced by https://bugs.openjdk.org/browse/JDK-8286941 to generate vector masks for partial vector operations, but it failed to ensure that the superclass `Ideal()` method is always invoked when no transformation is applied. >> >> **Solution:** >> >> This patch addresses the issue through two changes: >> >> 1. Refine `Matcher::vector_needs_partial_operations()` to return true only when the vector node genuinely represents a partial vector operation that requires masking. >> 2. Modify `VectorNode::try_to_gen_masked_vector()` to never return `nullptr`, ensuring the superclass `Ideal()` method is always invoked when no transformation is applied. >> >> **Testing:** >> >> - Verified on different SVE platforms with different vector sizes (128|256|512 bits). >> - Verified on X86 platforms with different avx options (-XX:UseAVX=1|2|3). >> - Added two new IR tests to verify 1) previously missing optimizations for `LoadVector/StoreVector` are now applied, and 2) that mask and the correct IR patterns are generated for partial vector operations. 
> > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Combine the condition check and IR transformation to a method Thanks for your work. LGTM. ------------- Marked as reviewed by haosun (Committer). PR Review: https://git.openjdk.org/jdk/pull/28651#pullrequestreview-3550120477 From qamai at openjdk.org Mon Dec 8 07:41:01 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 8 Dec 2025 07:41:01 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: <5DHx3WmMb1UtSeyiEiYCiisVgRFggPFfxBggpgtuD6M=.d72a9c07-9624-47ea-9398-a0d1dee69755@github.com> References: <5DHx3WmMb1UtSeyiEiYCiisVgRFggPFfxBggpgtuD6M=.d72a9c07-9624-47ea-9398-a0d1dee69755@github.com> Message-ID: On Fri, 5 Dec 2025 14:02:14 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/castnode.hpp line 105: >> >>> 103: // All the possible combinations of floating/narrowing with example use cases: >>> 104: >>> 105: // Use case example: Range Check CastII >> >> I believe this is incorrect, a range check should be floating non-narrowing. It is only narrowing if the length of the array is a constant. It is because this cast encodes the dependency on the condition `index u< length`. This condition cannot be expressed in terms of `Type` unless `length` is a constant. > > Range check `CastII` were added to protect the `ConvI2L` in the address expression on 64 bits. The problem there was, in some cases, that the `ConvI2L` would float above the range check (because `ConvI2L` has no control input) and could end up with an out of range input (which in turn would cause the `ConvI2L` to become `top` in places where it wasn't expected). > So `CastII` doesn't carry the control dependency of an array access on its range check. That dependency is carried by the `MemNode` which has its control input set to the range check. 
> What you're saying, if I understand it correctly, would be true if the `CastII` was required to prevent an array `Load` from floating. But that's not the case. Got it, sorry I misunderstood! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2597364668 From qamai at openjdk.org Mon Dec 8 07:48:05 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 8 Dec 2025 07:48:05 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v10] In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 14:05:06 GMT, Roland Westrelin wrote: >> This is a variant of 8332827. In 8332827, an array access becomes >> dependent on a range check `CastII` for another array access. When, >> after loop opts are over, that RC `CastII` was removed, the array >> access could float and an out of bound access happened. With the fix >> for 8332827, RC `CastII`s are no longer removed. >> >> With this one what happens is that some transformations applied after >> loop opts are over widen the type of the RC `CastII`. As a result, the >> type of the RC `CastII` is no longer narrower than that of its input, >> the `CastII` is removed and the dependency is lost. >> >> There are 2 transformations that cause this to happen: >> >> - after loop opts are over, the type of the `CastII` nodes are widen >> so nodes that have the same inputs but a slightly different type can >> common. >> >> - When pushing a `CastII` through an `Add`, if of the type both inputs >> of the `Add`s are non constant, then we end up widening the type >> (the resulting `Add` has a type that's wider than that of the >> initial `CastII`). >> >> There are already 3 types of `Cast` nodes depending on the >> optimizations that are allowed. Either the `Cast` is floating >> (`depends_only_test()` returns `true`) or pinned. Either the `Cast` >> can be removed if it no longer narrows the type of its input or >> not. 
We already have variants of the `CastII`: >> >> - if the Cast can float and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can't be removed when it doesn't narrow >> the type of its input. >> >> What we need here, I think, is the 4th combination: >> >> - if the Cast can float and can't be removed when it doesn't narrow >> the type of its input. >> >> Anyway, things are becoming confusing with all these different >> variants named in ways that don't always help figure out what >> constraints one of them operate under. So I refactored this and that's >> the biggest part of this change. The fix consists in marking `Cast` >> nodes when their type is widen in a way that prevents them from being >> optimized out. >> >> Tobias ran performance testing with a slightly different version of >> this change and there was no regression. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Marked as reviewed by qamai (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24575#pullrequestreview-3550550450 From duke at openjdk.org Mon Dec 8 08:15:00 2025 From: duke at openjdk.org (ExE Boss) Date: Mon, 8 Dec 2025 08:15:00 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v8] In-Reply-To: References: Message-ID: On Mon, 8 Dec 2025 02:08:37 GMT, Chen Liang wrote: >> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: > > - Bugs and verify loader leak > - Try to avoid loader leak > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache > - Revert void special case removal due to C2 shortage causing TestZGCBarrierElision::testAtomicThenAtomicAnotherField failure > - Test from Jorn > - Copyright years > - Fix problem identified by Jorn > - Rollback getAndAdd for now > - Redundant change > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache > - ... and 5 more: https://git.openjdk.org/jdk/compare/17ebc456...eebb8ff7 make/jdk/src/classes/build/tools/methodhandle/VarHandleGuardMethodGenerator.java line 145: > 143: MethodHandle.linkToStatic(); > 144: } else { > 145: ad.adaptedMethodHandle(handle).invokeBasic(); The old version didn't use `` in `GUARD_METHOD_TEMPLATE_V`: Suggestion: ad.adaptedMethodHandle(handle).invokeBasic(); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2597461675 From epeter at openjdk.org Mon Dec 8 09:30:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Dec 2025 09:30:17 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v27] In-Reply-To: References: Message-ID: On Sat, 15 Nov 2025 02:28:55 GMT, Vladimir Ivanov wrote: >> This PR introduces C2 support for `Reference.reachabilityFence()`. >> >> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. >> >> `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible.
The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. >> >> Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. >> >> Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 >> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." >> >> Testing: >> - [x] hs-tier1 - hs-tier8 >> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations >> - [x] java/lang/foreign microbenchmarks > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > IR test cases Nice, thanks for adding some IR tests! 
We can still add more down the line, but this is a start. I'll look into the whole PR again this week ;) test/hotspot/jtreg/compiler/c2/TestReachabilityFence.java line 200: > 198: @IR(counts = {IRNode.REACHABILITY_FENCE, "2"}, phase = CompilePhase.AFTER_LOOP_OPTS) > 199: @IR(counts = {IRNode.REACHABILITY_FENCE, "0"}, phase = CompilePhase.EXPAND_REACHABILITY_FENCES) > 200: @IR(counts = {IRNode.REACHABILITY_FENCE, "1"}, phase = CompilePhase.FINAL_CODE) Can you add a small comment here, why we go from 2 -> 0 -> 1 ? Is it because we eliminate one of the two RF? Which one is supposed to be eliminated? ------------- PR Review: https://git.openjdk.org/jdk/pull/25315#pullrequestreview-3551010955 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2597773179 From lucy at openjdk.org Mon Dec 8 09:41:57 2025 From: lucy at openjdk.org (Lutz Schmidt) Date: Mon, 8 Dec 2025 09:41:57 GMT Subject: RFR: 8372641: [s390x] Test failure TestMergeStores.java [v3] In-Reply-To: <6iaWuz5X4ol8NmIvbWoQBxmceux35b3529t1sONwCZA=.08c49f3a-87dc-4030-a5a7-1a83f4209fe0@github.com> References: <6iaWuz5X4ol8NmIvbWoQBxmceux35b3529t1sONwCZA=.08c49f3a-87dc-4030-a5a7-1a83f4209fe0@github.com> Message-ID: On Thu, 27 Nov 2025 08:59:09 GMT, Harshit470250 wrote: >> [JDK-8347405](https://bugs.openjdk.org/browse/JDK-8347405) introduced a mergeStores optimisation which requires ReverseBytesS opcode and as it was not implemented for s390 the test case is failing. >> I also implemented ReverseBytesUS. > > Harshit470250 has updated the pull request incrementally with one additional commit since the last revision: > > Added whitespace LGTM. ------------- Marked as reviewed by lucy (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28523#pullrequestreview-3551079689 From bkilambi at openjdk.org Mon Dec 8 10:38:12 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 8 Dec 2025 10:38:12 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v5] In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 11:34:11 GMT, Jatin Bhateja wrote: >> Add a new Float16lVector type and corresponding concrete vector classes, in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. >> - Add necessary inline expander support. >> - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. >> - Use existing Float16 vector IR and backend support. >> - Extended the existing VectorAPI JTREG test suite for the newly added Float16Vector operations. >> >> The idea here is to first be at par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reduction, etc). >> >> The following are the performance numbers for some of the selected Float16Vector benchmarking kernels compared to equivalent auto-vectorized Float16OperationsBenchmark kernels. >> >> image >> >> Initial RFP[1] was floated on the panama-dev mailing list. >> >> Kindly review the draft PR and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Cleanups test/hotspot/jtreg/compiler/vectorapi/TestFloat16VectorOperations.java line 134: > 132: .intoArray(output, i); > 133: } > 134: for (; i < LEN; i++) { Will this not result in autovectorization instead and also overwrite the `output` array results from vectorapi which were previously computed? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28002#discussion_r2598040108 From epeter at openjdk.org Mon Dec 8 10:38:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Dec 2025 10:38:29 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v27] In-Reply-To: References: Message-ID: <9l8uhFDyx0QQeTcUw0TclysqlxwvdndXDbRL8rjX8GQ=.0429ab53-df55-4580-914e-a8a4242f00ac@github.com> On Sat, 15 Nov 2025 02:28:55 GMT, Vladimir Ivanov wrote: >> This PR introduces C2 support for `Reference.reachabilityFence()`. >> >> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. >> >> `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. >> >> Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. 
It would meet performance criteria, but I prefer to implement it as a followup fix. >> >> Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 >> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." >> >> Testing: >> - [x] hs-tier1 - hs-tier8 >> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations >> - [x] java/lang/foreign microbenchmarks > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > IR test cases @iwanowww I think this looks good to me now. Thanks for working on this! Of course it is not perfect yet. There are a lot of follow-up RFEs filed. But it is already a larger patch, and it seems to be a step in the right direction. One that I think we should not let sit very long - and possibly could even fix before integrating the change here: [JDK-8370133](https://bugs.openjdk.org/browse/JDK-8370133) C2: Manage non-debug safepoint edges in structural manner There are now a few places where you have special logic to deal with other SafePoint edges, and it seems very hacky and ad-hoc. It is more code, more implicit assumptions, and probably also prone to bugs. If we are going to integrate this fix here before addressing that messy SafePoint edge code, we should clean it up really soon, preferably in the same release cycle (JDK27). A **second reviewer** should also do a thorough review. I can't say I fully comprehend all implications here.
There are some implicit assumptions that make me a bit nervous, but we probably just have to live with that - but we should make sure we are explicit about what assumptions we make and document them well. src/hotspot/share/opto/phaseX.cpp line 2051: > 2049: // java -XX:VerifyIterativeGVN=1000 -Xcomp -XX:+StressReachabilityFences > 2050: return false; > 2051: } Is this still true? src/hotspot/share/opto/reachability.cpp line 73: > 71: * > 72: * After loop opts are over, it becomes possible to reliably enumerate all interfering safe points and > 73: * to ensure that the referent is present in their oop maps. For the sake of someone else who has to fix a bug here, or considers changing the design, can you please: - Give a more concrete example, e.g., load of a native memory address is hoisted, ... explain that this means we move it over the SafePoint in the backedge, and why this is problematic. - State your assumption / invariants that should hold after loop opts, that guarantee that it is safe to now attach to SP instead of RF. This one makes me a bit nervous, because it is another implicit assumption in C2, but I suppose we just have to live with that. But at least we can document it well ;) src/hotspot/share/opto/reachability.cpp line 166: > 164: lpt->_reachability_fences = new Node_List(); > 165: } > 166: lpt->_reachability_fences->push(new_rf); This code is duplicated elsewhere. Consider refactoring it with a `lpt->reachability_fences_push` method that automatically allocates the new `Node_List`. src/hotspot/share/opto/reachability.cpp line 216: > 214: // ResourceMark rm; // NB! not safe because insert_rf may trigger _idom reallocation > 215: Unique_Node_List redundant_rfs; > 216: GrowableArray> worklist; Tech debt alarm ;) We should probably move `_idom` to a different arena then, right?
src/hotspot/share/opto/reachability.cpp line 223: > 221: IdealLoopTree* lpt = get_loop(rf); > 222: Node* referent = rf->referent(); > 223: Node* loop_exit = lpt->unique_loop_exit_or_null(); Suggestion: IfFalseNode* loop_exit = lpt->unique_loop_exit_or_null(); src/hotspot/share/opto/reachability.cpp line 473: > 471: if (extra_edge != nullptr) { > 472: sfpt->add_req(extra_edge); // Add valid_length_test_input edge back > 473: } Could it be that you have two meanings for "extra edge" here? Or does the top comment: > Turn extra safepoint edges into reachability fences match with this? > sfpt->add_req(extra_edge); // Add valid_length_test_input edge back Again: mixing up these edges really feels like tech debt. We should fix that soon. test/hotspot/jtreg/compiler/c2/TestReachabilityFenceFlags.java line 48: > 46: * -XX:+StressReachabilityFences -XX:+OptimizeReachabilityFences -XX:+PreserveReachabilityFencesOnConstants > 47: * compiler.c2.TestReachabilityFenceFlags > 48: */ Consider dropping `-Xcomp -XX:-TieredCompilation`, because we will run with that at some point in our CI anyway. Would give more options for different kinds of compilation, right? ------------- Marked as reviewed by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25315#pullrequestreview-3551106049 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2597856271 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2597906466 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2597925796 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2597936186 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2597939958 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2597992100 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2598006581 From epeter at openjdk.org Mon Dec 8 10:43:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Dec 2025 10:43:01 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v5] In-Reply-To: References: Message-ID: On Mon, 8 Dec 2025 10:35:23 GMT, Bhavana Kilambi wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Cleanups > > test/hotspot/jtreg/compiler/vectorapi/TestFloat16VectorOperations.java line 134: > >> 132: .intoArray(output, i); >> 133: } >> 134: for (; i < LEN; i++) { > > Will this not result in autovectorization instead and also overwrite the `output` array results from vectorapi which were previously computed? Yes: there could be auto-vectorization. No: `i` is not reset, it keeps counting from where `i < SPECIES.loopBound(LEN)` fails, and handles the tail, right? It could be good to run this test once with and once without auto vectorization, just to make sure the vectors you see are from the Vector API, and not auto vectorization. 
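The main-loop/tail-loop shape discussed in this reply can be illustrated without the Vector API. The following is a minimal sketch, not the code from `TestFloat16VectorOperations.java`: the class `TailLoopSketch`, the helper `loopBound(len, lanes)` (a stand-in for `SPECIES.loopBound(LEN)`, which rounds down to a multiple of the lane count), and the lane counts used in `main` are all assumptions made for illustration. It shows that the second loop continues from wherever `i` stopped, so it only covers leftover elements and does not overwrite earlier results.

```java
public class TailLoopSketch {
    // Hypothetical stand-in for SPECIES.loopBound(len):
    // the largest multiple of `lanes` that is <= len.
    static int loopBound(int len, int lanes) {
        return len - (len % lanes);
    }

    // Adds a and b into out: first in chunks of `lanes` elements (modelling the
    // vector loop), then element by element (the scalar tail).
    // Returns how many iterations the scalar tail loop executed.
    static int addWithTail(int[] a, int[] b, int[] out, int lanes) {
        int len = out.length;
        int i = 0;
        for (; i < loopBound(len, lanes); i += lanes) {
            for (int j = 0; j < lanes; j++) { // models one vector add + store
                out[i + j] = a[i + j] + b[i + j];
            }
        }
        int tailIterations = 0;
        for (; i < len; i++) { // i is not reset, so only leftover elements are touched
            out[i] = a[i] + b[i];
            tailIterations++;
        }
        return tailIterations;
    }

    public static void main(String[] args) {
        int[] a = new int[2048];
        int[] b = new int[2048];
        int[] out = new int[2048];
        // Assumed lane count: 32 (e.g. 16-bit elements in a 512-bit vector).
        System.out.println(addWithTail(a, b, out, 32)); // prints 0
        // A length that is not a multiple of the lane count leaves a tail.
        System.out.println(addWithTail(new int[10], new int[10], new int[10], 4)); // prints 2
    }
}
```

Under these assumptions, a length of 2048 leaves no tail iterations for any power-of-two lane count up to 2048, while the tail loop still keeps the kernel correct for other lengths.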
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28002#discussion_r2598054314 From haosun at openjdk.org Mon Dec 8 11:14:58 2025 From: haosun at openjdk.org (Hao Sun) Date: Mon, 8 Dec 2025 11:14:58 GMT Subject: RFR: 8372941: Rework compiler/intrinsics/sha tests to use intrinsic availability [v2] In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 17:32:23 GMT, Ramkumar Sunderbabu wrote: >> Ramkumar Sunderbabu has updated the pull request incrementally with one additional commit since the last revision: >> >> remove requires condition > > With `@requires os.arch == "aarch64"`, TestUseSHA3IntrinsicsOptionOnSupportedCPU is working. > However, I don't understand why IntrinsicPredicates.isSHA3IntrinsicAvailable() is not enough in some cases. Hi @rsunderbabu I'm afraid I cannot reproduce your failure in my environment. In my local test, I completely removed `@requires os.arch == "aarch64" & os.family == "mac"` and ran the test on Nvidia Grace machine(with sha3 support), one Neoverse-N1 machine(without sha3 support) and one x86_64 machine. I found that the test case `TestUseSHA3IntrinsicsOptionOnSupportedCPU.java` can pass. Another thing is that, `TestUseSHA3IntrinsicsOptionOnUnsupportedCPU.java` (note that the case is **Unsupported**) failed on Nvidia Grace machine. I checked that the initial commit and the latest one both failed. Here is the snippet of the error log java.lang.AssertionError: Option 'UseSHA3Intrinsics' is expected to have 'false' value, but is 'UseSHA3Intrinsics = t'. 
Option 'UseSHA3Intrinsics' should be off on unsupported CPU even if set to true directly at jdk.test.lib.cli.CommandLineOptionTest.verifyOptionValue(CommandLineOptionTest.java:312) at jdk.test.lib.cli.CommandLineOptionTest.verifyOptionValue(CommandLineOptionTest.java:282) at jdk.test.lib.cli.CommandLineOptionTest.verifyOptionValueForSameVM(CommandLineOptionTest.java:411) at compiler.intrinsics.sha.cli.testcases.GenericTestCaseForUnsupportedCPU.verifyOptionValues(GenericTestCaseForUnsupportedCPU.java:96) at compiler.intrinsics.sha.cli.DigestOptionsBase$TestCase.test(DigestOptionsBase.java:163) at compiler.intrinsics.sha.cli.DigestOptionsBase.runTestCases(DigestOptionsBase.java:139) at jdk.test.lib.cli.CommandLineOptionTest.test(CommandLineOptionTest.java:544) at compiler.intrinsics.sha.cli.TestUseSHA3IntrinsicsOptionOnUnsupportedCPU.main(TestUseSHA3IntrinsicsOptionOnUnsupportedCPU.java:50) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) at java.base/java.lang.reflect.Method.invoke(Method.java:565) at com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138) at java.base/java.lang.Thread.run(Thread.java:1516) Caused by: java.lang.RuntimeException: 'UseSHA3Intrinsics\s*:?=\s*false' missing from stdout/stderr at jdk.test.lib.process.OutputAnalyzer.shouldMatch(OutputAnalyzer.java:407) at jdk.test.lib.cli.CommandLineOptionTest.verifyOptionValue(CommandLineOptionTest.java:301) ... 
11 more JavaTest Message: Test threw exception: java.lang.AssertionError: Option 'UseSHA3Intrinsics' is expected to have 'false' value, but is 'UseSHA3Intrinsics ------------- PR Comment: https://git.openjdk.org/jdk/pull/28634#issuecomment-3626380963 From bkilambi at openjdk.org Mon Dec 8 11:27:59 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 8 Dec 2025 11:27:59 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v5] In-Reply-To: References: Message-ID: On Mon, 8 Dec 2025 10:39:51 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/vectorapi/TestFloat16VectorOperations.java line 134: >> >>> 132: .intoArray(output, i); >>> 133: } >>> 134: for (; i < LEN; i++) { >> >> Will this not result in autovectorization instead and also overwrite the `output` array results from vectorapi which were previously computed? > > Yes: there could be auto-vectorization. > No: `i` is not reset, it keeps counting from where `i < SPECIES.loopBound(LEN)` fails, and handles the tail, right? > > It could be good to run this test once with and once without auto vectorization, just to make sure the vectors you see are from the Vector API, and not auto vectorization. Thanks. I missed that `i` isn't being reinitialised/reset again. Do we even need the tail loop in this case when the `LEN = 2048`? We may not even have any tail iterations? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28002#discussion_r2598233546 From dlunden at openjdk.org Mon Dec 8 12:56:03 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Mon, 8 Dec 2025 12:56:03 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 11:49:29 GMT, Roland Westrelin wrote: > The crash occurs because verification code expects the inner and outer > loop of a loop strip mining nest to have the same number of phis but, > in this case, the inner loop has one more memory phi than the outer > loop. > > 1) After `OuterStripMinedLoopNode::adjust_strip_mined_loop`, inner and > outer loops have the same number of phis, as expected. > > > 309 MergeMem === _ 1 306 1 1 284 [[ 429 ]] { - - N284:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=205 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 ]] > 429 Phi === 248 309 205 [[ 93 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 249 CountedLoop === 249 248 197 [[ 249 119 96 93 94 ]] inner stride: 1 strip mined !orig=[223],[91] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 93 Phi === 249 429 205 [[ 117 97 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 94 Phi === 249 430 121 [[ 97 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > > 2) Then
`PhiNode::Ideal` runs for 429 and pushed the `MergeMem` 309 > through the outer loop phi: > > > 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 444 446 ]] > 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 444 Phi === 248 306 121 [[ 445 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 446 Phi === 248 284 170 [[ 445 ]] #memory Memory: @instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow], name=detailMessage, idx=5; !orig=444,429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 445 MergeMem === _ 1 444 1 1 446 [[ 93 ]] { - - N446:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=[429],93 !jvms: TestMismatchedMemoryPhis::mainTe... Thanks for working on this @rwestrel! > Now, PhiNode::Identity for 94 could replace it with the bottom memory phi with same inputs 451. But it doesn't run. It last ran between 3) and 4) and there's no reason for igvn to execute it again because 4) doesn't cause 94 to change in any way. Just to double check, does `VerifyIterativeGVN` identify this missed transformation? If not, we should make sure it does. > The fix I propose is to mirror the transformation from PhiNode::Identity in PhiNode::Ideal so the end result doesn't depend on what phi is modified and processed by igvn last. Correct me if I'm wrong, but do we not achieve the same thing if we identify and add 94 to the worklist after the transformation of 93 -> 451? This possibly seems like a cleaner solution to me (see my code comment below). src/hotspot/share/opto/cfgnode.cpp line 2702: > 2700: } > 2701: } > 2702: It seems a bit out of place to transform other Phi nodes in the Ideal call for this Phi node. 
Can't we just instead re-add all matching other Phis that we can transform to the worklist, and let IGVN handle it in a subsequent iteration? ------------- PR Review: https://git.openjdk.org/jdk/pull/28677#pullrequestreview-3551933280 PR Review Comment: https://git.openjdk.org/jdk/pull/28677#discussion_r2598521200 From luhenry at openjdk.org Mon Dec 8 13:13:00 2025 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 8 Dec 2025 13:13:00 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v7] In-Reply-To: <4-PqNRUxM-80k4mQdYNzc0HrirtkTCjfVAzgRewW08M=.d2fe4512-16cd-4abf-8a7f-e91341c37110@github.com> References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> <4-PqNRUxM-80k4mQdYNzc0HrirtkTCjfVAzgRewW08M=.d2fe4512-16cd-4abf-8a7f-e91341c37110@github.com> Message-ID: On Mon, 1 Dec 2025 15:13:13 GMT, Hamlin Li wrote: >> Hi, >> >> This pr add CMoveF/D on riscv, which enable vectorization of statement like: `op_1 bop op_2 ? res_f_d_1 : res_f_d_2 in a loop`. >> >> This pr is also a preparation for further vectorization in https://github.com/openjdk/jdk/pull/28231. >> >> Previously it's https://github.com/openjdk/jdk/pull/25341, but at that time, C2 SLP has some issue with unsigned comparison, which is now fixed, so it's good to continue the work. >> >> # Test >> ## Jtreg >> >> in progress... >> >> ## Performance >> >> Column names meanings: >> * p: with patch >> * p+v: with patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on >> * m: without patch >> * m+v: without patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on >> >> #### Average improvement >> >> NOTE: With only this PR, it brings performance benefit in case of `CMoveF+CmpF`, `CMoveD+ComD`, `CMoveF+CmpI`, `CMoveD+CmpL`. The data below is based on fully implementing the vectorization of `CMoveI/L/F/D+CmpI/L/F/D`, which will be achieved by https://github.com/openjdk/jdk/pull/28231.
>> >> For details, check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv. >> >> Opt (m/p) | Opt (m+v/p+v) | Opt (p/p+v) | Opt (m/p+v) >> -- | -- | -- | -- >> 1.022782609 | 2.198717391 | 2.162673913 | 2.199 >> >> > > Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: > > - remove log_warning > - add test cases: BoolTest::ge/gt in enc_cmove_fp_cmp_fp Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28309#pullrequestreview-3552013870 From qxing at openjdk.org Mon Dec 8 13:20:20 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Mon, 8 Dec 2025 13:20:20 GMT Subject: Integrated: 8360192: C2: Make the type of count leading/trailing zero nodes more precise In-Reply-To: References: Message-ID: On Mon, 23 Jun 2025 03:31:36 GMT, Qizheng Xing wrote: > The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. > > This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch: > > > public static int numberOfNibbles(int i) { > int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); > return Math.max((mag + 3) / 4, 1); > } > > > Testing: tier1, IR test This pull request has now been integrated. 
Changeset: b83bf071 Author: Qizheng Xing Committer: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/b83bf0717eb8926efcf85a32be08f33a41bb48dd Stats: 801 lines in 4 files changed: 735 ins; 54 del; 12 mod 8360192: C2: Make the type of count leading/trailing zero nodes more precise Reviewed-by: qamai, epeter, jbhateja ------------- PR: https://git.openjdk.org/jdk/pull/25928 From epeter at openjdk.org Mon Dec 8 13:42:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Dec 2025 13:42:14 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v7] In-Reply-To: <4-PqNRUxM-80k4mQdYNzc0HrirtkTCjfVAzgRewW08M=.d2fe4512-16cd-4abf-8a7f-e91341c37110@github.com> References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> <4-PqNRUxM-80k4mQdYNzc0HrirtkTCjfVAzgRewW08M=.d2fe4512-16cd-4abf-8a7f-e91341c37110@github.com> Message-ID: On Mon, 1 Dec 2025 15:13:13 GMT, Hamlin Li wrote: >> Hi, >> >> This pr add CMoveF/D on riscv, which enable vectorization of statement like: `op_1 bop op_2 ? res_f_d_1 : res_f_d_2 in a loop`. >> >> This pr is also a preparation for further vectorization in https://github.com/openjdk/jdk/pull/28231. >> >> Previously it's https://github.com/openjdk/jdk/pull/25341, but at that time, C2 SLP has some issue with unsigned comparison, which is now fixed, so it's good to continue the work. >> >> # Test >> ## Jtreg >> >> in progress... >> >> ## Performance >> >> Column names meanings: >> * p: with patch >> * p+v: with patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on >> * m: without patch >> * m+v: without patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on >> >> #### Average improvement >> >> NOTE: With only this PR, it brings performance benefit in case of `CMoveF+CmpF`, `CMoveD+ComD`, `CMoveF+CmpI`, `CMoveD+CmpL`. 
The data below is based on fullly implmenting the vectorization of `CMoveI/L/F/D+CmpI/L/F/D`, which will be achieved by https://github.com/openjdk/jdk/pull/28231. >> >> For details, check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv. >> >> Opt (m/p) | Opt (m+v/p+v) | Opt (p/p+v) | Opt (m/p+v) >> -- | -- | -- | -- >> 1.022782609 | 2.198717391 | 2.162673913 | 2.199 >> >> > > Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: > > - remove log_warning > - add test cases: BoolTest::ge/gt in enc_cmove_fp_cmp_fp Nice work! I did not review the RISC-V specific changes, but had a look at the tests. Wow, there are a lot of them, and that's a good thing :) I have a few comments below, to consider for improvement. test/hotspot/jtreg/compiler/c2/irTests/TestConditionalMove.java line 39: > 37: * @summary Auto-vectorization enhancement to support vector conditional move. > 38: * @library /test/lib / > 39: * @run driver compiler.c2.irTests.TestConditionalMove Suggestion: * @run driver ${test.main.class} Might as well do that now. Avoids wrong copy of class name, which can lead to wrong test being run. test/hotspot/jtreg/compiler/c2/irTests/TestConditionalMove.java line 1564: > 1562: // @IR(counts = {IRNode.CMOVE_I, ">0", IRNode.CMP_L, ">0"}, > 1563: // applyIf = {"UseVectorCmov", "false"}, > 1564: // applyIfPlatform = {"riscv64", "true"}) Why are these all commented out? test/hotspot/jtreg/compiler/c2/irTests/TestScalarConditionalMoveCmpObj.java line 34: > 32: * @test > 33: * @summary Test conditional move + compare object. > 34: * @requires vm.simpleArch == "riscv64" It would be really nice if we generally have no platform requirements at the top of the file, but just IR test applyIf instead. That way, everyone benefits from everyone else's tests and we have less test duplication down the line. 
test/hotspot/jtreg/compiler/c2/irTests/TestScalarConditionalMoveCmpObj.java line 356: > 354: }; > 355: } > 356: } You could use `Generators.java`, it creates "interesting" distributions, and makes sure we use sufficient special case values. ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28309#pullrequestreview-3552071451 PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2598623788 PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2598632136 PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2598660768 PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2598654446 From mli at openjdk.org Mon Dec 8 13:42:15 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 8 Dec 2025 13:42:15 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v7] In-Reply-To: References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> <4-PqNRUxM-80k4mQdYNzc0HrirtkTCjfVAzgRewW08M=.d2fe4512-16cd-4abf-8a7f-e91341c37110@github.com> Message-ID: On Mon, 8 Dec 2025 13:10:28 GMT, Ludovic Henry wrote: >> Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: >> >> - remove log_warning >> - add test cases: BoolTest::ge/gt in enc_cmove_fp_cmp_fp > > Marked as reviewed by luhenry (Committer). @luhenry Thank you! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28309#issuecomment-3626985436 From epeter at openjdk.org Mon Dec 8 13:42:16 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Dec 2025 13:42:16 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v7] In-Reply-To: References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> <4-PqNRUxM-80k4mQdYNzc0HrirtkTCjfVAzgRewW08M=.d2fe4512-16cd-4abf-8a7f-e91341c37110@github.com> Message-ID: On Mon, 8 Dec 2025 13:36:24 GMT, Hamlin Li wrote: >> Marked as reviewed by luhenry (Committer). > > @luhenry Thank you! @Hamlin-Li Oh bummer, I was just a few seconds too slow. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28309#issuecomment-3626995177 From mli at openjdk.org Mon Dec 8 13:42:17 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 8 Dec 2025 13:42:17 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v7] In-Reply-To: References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> <4-PqNRUxM-80k4mQdYNzc0HrirtkTCjfVAzgRewW08M=.d2fe4512-16cd-4abf-8a7f-e91341c37110@github.com> Message-ID: On Mon, 8 Dec 2025 13:36:38 GMT, Emanuel Peter wrote: > Nice work! I did not review the RISC-V specific changes, but had a look at the tests. Wow, there are a lot of them, and that's a good thing :) > > I have a few comments below, to consider for improvement. @eme64 Thanks for having a look. Seems there is a race condition here. :) I just triggered the integration. I'll file a new bug to address your comments.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28309#issuecomment-3626997007 From epeter at openjdk.org Mon Dec 8 13:42:19 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Dec 2025 13:42:19 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v7] In-Reply-To: References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> <4-PqNRUxM-80k4mQdYNzc0HrirtkTCjfVAzgRewW08M=.d2fe4512-16cd-4abf-8a7f-e91341c37110@github.com> Message-ID: On Mon, 8 Dec 2025 13:21:12 GMT, Emanuel Peter wrote: >> Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: >> >> - remove log_warning >> - add test cases: BoolTest::ge/gt in enc_cmove_fp_cmp_fp > > test/hotspot/jtreg/compiler/c2/irTests/TestConditionalMove.java line 39: > >> 37: * @summary Auto-vectorization enhancement to support vector conditional move. >> 38: * @library /test/lib / >> 39: * @run driver compiler.c2.irTests.TestConditionalMove > > Suggestion: > > * @run driver ${test.main.class} > > Might as well do that now. Avoids wrong copy of class name, which can lead to wrong test being run. Also: if you are already renaming these tests, you might move them to a better directory as well. We want to avoid the `irTests` directory in the future, and sort by topic instead. Idea: put it under a `compiler/c2/cmove` directory. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2598664764 From mli at openjdk.org Mon Dec 8 13:42:20 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 8 Dec 2025 13:42:20 GMT Subject: Integrated: 8357551: RISC-V: support CMoveF/D vectorization In-Reply-To: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> Message-ID: On Thu, 13 Nov 2025 21:34:30 GMT, Hamlin Li wrote: > Hi, > > This pr add CMoveF/D on riscv, which enable vectorization of statement like: `op_1 bop op_2 ? res_f_d_1 : res_f_d_2 in a loop`. > > This pr is also a preparation for further vectorization in https://github.com/openjdk/jdk/pull/28231. > > Previously it's https://github.com/openjdk/jdk/pull/25341, but at that time, C2 SLP has some issue with unsigned comparison, which is now fixed, so it's good to continue the work. > > # Test > ## Jtreg > > in progress... > > ## Performance > > Column names meanings: > * p: with patch > * p+v: with patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on > * m: without patch > * m+v: without patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on > > #### Average improvement > > NOTE: With only this PR, it brings performance benefit in case of `CMoveF+CmpF`, `CMoveD+ComD`, `CMoveF+CmpI`, `CMoveD+CmpL`. The data below is based on fullly implmenting the vectorization of `CMoveI/L/F/D+CmpI/L/F/D`, which will be achieved by https://github.com/openjdk/jdk/pull/28231. > > For details, check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv. > > Opt (m/p) | Opt (m+v/p+v) | Opt (p/p+v) | Opt (m/p+v) > -- | -- | -- | -- > 1.022782609 | 2.198717391 | 2.162673913 | 2.199 > > This pull request has now been integrated. 
Changeset: 6700baa5 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/6700baa5052046f53eb1b04ed3205bbd8e9e9070 Stats: 10199 lines in 15 files changed: 6731 ins; 3270 del; 198 mod 8357551: RISC-V: support CMoveF/D vectorization Reviewed-by: fyang, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/28309 From epeter at openjdk.org Mon Dec 8 13:45:10 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Dec 2025 13:45:10 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v7] In-Reply-To: References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> <4-PqNRUxM-80k4mQdYNzc0HrirtkTCjfVAzgRewW08M=.d2fe4512-16cd-4abf-8a7f-e91341c37110@github.com> Message-ID: On Mon, 8 Dec 2025 13:39:23 GMT, Hamlin Li wrote: >> Nice work! I did not review the RISC-V specific changes, but had a look at the tests. Wow, there are a lot of them, and that's a good thing :) >> >> I have a few comments below, to consider for improvement. > >> Nice work! I did not review the RISC-V specific changes, but had a look at the tests. Wow, there are a lot of them, and that's a good thing :) >> >> I have a few comments below, to consider for improvement. > > @eme64 Thanks for having a look. > Seems there is a race condition here. :) I just triggered the integration. > I'll file new bug to address your comment. @Hamlin-Li Sounds good. I don't blame you, you had 2 reviewers ;) I also did not run internal testing. Most likely everything will be fine. Thanks for addressing my comments in the future, much appreciated! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28309#issuecomment-3627007431 From bkilambi at openjdk.org Mon Dec 8 14:14:17 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 8 Dec 2025 14:14:17 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v5] In-Reply-To: References: Message-ID: <9l2GpacNaxOlYFPSJe-nRiAm1cPWcxYxg65R0o0ElgE=.59208f2f-df36-4eb3-93ac-62ada0eca4d6@github.com> On Wed, 26 Nov 2025 11:34:11 GMT, Jatin Bhateja wrote: >> Add a new Float16Vector type and corresponding concrete vector classes, in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. >> - Add necessary inline expander support. >> - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. >> - Use existing Float16 vector IR and backend support. >> - Extended the existing VectorAPI JTREG test suite for the newly added Float16Vector operations. >> >> The idea here is to first be on par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reduction, etc). >> >> The following are the performance numbers for some of the selected Float16Vector benchmarking kernels compared to equivalent auto-vectorized Float16OperationsBenchmark kernels. >> >> [image: performance numbers] >> >> Initial RFP[1] was floated on the panama-dev mailing list. >> >> Kindly review the draft PR and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Cleanups test/hotspot/jtreg/compiler/vectorapi/TestFloat16VectorOperations.java line 82: > 80: output = new short[LEN]; > 81: > 82: short min_value = float16ToRawShortBits(Float16.MIN_VALUE); Are `min_value` and `max_value` not being used anywhere?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28002#discussion_r2598793109 From epeter at openjdk.org Mon Dec 8 14:36:13 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Dec 2025 14:36:13 GMT Subject: RFR: 8371603: C2: Missing Ideal optimizations for load and store vectors on SVE [v2] In-Reply-To: References: Message-ID: <-ChFuqTx0aC6F7o3D6j26EUnDGIH6kos_oue-4qrPfs=.9e940c25-d0cf-4156-8b82-3569651371e7@github.com> On Fri, 5 Dec 2025 09:37:22 GMT, Xiaohong Gong wrote: >> **Problem:** >> >> This issue occurs on a 256-bit SVE machine, caused by the following problematic pattern in `LoadVectorNode::Ideal()`: >> >> >> Node* LoadVectorNode::Ideal(PhaseGVN* phase, bool can_reshape) { >> const TypeVect* vt = vect_type(); >> if (Matcher::vector_needs_partial_operations(this, vt)) { >> return VectorNode::try_to_gen_masked_vector(phase, this, vt); >> } >> return LoadNode::Ideal(phase, can_reshape); >> } >> >> >> The condition `Matcher::vector_needs_partial_operations(this, vt)` returns true for `LoadVectorNode` with 256-bit vector size even when the vector size equals the maximum vector size on SVE. In such cases, when `VectorNode::try_to_gen_masked_vector()` returns `nullptr`, the method exits early without calling `LoadNode::Ideal()`. This results in missing crucial optimizations that would normally be applied by the superclass. >> >> This code was introduced by https://bugs.openjdk.org/browse/JDK-8286941 to generate vector masks for partial vector operations, but it failed to ensure that the superclass `Ideal()` method is always invoked when no transformation is applied. >> >> **Solution:** >> >> This patch addresses the issue through two changes: >> >> 1. Refine `Matcher::vector_needs_partial_operations()` to return true only when the vector node genuinely represents a partial vector operation that requires masking. >> 2. 
Modify `VectorNode::try_to_gen_masked_vector()` to never return `nullptr`, ensuring the superclass `Ideal()` method is always invoked when no transformation is applied. >> >> **Testing:** >> >> - Verified on different SVE platforms with different vector sizes (128|256|512 bits). >> - Verified on X86 platforms with different avx options (-XX:UseAVX=1|2|3). >> - Added two new IR tests to verify 1) previously missing optimizations for `LoadVector/StoreVector` are now applied, and 2) that mask and the correct IR patterns are generated for partial vector operations. > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Combine the condition check and IR transformation to a method Nice, thanks for the updates. I think this looks cleaner now. I have one more nit below. But I'll run some sanity testing from my side already. test/hotspot/jtreg/compiler/vectorapi/TestVectorLoadStoreOptimization.java line 38: > 36: * @modules jdk.incubator.vector > 37: * > 38: * @run driver compiler.vectorapi.TestVectorLoadStoreOptimization Suggestion: * @run driver ${test.main.class} This will avoid copying the wrong name. Possible since a recent JTREG version. 
test/hotspot/jtreg/compiler/vectorapi/TestVectorOperationsWithPartialSize.java line 38: > 36: * @modules jdk.incubator.vector > 37: * > 38: * @run driver compiler.vectorapi.TestVectorOperationsWithPartialSize Suggestion: * @run driver ${test.main.class} ------------- PR Review: https://git.openjdk.org/jdk/pull/28651#pullrequestreview-3552378082 PR Review Comment: https://git.openjdk.org/jdk/pull/28651#discussion_r2598863339 PR Review Comment: https://git.openjdk.org/jdk/pull/28651#discussion_r2598870788 From roland at openjdk.org Mon Dec 8 14:52:35 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 8 Dec 2025 14:52:35 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs In-Reply-To: <2uqd_nRO0UZWonQnFDqkWYvrYwTGQbDEDnWx3C4eoAo=.65472aeb-e9c2-4f99-8728-d4c7e1afaf57@github.com> References: <2uqd_nRO0UZWonQnFDqkWYvrYwTGQbDEDnWx3C4eoAo=.65472aeb-e9c2-4f99-8728-d4c7e1afaf57@github.com> Message-ID: On Mon, 14 Apr 2025 11:50:27 GMT, Quan Anh Mai wrote: >> This is a variant of 8332827. In 8332827, an array access becomes >> dependent on a range check `CastII` for another array access. When, >> after loop opts are over, that RC `CastII` was removed, the array >> access could float and an out of bound access happened. With the fix >> for 8332827, RC `CastII`s are no longer removed. >> >> With this one what happens is that some transformations applied after >> loop opts are over widen the type of the RC `CastII`. As a result, the >> type of the RC `CastII` is no longer narrower than that of its input, >> the `CastII` is removed and the dependency is lost. >> >> There are 2 transformations that cause this to happen: >> >> - after loop opts are over, the type of the `CastII` nodes are widen >> so nodes that have the same inputs but a slightly different type can >> common. 
>> >> - When pushing a `CastII` through an `Add`, if of the type both inputs >> of the `Add`s are non constant, then we end up widening the type >> (the resulting `Add` has a type that's wider than that of the >> initial `CastII`). >> >> There are already 3 types of `Cast` nodes depending on the >> optimizations that are allowed. Either the `Cast` is floating >> (`depends_only_test()` returns `true`) or pinned. Either the `Cast` >> can be removed if it no longer narrows the type of its input or >> not. We already have variants of the `CastII`: >> >> - if the Cast can float and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can't be removed when it doesn't narrow >> the type of its input. >> >> What we need here, I think, is the 4th combination: >> >> - if the Cast can float and can't be removed when it doesn't narrow >> the type of its input. >> >> Anyway, things are becoming confusing with all these different >> variants named in ways that don't always help figure out what >> constraints one of them operate under. So I refactored this and that's >> the biggest part of this change. The fix consists in marking `Cast` >> nodes when their type is widen in a way that prevents them from being >> optimized out. >> >> Tobias ran performance testing with a slightly different version of >> this change and there was no regression. > > If a `CastII` that does not narrow its input has its type being a constant, do you think GVN should transform it into a constant, or such nodes should return the bottom type so that it is not folded into a floating `ConNode`? @merykitty @eme64 @chhagedorn thanks for the reviews Does testing need to be run on this before I integrate? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24575#issuecomment-3627292197 From epeter at openjdk.org Mon Dec 8 15:34:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Dec 2025 15:34:03 GMT Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v3] In-Reply-To: References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com> Message-ID: On Wed, 12 Nov 2025 12:39:52 GMT, Fei Gao wrote: >> Wait, you are doing some kind of special warmup above. Why? Do you maybe NOT want the methods to inline? Any other reason for the warmup? > > If I understand correctly, when `ITERATION_COUNT` is set to a fixed value, all loop optimizations will know the loop iteration count from profiling. Without a special warm-up phase, the main loop is unlikely to be auto-vectorized for these small iteration counts, because [policy_unroll()](https://github.com/openjdk/jdk/blob/400a83da893f5fc285a175b63a266de21e93683c/src/hotspot/share/opto/loopTransform.cpp#L960) in C2 always attempts to generate code that is optimal for the current trip count based on profiling information. It may decide not to auto-vectorize, or even remove the loop entirely and keep only some scalar nodes. As a result, we can't observe the potential effects of this patch. > > The special warm-up phase would instead trigger auto-vectorization and full unrolling. I suppose this patch takes effect in scenarios where certain Java loops have already been compiled with auto-vectorization and unrolling, and are later used to process data with smaller array sizes. What do you think? Sorry, I dropped the ball on this one. A lot going on with JDK26 and other larger PRs. Ah I see. You are indeed doing some special warmup here. That should be better documented. I wonder also if you want to make this a parameter, so we can see the performance with and without it?
At some point I need to check out your patch and see what effect it has on the benchmarks I'm presenting here: https://github.com/openjdk/jdk/pull/27315 Do you think it would really not be measurable for small sizes? If not, we would have to find other methods to make a difference for small iteration counts. > It may decide not to auto-vectorize, or even remove the loop entirely and keep only some scalar nodes. It could be worth creating some IR tests to see what exactly happens here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2599072156 From epeter at openjdk.org Mon Dec 8 15:50:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Dec 2025 15:50:55 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs In-Reply-To: References: <2uqd_nRO0UZWonQnFDqkWYvrYwTGQbDEDnWx3C4eoAo=.65472aeb-e9c2-4f99-8728-d4c7e1afaf57@github.com> Message-ID: On Mon, 8 Dec 2025 14:49:41 GMT, Roland Westrelin wrote: >> If a `CastII` that does not narrow its input has its type being a constant, do you think GVN should transform it into a constant, or such nodes should return the bottom type so that it is not folded into a floating `ConNode`? > > @merykitty @eme64 @chhagedorn thanks for the reviews > Does testing need to be run on this before I integrate? @rwestrel I'll run some testing now ... ------------- PR Comment: https://git.openjdk.org/jdk/pull/24575#issuecomment-3627616227 From epeter at openjdk.org Mon Dec 8 16:45:42 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Dec 2025 16:45:42 GMT Subject: RFR: 8367028: compiler/c2/irTests/TestFloat16ScalarOperations.java failing intermittently because of constant folding Message-ID: The test uses random constants. But we forgot to exclude special values such as `zero`, otherwise the operations can be folded (idealized) and the IR tests fail. For example `x + 0.42` would not be folded, but `x + 0` would be folded to `x`. 
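To make the concern concrete, here is a minimal plain-Java sketch. It is not the test's actual code, and `ConstantSampling`, `sampleUnrestricted` and `sampleRestricted` are hypothetical names chosen for illustration. `Random.nextFloat()` draws from `[0, 1)` and can return exactly `0.0f`, whereas a range bounded away from zero cannot produce that special value.

```java
import java.util.Random;

public class ConstantSampling {
    // Hypothetical helper: draws from [0, 1), so exactly 0.0f is a possible result.
    public static float sampleUnrestricted(Random rng) {
        return rng.nextFloat();
    }

    // Hypothetical helper: rescales the draw so the result stays within
    // roughly [0.1, 0.9], i.e. bounded away from the special value 0.
    public static float sampleRestricted(Random rng) {
        return 0.1f + 0.8f * rng.nextFloat();
    }

    public static void main(String[] args) {
        Random rng = new Random();
        System.out.println("unrestricted: " + sampleUnrestricted(rng));
        System.out.println("restricted:   " + sampleRestricted(rng));
    }
}
```

With the restricted helper a constant of zero can never be drawn, so an addition with that constant is not a candidate for the `x + 0` identity.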
Solution: restrict the range we sample from. Used to be `[0, 1)`, now I just do `[0.1, 0.9)`. ------------- Commit messages: - rm line - fix - JDK-8367028 Changes: https://git.openjdk.org/jdk/pull/28678/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28678&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8367028 Stats: 17 lines in 1 file changed: 9 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/28678.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28678/head:pull/28678 PR: https://git.openjdk.org/jdk/pull/28678 From jvernee at openjdk.org Mon Dec 8 17:54:02 2025 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 8 Dec 2025 17:54:02 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v8] In-Reply-To: References: Message-ID: On Mon, 8 Dec 2025 02:08:37 GMT, Chen Liang wrote: >> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: > > - Bugs and verify loader leak > - Try to avoid loader leak > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache > - Revert void special case removal due to C2 shortage causing TestZGCBarrierElision::testAtomicThenAtomicAnotherField failure > - Test from Jorn > - Copyright years > - Fix problem identified by Jorn > - Rollback getAndAdd for now > - Redundant change > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache > - ... 
and 5 more: https://git.openjdk.org/jdk/compare/513d3327...eebb8ff7 src/java.base/share/classes/java/lang/invoke/IndirectVarHandle.java line 114: > 112: // but checking the signature type of MH mostly works > 113: return MethodHandle.isReachableFrom(vform.getMethodType(0), cl) > 114: && target.isReachableFrom(cl); Right... one of the filters may also keep a class loader alive. But to check them, we'd have to eagerly instantiate all of them as well. FWIW, I don't think this is an issue we can just ignore. If a filter keeps a class loader alive, we'd still have a problem. Maybe it's possible to collect all the types involved from the filter when creating an IndirectVarHandle instead, and save those in a separate list for this check. src/java.base/share/classes/java/lang/invoke/MethodHandle.java line 983: > 981: } > 982: > 983: static boolean isBuiltinLoader(ClassLoader loader) { I think this can still be private? src/java.base/share/classes/java/lang/invoke/VarHandle.java line 2018: > 2016: // Call MethodHandle.isReachableFrom for the used classes > 2017: return true; > 2018: } Can you make this `abstract`? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2599553053 PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2599496891 PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2599488329 From mli at openjdk.org Mon Dec 8 18:01:22 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 8 Dec 2025 18:01:22 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v7] In-Reply-To: References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> <4-PqNRUxM-80k4mQdYNzc0HrirtkTCjfVAzgRewW08M=.d2fe4512-16cd-4abf-8a7f-e91341c37110@github.com> Message-ID: On Mon, 8 Dec 2025 13:34:10 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/c2/irTests/TestConditionalMove.java line 39: >> >>> 37: * @summary Auto-vectorization enhancement to support vector conditional move. >>> 38: * @library /test/lib / >>> 39: * @run driver compiler.c2.irTests.TestConditionalMove >> >> Suggestion: >> >> * @run driver ${test.main.class} >> >> Might as well do that now. Avoids wrong copy of class name, which can lead to wrong test being run. > > Also: if you are already renaming these tests, you might move them to a better directory as well. We want to avoid the `irTests` directory in the future, and sort by topic instead. > > Idea: put it under a `compiler/c2/cmove` directory. addressed in https://github.com/openjdk/jdk/pull/28702. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2599583943 From mli at openjdk.org Mon Dec 8 18:01:25 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 8 Dec 2025 18:01:25 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v7] In-Reply-To: References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> <4-PqNRUxM-80k4mQdYNzc0HrirtkTCjfVAzgRewW08M=.d2fe4512-16cd-4abf-8a7f-e91341c37110@github.com> Message-ID: On Mon, 8 Dec 2025 13:32:54 GMT, Emanuel Peter wrote: >> Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: >> >> - remove log_warning >> - add test cases: BoolTest::ge/gt in enc_cmove_fp_cmp_fp > > test/hotspot/jtreg/compiler/c2/irTests/TestScalarConditionalMoveCmpObj.java line 34: > >> 32: * @test >> 33: * @summary Test conditional move + compare object. >> 34: * @requires vm.simpleArch == "riscv64" > > It would be really nice if we generally have no platform requirements at the top of the file, but just IR test applyIf instead. That way, everyone benefits from everyone else's tests and we have less test duplication down the line. addressed in https://github.com/openjdk/jdk/pull/28702. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2599583136 From mli at openjdk.org Mon Dec 8 18:04:39 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 8 Dec 2025 18:04:39 GMT Subject: RFR: 8371920: [TEST] Enable CMove tests on other platforms Message-ID: Hi, Can you help to review this patch? [JDK-8357551](https://bugs.openjdk.org/browse/JDK-8357551) add support of CMoveF/D vectorization, at the same time it also adds some tests for scalar CMove on riscv. It's good to enable these tests on other platforms, like x86/aarch64 or maybe others. 
At the same time, this PR also moves these tests under `compiler/c2/cmove`, as suggested here https://github.com/openjdk/jdk/pull/28309#discussion_r2598664764. Thanks! ## Test In progress... (I'm using GitHub CI to run the tests.) ------------- Commit messages: - move dir - enable tests for aarch64 - enable tests for x64 - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - ... and 13 more: https://git.openjdk.org/jdk/compare/6700baa5...c901dece Changes: https://git.openjdk.org/jdk/pull/28702/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28702&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371920 Stats: 89 lines in 3 files changed: 0 ins; 2 del; 87 mod Patch: https://git.openjdk.org/jdk/pull/28702.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28702/head:pull/28702 PR: https://git.openjdk.org/jdk/pull/28702 From liach at openjdk.org Mon Dec 8 18:29:05 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 8 Dec 2025 18:29:05 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v8] In-Reply-To: References: Message-ID: On Mon, 8 Dec 2025 17:49:41 GMT, Jorn Vernee wrote: >> Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 15 additional commits since the last revision: >> >> - Bugs and verify loader leak >> - Try to avoid loader leak >> - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache >> - Revert void special case removal due to C2 shortage causing TestZGCBarrierElision::testAtomicThenAtomicAnotherField failure >> - Test from Jorn >> - Copyright years >> - Fix problem identified by Jorn >> - Rollback getAndAdd for now >> - Redundant change >> - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache >> - ... and 5 more: https://git.openjdk.org/jdk/compare/50bcb546...eebb8ff7 > > src/java.base/share/classes/java/lang/invoke/IndirectVarHandle.java line 114: > >> 112: // but checking the signature type of MH mostly works >> 113: return MethodHandle.isReachableFrom(vform.getMethodType(0), cl) >> 114: && target.isReachableFrom(cl); > > Right... one of the filters may also keep a class loader alive. But to check them, we'd have to eagerly instantiate all of them as well. > > FWIW, I don't think this is an issue we can just ignore. If a filter keeps a class loader alive, we'd still have a problem. > > Maybe it's possible to collect all the types involved from the filter when creating an IndirectVarHandle instead, and save those in a separate list for this check. I mean a filter method handle may keep other classes alive in addition to just its types. This is not possible from just checking the types. The vform method type is sufficient, because the types from the filter method type are always in one of the indirect layers.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2599683049 From liach at openjdk.org Mon Dec 8 19:10:48 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 8 Dec 2025 19:10:48 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v9] In-Reply-To: References: Message-ID: <4YumBpbA2k8DC13H1s808_5OJx-1FMxD9CbIUfRTb8Q=.742f90c9-0d93-43b7-abe7-76422a0c8359@github.com> > Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision: - Review - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache - Bugs and verify loader leak - Try to avoid loader leak - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache - Revert void special case removal due to C2 shortage causing TestZGCBarrierElision::testAtomicThenAtomicAnotherField failure - Test from Jorn - Copyright years - Fix problem identified by Jorn - Rollback getAndAdd for now - ... 
and 7 more: https://git.openjdk.org/jdk/compare/1f095c6b...d734e8a6 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28585/files - new: https://git.openjdk.org/jdk/pull/28585/files/eebb8ff7..d734e8a6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=07-08 Stats: 11891 lines in 51 files changed: 8198 ins; 3438 del; 255 mod Patch: https://git.openjdk.org/jdk/pull/28585.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28585/head:pull/28585 PR: https://git.openjdk.org/jdk/pull/28585 From jvernee at openjdk.org Mon Dec 8 20:21:59 2025 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 8 Dec 2025 20:21:59 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v4] In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 18:27:34 GMT, Chen Liang wrote: >> Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. >> >> They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. >> >> We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. >> >> Paging @minborg who requested Optional folding for review. >> >> I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. 
> > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > bracket styles I had a look at the document you wrote, but I think it still needs some work. I suggest maybe splitting that out into a separate PR. src/java.base/share/classes/jdk/internal/vm/annotation/constant-folding.md line 12: > 10: > 11: Constant folding means a read of a variable of a constant value can be replaced > 12: by the read constant value, during the construction of an IR graph. The I think constant folding encompasses much more than just field loads. E.g. folding `3 + 4` into `7` is also constant folding. More abstractly, I'd say that constant folding is essentially running a computation at compile time. The JIT compiler tries to do some of the computations in the code that it is compiling at compile time, so that they don't have to be done over and over when the compiled code is run. We can think of instance field loads as a computation that takes in an instance of an object, and returns the value of one of the fields. If the _input_ is a constant, it is that _computation_ that may be folded, and the _result_ of that computation is then also a constant. To do that fold, the JIT essentially has to determine if the computation will always return the same result when evaluated. Another important condition for folding field loads is that the input to that computation is always the same: namely the instance from which the field is loaded. Even if a field is a trusted final, if the instance from which that field is loaded may vary, the JIT will not treat the value of that field as 'constant'. I don't think it's necessarily wrong to say that a field 'is constant', but that doesn't guarantee that the JIT is able to constant fold loads from that field. I think the word 'constant' is a bit too vague on its own, and used to mean several different things.
I detect some tension when reading the rest of this doc, where you say for instance 'may be constant', rather than the more decisive 'is constant'. For instance, a 'constant' is just a fixed value (such as '3'), but a 'constant field' is a field that can not be changed, and a load from a 'constant field' is not guaranteed to produce a (JIT) compile time 'constant'. I think you need to clearly define 'constant' earlier in the document, and potentially use different terms for these examples. ------------- PR Review: https://git.openjdk.org/jdk/pull/28540#pullrequestreview-3553830433 PR Review Comment: https://git.openjdk.org/jdk/pull/28540#discussion_r2599952662 From bulasevich at openjdk.org Mon Dec 8 20:26:57 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Mon, 8 Dec 2025 20:26:57 GMT Subject: RFR: 8280283: Dead compiler code found during the JDK-8272058 code review [v2] In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 10:08:33 GMT, Anton Seoane Ampudia wrote: >> This PR removes some dead code that was found during review for [JDK-8272058](https://bugs.openjdk.org/browse/JDK-8272058). >> >> `target_addr_for_insn_or_null` is never run with a `ldrw` to `zr` (i.e. a safepoint poll). This is just a remnant from global safepointing, before we moved to using thread-local handshakes. No safepoint polling code reaches this function. More information can be read in the [original code review](https://github.com/openjdk/jdk18/pull/51#discussion_r774922087). Additionally, I have run tiers 1-6 to make sure this path is not exercised. >> >> This changeset also cleans up the unused `is_nop` function, following the comments in the issue. Other dead code mentioned there has long since disappeared. >> >> **Testing:** passes tiers 1-4 > > Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: > > Delete more unused code Marked as reviewed by bulasevich (Committer).
------------- PR Review: https://git.openjdk.org/jdk/pull/28473#pullrequestreview-3553913872 From liach at openjdk.org Mon Dec 8 20:51:00 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 8 Dec 2025 20:51:00 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v4] In-Reply-To: References: Message-ID: On Mon, 8 Dec 2025 20:00:20 GMT, Jorn Vernee wrote: >> Chen Liang has updated the pull request incrementally with one additional commit since the last revision: >> >> bracket styles > > src/java.base/share/classes/jdk/internal/vm/annotation/constant-folding.md line 12: > >> 10: >> 11: Constant folding means a read of a variable of a constant value can be replaced >> 12: by the read constant value, during the construction of an IR graph. The > > I think constant folding encompasses much more than just field loads. E.g. folding `3 + 4` into `7` is also constant folding. More abstractly, I'd say that constant folding is essentially running a computation at compile time. The JIT compiler tries to do some of the computations in the code that it is compiling at compile time, so that they don't have to be done over and over when the compiled code is run. > > We can think of instance field loads as a computation that takes in an instance of an object, and returns the value of one of the fields. If the _input_ is a constant, it is that _computation_ that may be folded, and the _result_ of that computation is then also a constant. To do that fold, the JIT essentially has to determine if the computation will always return the same result when evaluated. > > Another important condition for folding field loads is that the input to that computation is always the same: namely the instance from which the field is loaded. Even if a field is a trusted final, if the instance from which that field is loaded may vary, the JIT will not treat the value of that field as 'constant'.
I don't think it's necessarily wrong to say that a field 'is constant', but that doesn't guarantee that the JIT is able to constant fold loads from that field. > > I think the word 'constant' is a bit too vague on its own, and used to mean several different things. I detect some tension when reading the rest of this doc, where you say for instance 'may be constant', rather than the more decisive 'is constant'. For instance, a 'constant' is just a fixed value (such as '3'), but a 'constant field' is a field that can not be changed, and a load from a 'constant field' is not guaranteed to produce a (JIT) compile time 'constant'. I think you need to clearly define 'constant' earlier in the document, and potentially use different terms for these examples. Sure, I have created https://bugs.openjdk.org/browse/JDK-8373286 to track that effort instead. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28540#discussion_r2600090805 From liach at openjdk.org Mon Dec 8 21:14:24 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 8 Dec 2025 21:14:24 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v5] In-Reply-To: References: Message-ID: > Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. > > They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. > > We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. 
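The distinction drawn in the review comments above, between a fixed value, a field that cannot change, and a load the JIT can actually fold, can be sketched in plain Java. This is only an illustration under stated assumptions: the class, record, and method names are hypothetical, and whether HotSpot folds any particular load depends on the compiler and its trust rules. The comments mark candidates for folding, not guarantees.

```java
// Hypothetical illustration; none of these names come from the PR.
public class FoldSketch {
    // A record: per the PR description, final fields of record classes are
    // among those the compiler trusts.
    public record Config(int limit) {}

    // A static final reference: the receiver of the load below is always
    // the same object.
    public static final Config CONFIG = new Config(42);

    // Pure arithmetic on literals: the simplest form of constant folding.
    public static int arithmetic() {
        return 3 + 4;
    }

    // Constant receiver + trusted final field: a candidate for folding the
    // field load into the value 42 (assumption: not guaranteed).
    public static int fromConstantReceiver() {
        return CONFIG.limit();
    }

    // Same trusted field, but the receiver may differ on every call, so the
    // loaded value is not a compile-time constant even though the field
    // itself can never change.
    public static int fromVaryingReceiver(Config c) {
        return c.limit();
    }

    public static void main(String[] args) {
        System.out.println(arithmetic());
        System.out.println(fromConstantReceiver());
        System.out.println(fromVaryingReceiver(new Config(7)));
    }
}
```

All three methods return ordinary values at run time; the difference the review is concerned with is only in what a JIT compiler may be able to prove about them.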
> > Paging @minborg who requested Optional folding for review. > > I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: - Jorn review - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-final-trusting - bracket styles - Doc tweaks - Essay - Spurious change - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-final-trusting - Issue number and test update - Fixed optional and unit test - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-final-trusting - ... and 1 more: https://git.openjdk.org/jdk/compare/383203c0...b20b7f5b ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28540/files - new: https://git.openjdk.org/jdk/pull/28540/files/d353bdbe..b20b7f5b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28540&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28540&range=03-04 Stats: 38514 lines in 729 files changed: 24382 ins; 11087 del; 3045 mod Patch: https://git.openjdk.org/jdk/pull/28540.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28540/head:pull/28540 PR: https://git.openjdk.org/jdk/pull/28540 From missa at openjdk.org Mon Dec 8 21:47:16 2025 From: missa at openjdk.org (Mohamed Issa) Date: Mon, 8 Dec 2025 21:47:16 GMT Subject: RFR: 8368977: Provide clear naming for AVX10 identifiers [v2] In-Reply-To: <6XYgqaHA0PPZzvnfysKOP5XGP7e_RMkVFt9PV2OT3Gk=.e5f33072-a91a-4e57-99f3-81cc4ae4d844@github.com> References: <6XYgqaHA0PPZzvnfysKOP5XGP7e_RMkVFt9PV2OT3Gk=.e5f33072-a91a-4e57-99f3-81cc4ae4d844@github.com> Message-ID: > This is a simple change that renames all AVX10 identifiers to explicitly show which sub-versions are 
in use. Right now, AVX10.2 is the only case to worry about. The JTREG tests listed below were used to verify correctness with the recommended JVM options mentioned in corresponding source files. Each test included runs through emulation with AVX10.2 enabled and disabled to exercise all possible paths. All modifications and tests used [OpenJDK v26-b24](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B24) as the baseline build. > > 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java` > 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java` > 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java` > 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java` > 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java` > 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java` > 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java` > 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java` > 9. `jtreg:test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java` > 10. `jtreg:test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java` > 11. `jtreg:test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java` > 12. `jtreg:test/jdk/jdk/incubator/vector/Double64VectorTests.java` > 13. `jtreg:test/jdk/jdk/incubator/vector/Double128VectorTests.java` > 14. `jtreg:test/jdk/jdk/incubator/vector/Double256VectorTests.java` > 15. `jtreg:test/jdk/jdk/incubator/vector/Double512VectorTests.java` > 16. `jtreg:test/jdk/jdk/incubator/vector/DoubleMaxVectorTests.java` > 17. `jtreg:test/jdk/jdk/incubator/vector/Float64VectorTests.java` > 18. `jtreg:test/jdk/jdk/incubator/vector/Float128VectorTests.java` > 19. `jtreg:test/jdk/jdk/incubator/vector/Float256VectorTests.java` > 20. `jtreg:test/jdk/jdk/incubator/vector/Float512VectorTests.java` > 21. `jtreg:test/jdk/jdk/incubator/vector/FloatMaxVectorTests.java` > 22. 
`jtreg:test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` > 23. `jtreg:test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java` Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: Remove changes that affect functionality ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28344/files - new: https://git.openjdk.org/jdk/pull/28344/files/15701ac5..2a029dab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28344&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28344&range=00-01 Stats: 35 lines in 1 file changed: 0 ins; 16 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/28344.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28344/head:pull/28344 PR: https://git.openjdk.org/jdk/pull/28344 From missa at openjdk.org Mon Dec 8 21:47:19 2025 From: missa at openjdk.org (Mohamed Issa) Date: Mon, 8 Dec 2025 21:47:19 GMT Subject: RFR: 8368977: Provide clear naming for AVX10 identifiers [v2] In-Reply-To: References: <6XYgqaHA0PPZzvnfysKOP5XGP7e_RMkVFt9PV2OT3Gk=.e5f33072-a91a-4e57-99f3-81cc4ae4d844@github.com> Message-ID: <17XWWvgxG5LiMeaGTY_5N31bYTxK1qEuuNTypNGa7Uw=.9961c541-89cf-4a4f-9d82-ea078a98c51b@github.com> On Mon, 8 Dec 2025 03:28:59 GMT, Jatin Bhateja wrote: >> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove changes that affect functionality > > src/hotspot/cpu/x86/templateTable_x86.cpp line 1605: > >> 1603: __ jcc(Assembler::notEqual, L); >> 1604: __ call_VM_leaf(CAST_FROM_FN_PTR(address, SharedRuntime::f2i), 1); >> 1605: } > > This change should be part of a separate PR. Sure, I'll cover this in another PR. > src/hotspot/cpu/x86/templateTable_x86.cpp line 1620: > >> 1618: __ jcc(Assembler::notEqual, L); >> 1619: __ call_VM_leaf(CAST_FROM_FN_PTR(address, SharedRuntime::f2l), 1); >> 1620: } > > Please restrict this PR to name change related changes only.
I reverted changes to this file. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28344#discussion_r2600243093 PR Review Comment: https://git.openjdk.org/jdk/pull/28344#discussion_r2600244394 From vlivanov at openjdk.org Tue Dec 9 00:44:58 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 9 Dec 2025 00:44:58 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v4] In-Reply-To: References: Message-ID: <7mv4j1ymja09d6u0szgAGNC-2D9CoUaXxMjxaOJp0ok=.1270e35a-a69e-4ab7-9f27-ab57d5e06346@github.com> On Wed, 3 Dec 2025 21:58:24 GMT, Vladimir Ivanov wrote: >> Even though `instanceof` check (and reflective `Class.isInstance` call) narrows operand's type, sharpened type information is not explicitly materialized in the IR. >> >> There's a `SubTypeCheck` node present, but it is not a substitute for a `CheckCastPP` node with a proper type. >> >> The difference can be illustrated with the following simple cases: >> >> class A { void m() {} } >> class B extends A { void m() {} } >> >> void testInstanceOf(A obj) { >> if (obj instanceof B) { >> obj.m(); >> } >> } >> >> InstanceOf::testInstanceOf (12 bytes) >> @ 8 InstanceOf$A::m (0 bytes) failed to inline: virtual call >> >> vs >> >> void testInstanceOfCast(A obj) { >> if (obj instanceof B) { >> B b = (B)obj; >> b.m(); >> } >> } >> >> InstanceOf::testInstanceOfCast (17 bytes) >> @ 13 InstanceOf$B::m (1 bytes) inline (hot) >> >> >> Proposed fix annotates operands of subtype checks with proper type information which reflects the effects of subtype check. Not-yet-canonicalized IR shape poses some challenges, but I decided to match it early so information is available right away, rather than waiting for IGVN pass and delay inlining to post-parse phase. >> >> FTR it is not a complete fix. It works for trivial cases, but for more complex conditions the IR shape becomes too complex during parsing (as illustrated by some test cases). 
I experimented with annotating subtype checks after initial parsing pass is over, but the crucial simplification step happens as part of split-if transformation which happens when no more inlining is possible. So, the only possible benefit (without forcing split-if optimization earlier) is virtual-to-direct call strength reduction. I plan to explore it separately. >> >> Testing: hs-tier1 - hs-tier5 > > Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into c2.instanceof > - Unify Compile::should_delay_inlining > - Test fix > - bugid > - C2: Materialize type information from instanceof checks Thanks for the reviews, Dean and Quan. @rwestrel do you want to take a look? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28517#issuecomment-3629677246 From vlivanov at openjdk.org Tue Dec 9 01:25:40 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 9 Dec 2025 01:25:40 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v5] In-Reply-To: References: Message-ID: > Even though `instanceof` check (and reflective `Class.isInstance` call) narrows operand's type, sharpened type information is not explicitly materialized in the IR. > > There's a `SubTypeCheck` node present, but it is not a substitute for a `CheckCastPP` node with a proper type. 
> > The difference can be illustrated with the following simple cases: > > class A { void m() {} } > class B extends A { void m() {} } > > void testInstanceOf(A obj) { > if (obj instanceof B) { > obj.m(); > } > } > > InstanceOf::testInstanceOf (12 bytes) > @ 8 InstanceOf$A::m (0 bytes) failed to inline: virtual call > > vs > > void testInstanceOfCast(A obj) { > if (obj instanceof B) { > B b = (B)obj; > b.m(); > } > } > > InstanceOf::testInstanceOfCast (17 bytes) > @ 13 InstanceOf$B::m (1 bytes) inline (hot) > > > Proposed fix annotates operands of subtype checks with proper type information which reflects the effects of subtype check. Not-yet-canonicalized IR shape poses some challenges, but I decided to match it early so information is available right away, rather than waiting for IGVN pass and delay inlining to post-parse phase. > > FTR it is not a complete fix. It works for trivial cases, but for more complex conditions the IR shape becomes too complex during parsing (as illustrated by some test cases). I experimented with annotating subtype checks after initial parsing pass is over, but the crucial simplification step happens as part of split-if transformation which happens when no more inlining is possible. So, the only possible benefit (without forcing split-if optimization earlier) is virtual-to-direct call strength reduction. I plan to explore it separately. 
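The two shapes quoted above can be wrapped in a small runnable harness. This is a minimal sketch, not the test from the PR: the wrapper class name and the string return values are assumptions added so the behaviour is observable. Both methods compute the same result; the difference discussed in the description lies only in the type information the JIT has for `obj` at the call site, which can likely be inspected with the diagnostic flags `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining`.

```java
// Hypothetical harness around the A/B example from the PR description.
public class InstanceOfSketch {
    public static class A { public String m() { return "A.m"; } }
    public static class B extends A { @Override public String m() { return "B.m"; } }

    // Shape 1: the check narrows obj to B, but the call is still made
    // through a reference of static type A.
    public static String testInstanceOf(A obj) {
        if (obj instanceof B) {
            return obj.m();
        }
        return "skipped";
    }

    // Shape 2: the explicit cast gives the call site a receiver of type B.
    public static String testInstanceOfCast(A obj) {
        if (obj instanceof B) {
            B b = (B) obj;
            return b.m();
        }
        return "skipped";
    }

    public static void main(String[] args) {
        // Warm both methods up so a JIT compiler has a reason to compile them.
        for (int i = 0; i < 20_000; i++) {
            testInstanceOf(new B());
            testInstanceOfCast(new B());
        }
        System.out.println(testInstanceOf(new B()));
        System.out.println(testInstanceOfCast(new A()));
    }
}
```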
> > Testing: hs-tier1 - hs-tier5 Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: Revert Compile::should_delay_inlining unification ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28517/files - new: https://git.openjdk.org/jdk/pull/28517/files/58a7d521..4e9f4624 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28517&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28517&range=03-04 Stats: 6 lines in 3 files changed: 1 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28517.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28517/head:pull/28517 PR: https://git.openjdk.org/jdk/pull/28517 From jvernee at openjdk.org Tue Dec 9 01:55:03 2025 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 9 Dec 2025 01:55:03 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v9] In-Reply-To: <4YumBpbA2k8DC13H1s808_5OJx-1FMxD9CbIUfRTb8Q=.742f90c9-0d93-43b7-abe7-76422a0c8359@github.com> References: <4YumBpbA2k8DC13H1s808_5OJx-1FMxD9CbIUfRTb8Q=.742f90c9-0d93-43b7-abe7-76422a0c8359@github.com> Message-ID: On Mon, 8 Dec 2025 19:10:48 GMT, Chen Liang wrote: >> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 17 additional commits since the last revision: > > - Review > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache > - Bugs and verify loader leak > - Try to avoid loader leak > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache > - Revert void special case removal due to C2 shortage causing TestZGCBarrierElision::testAtomicThenAtomicAnotherField failure > - Test from Jorn > - Copyright years > - Fix problem identified by Jorn > - Rollback getAndAdd for now > - ... and 7 more: https://git.openjdk.org/jdk/compare/b009bdb3...d734e8a6 Marked as reviewed by jvernee (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28585#pullrequestreview-3554899910 From wenanjian at openjdk.org Tue Dec 9 02:27:29 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Tue, 9 Dec 2025 02:27:29 GMT Subject: RFR: 8371968: RISC-V: implement AES CBC intrinsics [v4] In-Reply-To: References: Message-ID: <_rCbfaHA583dmzHYByRElp9j7Fg0i4OqBPoDYnY9gKc=.bc1aae5c-92ff-4e51-81f1-4b0b52b36dc3@github.com> > Support AES CBC intrinsic on RISCV, Already passed the tests in > test/hotspot/jtreg/compiler/codegen/aes/ > test/jdk/com/sun/crypto Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: simplify the code structure ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28320/files - new: https://git.openjdk.org/jdk/pull/28320/files/0c415020..2500663a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28320&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28320&range=02-03 Stats: 188 lines in 1 file changed: 64 ins; 110 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/28320.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28320/head:pull/28320 PR: https://git.openjdk.org/jdk/pull/28320 From wenanjian at openjdk.org Tue Dec 9 02:27:34 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Tue, 9 Dec 2025 
02:27:34 GMT Subject: RFR: 8371968: RISC-V: implement AES CBC intrinsics [v3] In-Reply-To: References: Message-ID: On Mon, 8 Dec 2025 01:17:35 GMT, Fei Yang wrote: >> Anjian Wen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: >> >> - Modify some assert >> - RISC-V: implement AES CBC intrinsics > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2786: > >> 2784: __ mv(x10, input_len); >> 2785: __ leave(); >> 2786: __ ret(); > > Since the three cases here duplicate most of the code, it seems better to introduce a subroutine to simplify the code. I think that will be similar to what you do for the CTR AES intrinsic where we have this subroutine `counterMode_AESCrypt` which is called by `generate_counterMode_AESCrypt`. Thanks, that's a good idea, done! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28320#discussion_r2600830270 From wenanjian at openjdk.org Tue Dec 9 02:59:15 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Tue, 9 Dec 2025 02:59:15 GMT Subject: RFR: 8371968: RISC-V: implement AES CBC intrinsics [v5] In-Reply-To: References: Message-ID: > Support the AES CBC intrinsic on RISC-V. It already passes the tests in > test/hotspot/jtreg/compiler/codegen/aes/ > test/jdk/com/sun/crypto Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: modify code format and register name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28320/files - new: https://git.openjdk.org/jdk/pull/28320/files/2500663a..6828d5e5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28320&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28320&range=03-04 Stats: 20 lines in 1 file changed: 0 ins; 0 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/28320.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28320/head:pull/28320 PR: https://git.openjdk.org/jdk/pull/28320 From duke at openjdk.org Tue Dec 9 04:56:57 2025
From: duke at openjdk.org (duke) Date: Tue, 9 Dec 2025 04:56:57 GMT Subject: RFR: 8372641: [s390x] Test failure TestMergeStores.java [v3] In-Reply-To: <6iaWuz5X4ol8NmIvbWoQBxmceux35b3529t1sONwCZA=.08c49f3a-87dc-4030-a5a7-1a83f4209fe0@github.com> References: <6iaWuz5X4ol8NmIvbWoQBxmceux35b3529t1sONwCZA=.08c49f3a-87dc-4030-a5a7-1a83f4209fe0@github.com> Message-ID: On Thu, 27 Nov 2025 08:59:09 GMT, Harshit470250 wrote: >> [JDK-8347405](https://bugs.openjdk.org/browse/JDK-8347405) introduced a mergeStores optimisation which requires ReverseBytesS opcode and as it was not implemented for s390 the test case is failing. >> I also implemented ReverseBytesUS. > > Harshit470250 has updated the pull request incrementally with one additional commit since the last revision: > > Added whitespace @Harshit470250 Your change (at version d5ad5e4a058af079d123e22b980c366339caa887) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28523#issuecomment-3630303629 From duke at openjdk.org Tue Dec 9 05:03:12 2025 From: duke at openjdk.org (Harshit470250) Date: Tue, 9 Dec 2025 05:03:12 GMT Subject: Integrated: 8372641: [s390x] Test failure TestMergeStores.java In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 07:09:01 GMT, Harshit470250 wrote: > [JDK-8347405](https://bugs.openjdk.org/browse/JDK-8347405) introduced a mergeStores optimisation which requires ReverseBytesS opcode and as it was not implemented for s390 the test case is failing. > I also implemented ReverseBytesUS. This pull request has now been integrated. 
Changeset: 35fe0b11 Author: Harshit470250 <133243171+Harshit470250 at users.noreply.github.com> Committer: Amit Kumar URL: https://git.openjdk.org/jdk/commit/35fe0b11015bd3a88ee21c76b54f9d4969fdedf6 Stats: 34 lines in 1 file changed: 34 ins; 0 del; 0 mod 8372641: [s390x] Test failure TestMergeStores.java Reviewed-by: mhaessig, amitkumar, lucy ------------- PR: https://git.openjdk.org/jdk/pull/28523 From xgong at openjdk.org Tue Dec 9 05:52:24 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 9 Dec 2025 05:52:24 GMT Subject: RFR: 8371603: C2: Missing Ideal optimizations for load and store vectors on SVE [v3] In-Reply-To: References: Message-ID: <0O2u9ERK_6UmEgdjZ9EbegeY92wNOx8mXd1z15yz2wE=.4abf7e5a-46f5-460a-ac76-fd8f063dfeec@github.com> > **Problem:** > > This issue occurs on a 256-bit SVE machine, caused by the following problematic pattern in `LoadVectorNode::Ideal()`: > > > Node* LoadVectorNode::Ideal(PhaseGVN* phase, bool can_reshape) { > const TypeVect* vt = vect_type(); > if (Matcher::vector_needs_partial_operations(this, vt)) { > return VectorNode::try_to_gen_masked_vector(phase, this, vt); > } > return LoadNode::Ideal(phase, can_reshape); > } > > > The condition `Matcher::vector_needs_partial_operations(this, vt)` returns true for `LoadVectorNode` with 256-bit vector size even when the vector size equals the maximum vector size on SVE. In such cases, when `VectorNode::try_to_gen_masked_vector()` returns `nullptr`, the method exits early without calling `LoadNode::Ideal()`. This results in missing crucial optimizations that would normally be applied by the superclass. > > This code was introduced by https://bugs.openjdk.org/browse/JDK-8286941 to generate vector masks for partial vector operations, but it failed to ensure that the superclass `Ideal()` method is always invoked when no transformation is applied. > > **Solution:** > > This patch addresses the issue through two changes: > > 1. 
Refine `Matcher::vector_needs_partial_operations()` to return true only when the vector node genuinely represents a partial vector operation that requires masking. > 2. Modify `VectorNode::try_to_gen_masked_vector()` to never return `nullptr`, ensuring the superclass `Ideal()` method is always invoked when no transformation is applied. > > **Testing:** > > - Verified on different SVE platforms with different vector sizes (128|256|512 bits). > - Verified on X86 platforms with different avx options (-XX:UseAVX=1|2|3). > - Added two new IR tests to verify 1) previously missing optimizations for `LoadVector/StoreVector` are now applied, and 2) that mask and the correct IR patterns are generated for partial vector operations. Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: Update jtreg tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28651/files - new: https://git.openjdk.org/jdk/pull/28651/files/6206e8c0..94017b66 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28651&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28651&range=01-02 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28651.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28651/head:pull/28651 PR: https://git.openjdk.org/jdk/pull/28651 From dbriemann at openjdk.org Tue Dec 9 07:46:04 2025 From: dbriemann at openjdk.org (David Briemann) Date: Tue, 9 Dec 2025 07:46:04 GMT Subject: RFR: 8372589: VM crashes on init when NonNMethodCodeHeapSize is set too small and UseTransparentHugePages is enabled In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 12:59:03 GMT, David Briemann wrote: > Aligning upwards instead of downwards not only solves the crash in large huge page scenarios but also ensures that the cache sizes are at least as big as they were set. Hi @chhagedorn since you had a look at the ticket, would you be willing to review this PR? Thanks. 
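The difference between aligning a size downwards and upwards, as described in the message above, can be sketched in plain Java. This is only an illustration and not the HotSpot code: the class `AlignDemo`, both helper methods, and the 1 MiB / 2 MiB values are hypothetical, and the helpers assume the alignment is a power of two.

```java
public class AlignDemo {
    // Round size down to a multiple of alignment.
    // Assumption: alignment is a power of two.
    public static long alignDown(long size, long alignment) {
        return size & ~(alignment - 1);
    }

    // Round size up to the next multiple of alignment.
    // Assumption: alignment is a power of two.
    public static long alignUp(long size, long alignment) {
        return (size + alignment - 1) & ~(alignment - 1);
    }

    public static void main(String[] args) {
        long pageSize = 2L * 1024 * 1024;   // hypothetical 2 MiB huge page
        long requested = 1L * 1024 * 1024;  // hypothetical 1 MiB requested size

        // Aligning down can shrink a small request to zero bytes.
        System.out.println("aligned down: " + alignDown(requested, pageSize));
        // Aligning up never yields less than what was requested.
        System.out.println("aligned up:   " + alignUp(requested, pageSize));
    }
}
```

With these hypothetical values, aligning down leaves no usable space, whereas aligning up yields one full page, which matches the observation that the resulting sizes are at least as big as requested.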
------------- PR Comment: https://git.openjdk.org/jdk/pull/28658#issuecomment-3630819542 From chagedorn at openjdk.org Tue Dec 9 07:51:57 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 9 Dec 2025 07:51:57 GMT Subject: RFR: 8367028: compiler/c2/irTests/TestFloat16ScalarOperations.java failing intermittently because of constant folding In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 14:43:57 GMT, Emanuel Peter wrote: > The test uses random constants. But we forgot to exclude special values such as `zero`, otherwise the operations can be folded (idealized) and the IR tests fail. For example `x + 0.42` would not be folded, but `x + 0` would be folded to `x`. > > Solution: restrict the range we sample from. Used to be `[0, 1)`, now I just do `[0.1, 0.9)`. Looks good, thanks! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28678#pullrequestreview-3555956837 From epeter at openjdk.org Tue Dec 9 07:57:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Dec 2025 07:57:06 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: <2xxjKX6hMeKDfS9SGBEvll8yadDthCoUjCIRpaE8ObA=.b567ec00-7dad-4b57-82a4-db1149fc8942@github.com> References: <2xxjKX6hMeKDfS9SGBEvll8yadDthCoUjCIRpaE8ObA=.b567ec00-7dad-4b57-82a4-db1149fc8942@github.com> Message-ID: On Tue, 2 Dec 2025 13:52:04 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 17 commits: >> >> - Merge branch 'master' into JDK-8354282 >> - whitespace >> - review >> - review >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java >> >> Co-authored-by: Christian Hagedorn >> - review >> - review >> - ... and 7 more: https://git.openjdk.org/jdk/compare/ef5e744a...93b8b0c5 > > Thanks for the update, it looks good to me! If @eme64 also agrees with the latest patch, we can submit some testing and then hopefully get it in right before the fork. @chhagedorn I see that an internal IR test is failing - one that you added a while back. Could you have a look what may have gone wrong? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24575#issuecomment-3630859059 From epeter at openjdk.org Tue Dec 9 08:05:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Dec 2025 08:05:08 GMT Subject: RFR: 8371603: C2: Missing Ideal optimizations for load and store vectors on SVE [v3] In-Reply-To: <0O2u9ERK_6UmEgdjZ9EbegeY92wNOx8mXd1z15yz2wE=.4abf7e5a-46f5-460a-ac76-fd8f063dfeec@github.com> References: <0O2u9ERK_6UmEgdjZ9EbegeY92wNOx8mXd1z15yz2wE=.4abf7e5a-46f5-460a-ac76-fd8f063dfeec@github.com> Message-ID: <2o-6bOb3pE5YYGzPAS1wOe4LYSHdJkjZdBHURqwbg60=.74da1818-2bc0-404f-83a3-36bfe31b1cbe@github.com> On Tue, 9 Dec 2025 05:52:24 GMT, Xiaohong Gong wrote: >> **Problem:** >> >> This issue occurs on a 256-bit SVE machine, caused by the following problematic pattern in `LoadVectorNode::Ideal()`: >> >> >> Node* LoadVectorNode::Ideal(PhaseGVN* phase, bool can_reshape) { >> const TypeVect* vt = vect_type(); >> if (Matcher::vector_needs_partial_operations(this, vt)) { >> return VectorNode::try_to_gen_masked_vector(phase, this, vt); >> } >> 
return LoadNode::Ideal(phase, can_reshape); >> } >> >> >> The condition `Matcher::vector_needs_partial_operations(this, vt)` returns true for `LoadVectorNode` with 256-bit vector size even when the vector size equals the maximum vector size on SVE. In such cases, when `VectorNode::try_to_gen_masked_vector()` returns `nullptr`, the method exits early without calling `LoadNode::Ideal()`. This results in missing crucial optimizations that would normally be applied by the superclass. >> >> This code was introduced by https://bugs.openjdk.org/browse/JDK-8286941 to generate vector masks for partial vector operations, but it failed to ensure that the superclass `Ideal()` method is always invoked when no transformation is applied. >> >> **Solution:** >> >> This patch addresses the issue through two changes: >> >> 1. Refine `Matcher::vector_needs_partial_operations()` to return true only when the vector node genuinely represents a partial vector operation that requires masking. >> 2. Modify `VectorNode::try_to_gen_masked_vector()` to never return `nullptr`, ensuring the superclass `Ideal()` method is always invoked when no transformation is applied. >> >> **Testing:** >> >> - Verified on different SVE platforms with different vector sizes (128|256|512 bits). >> - Verified on X86 platforms with different avx options (-XX:UseAVX=1|2|3). >> - Added two new IR tests to verify 1) previously missing optimizations for `LoadVector/StoreVector` are now applied, and 2) that mask and the correct IR patterns are generated for partial vector operations. > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Update jtreg tests Tests pass. Thanks for the updates. And thanks for fixing it so swiftly and for the attribution ? ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28651#pullrequestreview-3555997078 From jbhateja at openjdk.org Tue Dec 9 08:30:14 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 9 Dec 2025 08:30:14 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v6] In-Reply-To: References: Message-ID: > Add a new Float16Vector type and corresponding concrete vector classes, in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. > - Add necessary inline expander support. > - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. > - Use existing Float16 vector IR and backend support. > - Extend the existing VectorAPI JTREG test suite for the newly added Float16Vector operations. > > The idea here is to first be on par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reduction, etc). > > The following are the performance numbers for some of the selected Float16Vector benchmarking kernels compared to equivalent auto-vectorized Float16OperationsBenchmark kernels. > > image > > Initial RFP[1] was floated on the panama-dev mailing list. > > Kindly review the draft PR and share your feedback. > > Best Regards, > Jatin > > [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: - Optimizing tail handling - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 - Cleanups - Fix failing jtreg test in CI - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 - Cleanups - Adding support for custom basic type T_FLOAT16, passing BasicType lane types to inline expander entries - Cleaning up interface as per review suggestions - Some cleanups - Fix some JTREG failures - ...
and 9 more: https://git.openjdk.org/jdk/compare/5f083aba...e830d855 ------------- Changes: https://git.openjdk.org/jdk/pull/28002/files Webrev: Webrev is not available because diff is too large Stats: 509573 lines in 231 files changed: 281304 ins; 226541 del; 1728 mod Patch: https://git.openjdk.org/jdk/pull/28002.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28002/head:pull/28002 PR: https://git.openjdk.org/jdk/pull/28002 From jbhateja at openjdk.org Tue Dec 9 08:30:15 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 9 Dec 2025 08:30:15 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v5] In-Reply-To: References: Message-ID: On Mon, 8 Dec 2025 11:25:22 GMT, Bhavana Kilambi wrote: >> Yes: there could be auto-vectorization. >> No: `i` is not reset, it keeps counting from where `i < SPECIES.loopBound(LEN)` fails, and handles the tail, right? >> >> It could be good to run this test once with and once without auto vectorization, just to make sure the vectors you see are from the Vector API, and not auto vectorization. > > Thanks. I missed that `i` isn't being reinitialised/reset again. Do we even need the tail loop in this case when the `LEN = 2048`? We may not even have any tail iterations? @Bhavana-Kilambi vectorDim is a parameterizable parameter. 
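The loop structure discussed above (a main loop bounded by `SPECIES.loopBound(LEN)` followed by a tail loop that continues from the same `i`) can be sketched without the Vector API. This is a hedged illustration only: the class `TailLoopDemo`, the helper `loopBound(int, int)` and the lane count of 16 are hypothetical stand-ins, with `loopBound` assumed to return the largest multiple of the lane count that does not exceed the length.

```java
public class TailLoopDemo {
    // Hypothetical stand-in for a species loop bound:
    // the largest multiple of lanes that is <= length.
    public static int loopBound(int length, int lanes) {
        return length - (length % lanes);
    }

    // Returns {number of full-width chunks, number of scalar tail iterations}.
    public static int[] countIterations(int len, int lanes) {
        int i = 0;
        int chunks = 0;
        int tail = 0;
        // Main loop: processes one full group of lanes per iteration.
        for (; i < loopBound(len, lanes); i += lanes) {
            chunks++;
        }
        // Tail loop: i is NOT reset, it continues where the main loop stopped.
        for (; i < len; i++) {
            tail++;
        }
        return new int[] { chunks, tail };
    }

    public static void main(String[] args) {
        int[] even = countIterations(2048, 16);
        int[] odd = countIterations(2050, 16);
        System.out.println("len=2048: chunks=" + even[0] + ", tail=" + even[1]);
        System.out.println("len=2050: chunks=" + odd[0] + ", tail=" + odd[1]);
    }
}
```

In this sketch the tail loop runs zero iterations whenever the length is a multiple of the lane count, and only does work when the length is changed to a value that is not.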
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28002#discussion_r2601582561 From xgong at openjdk.org Tue Dec 9 08:41:56 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 9 Dec 2025 08:41:56 GMT Subject: RFR: 8371603: C2: Missing Ideal optimizations for load and store vectors on SVE [v3] In-Reply-To: <2o-6bOb3pE5YYGzPAS1wOe4LYSHdJkjZdBHURqwbg60=.74da1818-2bc0-404f-83a3-36bfe31b1cbe@github.com> References: <0O2u9ERK_6UmEgdjZ9EbegeY92wNOx8mXd1z15yz2wE=.4abf7e5a-46f5-460a-ac76-fd8f063dfeec@github.com> <2o-6bOb3pE5YYGzPAS1wOe4LYSHdJkjZdBHURqwbg60=.74da1818-2bc0-404f-83a3-36bfe31b1cbe@github.com> Message-ID: On Tue, 9 Dec 2025 08:02:23 GMT, Emanuel Peter wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: >> >> Update jtreg tests > > Tests pass. Thanks for the updates. > > And thanks for fixing it so swiftly and for the attribution. Thanks so much for your review @eme64, @shqking and @erifan! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28651#issuecomment-3631028618 From roland at openjdk.org Tue Dec 9 09:12:06 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 9 Dec 2025 09:12:06 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis In-Reply-To: References: Message-ID: On Mon, 8 Dec 2025 12:53:31 GMT, Daniel Lundén wrote: > Thanks for working on this @rwestrel! > > > Now, PhiNode::Identity for 94 could replace it with the bottom memory phi with same inputs 451. But it doesn't run. It last ran between 3) and 4) and there's no reason for igvn to execute it again because 4) doesn't cause 94 to change in any way. > > Just to double check, does `VerifyIterativeGVN` identify this missed transformation? If not, we should make sure it does. It doesn't because `PhaseIterGVN::verify_Identity_for()` skips `Phi` nodes.
All issues related to `Phi` nodes in `VerifyIterativeGVN` would need to be fixed first. > > The fix I propose is to mirror the transformation from PhiNode::Identity in PhiNode::Ideal so the end result doesn't depend on what phi is modified and processed by igvn last. > > Correct me if I'm wrong, but do we not achieve the same thing if we identify and add 94 to the worklist after the transformation of 93 -> 451? This possibly seems like a cleaner solution to me (see my code comment below). In principle, yes. The question is how do you reliably get 94 on the igvn queue. In this particular case, `PhiNode::Ideal()` creates 451 and enqueues it on the igvn queue with `register_new_node_with_optimizer()`. Do we want to add custom logic in `PhiNode::Ideal()` to also enqueue all memory `Phi`s that are uses of the region? It's likely not sufficient in the general case as, maybe, the transformation can only happen once one input of the bottom `Phi` is changed. So do we need something like `PhaseIterGVN::add_users_of_use_to_worklist()` as well? It wouldn't be quite the same, as we wouldn't enqueue uses of a use but the uses of a common input (the region). Or rather than having logic in a couple of different places to enqueue the non bottom memory `Phi`, maybe, we can do that in `PhiNode::Ideal` for the bottom `Phi`, which would essentially be the patch I propose but, instead of making any change to the graph, it would enqueue the non bottom `Phi` so `PhiNode::Identity` does the change. It seems a bit wasteful to delay the change to the graph when it can be done safely in `PhiNode::Ideal` for the bottom memory `Phi`, which is why I went with the change I propose.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28677#issuecomment-3631139093 From mli at openjdk.org Tue Dec 9 09:25:35 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 9 Dec 2025 09:25:35 GMT Subject: RFR: 8371920: [TEST] Enable CMove tests on other platforms [v2] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > > [JDK-8357551](https://bugs.openjdk.org/browse/JDK-8357551) add support of CMoveF/D vectorization, at the same time it also adds some tests for scalar CMove on riscv. > It's good to enable these tests on other platforms, like x86/aarch64 or maybe others. > > At the same time, this pr also move these tests under `compiler/c2/cmove`, as suggested here https://github.com/openjdk/jdk/pull/28309#discussion_r2598664764. > > Thanks! > > ## Test > In progress... (I'm using github CI to run the tests.) Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: riscv + aarch64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28702/files - new: https://git.openjdk.org/jdk/pull/28702/files/c901dece..7848a1bf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28702&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28702&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28702.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28702/head:pull/28702 PR: https://git.openjdk.org/jdk/pull/28702 From roland at openjdk.org Tue Dec 9 09:43:03 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 9 Dec 2025 09:43:03 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v5] In-Reply-To: References: Message-ID: On Tue, 9 Dec 2025 01:25:40 GMT, Vladimir Ivanov wrote: >> Even though `instanceof` check (and reflective `Class.isInstance` call) narrows operand's type, sharpened type information is not explicitly materialized in the IR. 
>> >> There's a `SubTypeCheck` node present, but it is not a substitute for a `CheckCastPP` node with a proper type. >> >> The difference can be illustrated with the following simple cases: >> >> class A { void m() {} } >> class B extends A { void m() {} } >> >> void testInstanceOf(A obj) { >> if (obj instanceof B) { >> obj.m(); >> } >> } >> >> InstanceOf::testInstanceOf (12 bytes) >> @ 8 InstanceOf$A::m (0 bytes) failed to inline: virtual call >> >> vs >> >> void testInstanceOfCast(A obj) { >> if (obj instanceof B) { >> B b = (B)obj; >> b.m(); >> } >> } >> >> InstanceOf::testInstanceOfCast (17 bytes) >> @ 13 InstanceOf$B::m (1 bytes) inline (hot) >> >> >> Proposed fix annotates operands of subtype checks with proper type information which reflects the effects of subtype check. Not-yet-canonicalized IR shape poses some challenges, but I decided to match it early so information is available right away, rather than waiting for IGVN pass and delay inlining to post-parse phase. >> >> FTR it is not a complete fix. It works for trivial cases, but for more complex conditions the IR shape becomes too complex during parsing (as illustrated by some test cases). I experimented with annotating subtype checks after initial parsing pass is over, but the crucial simplification step happens as part of split-if transformation which happens when no more inlining is possible. So, the only possible benefit (without forcing split-if optimization earlier) is virtual-to-direct call strength reduction. I plan to explore it separately. >> >> Testing: hs-tier1 - hs-tier5 > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Revert Compile::should_delay_inlining unification Looks good to me. ------------- Marked as reviewed by roland (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28517#pullrequestreview-3556397553 From epeter at openjdk.org Tue Dec 9 10:06:25 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Dec 2025 10:06:25 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v5] In-Reply-To: References: Message-ID: <8uE-UIoLllpjPuICc7sjKwo2eEtbGPYcgFwDUtQ0QpM=.525a688d-2dfc-4dfb-9dd5-c8024d4bb74e@github.com> On Tue, 9 Dec 2025 01:25:40 GMT, Vladimir Ivanov wrote: >> Even though `instanceof` check (and reflective `Class.isInstance` call) narrows operand's type, sharpened type information is not explicitly materialized in the IR. >> >> There's a `SubTypeCheck` node present, but it is not a substitute for a `CheckCastPP` node with a proper type. >> >> The difference can be illustrated with the following simple cases: >> >> class A { void m() {} } >> class B extends A { void m() {} } >> >> void testInstanceOf(A obj) { >> if (obj instanceof B) { >> obj.m(); >> } >> } >> >> InstanceOf::testInstanceOf (12 bytes) >> @ 8 InstanceOf$A::m (0 bytes) failed to inline: virtual call >> >> vs >> >> void testInstanceOfCast(A obj) { >> if (obj instanceof B) { >> B b = (B)obj; >> b.m(); >> } >> } >> >> InstanceOf::testInstanceOfCast (17 bytes) >> @ 13 InstanceOf$B::m (1 bytes) inline (hot) >> >> >> Proposed fix annotates operands of subtype checks with proper type information which reflects the effects of subtype check. Not-yet-canonicalized IR shape poses some challenges, but I decided to match it early so information is available right away, rather than waiting for IGVN pass and delay inlining to post-parse phase. >> >> FTR it is not a complete fix. It works for trivial cases, but for more complex conditions the IR shape becomes too complex during parsing (as illustrated by some test cases). 
I experimented with annotating subtype checks after initial parsing pass is over, but the crucial simplification step happens as part of split-if transformation which happens when no more inlining is possible. So, the only possible benefit (without forcing split-if optimization earlier) is virtual-to-direct call strength reduction. I plan to explore it separately. >> >> Testing: hs-tier1 - hs-tier5 > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Revert Compile::should_delay_inlining unification test/hotspot/jtreg/compiler/inlining/TestSubtypeCheckTypeInfo.java line 363: > 361: // Sample: > 362: // 213 42 b compiler.inlining.TestSubtypeCheckTypeInfo::testIsInstanceCondLatePost (13 bytes) > 363: static final Pattern TEST_CASE = Pattern.compile("^\\d+\\s+\\d+\\s+b\\s+" + TEST_CLASS_NAME + "::(\\w+) .*"); Drive by comment, no need to change things here now: @iwanowww @chhagedorn Would it not be nice if we could do this kind of matching with the `TestFramework`? Instead of `IR` matching, just match the output of any compilation tracing / printing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28517#discussion_r2601929785 From dlunden at openjdk.org Tue Dec 9 10:32:28 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Tue, 9 Dec 2025 10:32:28 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis In-Reply-To: References: Message-ID: On Tue, 9 Dec 2025 09:08:53 GMT, Roland Westrelin wrote: > The question is how do you reliably get 94 on the igvn queue. In this particular case, PhiNode::Ideal() creates 451 and enqueues it on the igvn queue with register_new_node_with_optimizer(). Do we want to add custom logic in PhiNode::Ideal() to also enqueue all memory Phis that are uses of the region? 
It's likely not sufficient in the general case as, maybe, the transformation can only happen once one input of the bottom Phi is changed. So do we need something like PhaseIterGVN::add_users_of_use_to_worklist() as well? It wouldn't be quite the same, as we wouldn't enqueue uses of a use but the uses of a common input (the region). I'm not sure about the exact mechanism, but it would be nice if `PhiNode::Ideal` adds all relevant nodes to the worklist after creating a new Bot memory Phi. Optimally, we'd like `VerifyIterativeGVN` to notify us of any missing cases. Sounds like two follow-up issues: (1) support `Phi` nodes in `VerifyIterativeGVN` and (2) use the information from `VerifyIterativeGVN` to fix missing cases. > Or rather than having logic in a couple of different places to enqueue the non bottom memory Phi, maybe, we can do that in PhiNode::Ideal for the bottom Phi, which would essentially be the patch I propose but, instead of making any change to the graph, it would enqueue the non bottom Phi so PhiNode::Identity does the change. It seems a bit wasteful to delay the change to the graph when it can be done safely in PhiNode::Ideal for the bottom memory Phi, which is why I went with the change I propose. Right, this is what I propose to fix the present issue and it seems cleaner to me (we let `Identity` handle the identity transformations). I doubt there'll be a measurable compilation time difference. I don't have a strong opinion though, so we can go with what you propose as well. Let's see what other reviewers think before we make a decision! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28677#issuecomment-3631485810 From krk at openjdk.org Tue Dec 9 10:50:44 2025 From: krk at openjdk.org (Kerem Kat) Date: Tue, 9 Dec 2025 10:50:44 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v6] In-Reply-To: References: Message-ID: > Do not try to replace `fallthrough_memproj` when it is null; this fixes the crash.
> > Test case is simplified from the ticket. Verified that the case crashes without the fix. Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: - Merge branch 'master' into fix-c2-segfault-unlocknode - Merge branch 'master' into fix-c2-segfault-unlocknode - address comments - fix rename - rename test file - Merge branch 'master' into fix-c2-segfault-unlocknode - fix test spacing - Update src/hotspot/share/opto/macro.cpp Co-authored-by: Manuel Hässig - Update src/hotspot/share/opto/macro.cpp Co-authored-by: Manuel Hässig - copyright format fix? - ... and 1 more: https://git.openjdk.org/jdk/compare/1574ff53...b5e878c7 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28432/files - new: https://git.openjdk.org/jdk/pull/28432/files/21018290..b5e878c7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28432&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28432&range=04-05 Stats: 25873 lines in 499 files changed: 17932 ins; 6014 del; 1927 mod Patch: https://git.openjdk.org/jdk/pull/28432.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28432/head:pull/28432 PR: https://git.openjdk.org/jdk/pull/28432 From duke at openjdk.org Tue Dec 9 11:36:00 2025 From: duke at openjdk.org (ExE Boss) Date: Tue, 9 Dec 2025 11:36:00 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v8] In-Reply-To: References: Message-ID: On Mon, 8 Dec 2025 18:26:19 GMT, Chen Liang wrote: >> src/java.base/share/classes/java/lang/invoke/IndirectVarHandle.java line 114: >> >>> 112: // but checking the signature type of MH mostly works >>> 113: return MethodHandle.isReachableFrom(vform.getMethodType(0), cl) >>> 114: && target.isReachableFrom(cl); >> >> Right... one of the filters may also keep a class loader alive.
But to check them, we'd have to eagerly instantiate all of them as well. >> >> FWIW, I don't think this is an issue we can just ignore. If a filter keeps a class loader alive, we'd still have a problem. >> >> Maybe it's possible to collect all the types involved from the filter when creating an IndirectVarHandle instead, and save those in a separate list for this check. > > I mean a filter method handle may keep other classes alive in addition to just its types. This is not possible from just checking the types. The vform method type is sufficient, because the types from the filter method type is always in one of the indirect layers. Also, it's possible for intermediate `MethodHandle`s used as part of `MethodHandle` combinators to refer to types from different class loaders. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2602260725 From rsunderbabu at openjdk.org Tue Dec 9 11:38:00 2025 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Tue, 9 Dec 2025 11:38:00 GMT Subject: RFR: 8372941: Rework compiler/intrinsics/sha tests to use intrinsic availability [v2] In-Reply-To: References: Message-ID: On Mon, 8 Dec 2025 11:11:58 GMT, Hao Sun wrote: > e error log could you please share the entire log? I want to see what the predicate returned. It should be either `Running compiler.intrinsics.sha.cli.testcases.GenericTestCaseForUnsupportedCPU` or `Skipping compiler.intrinsics.sha.cli.testcases.GenericTestCaseForUnsupportedCPU` ------------- PR Comment: https://git.openjdk.org/jdk/pull/28634#issuecomment-3631798337 From syan at openjdk.org Tue Dec 9 12:09:59 2025 From: syan at openjdk.org (SendaoYan) Date: Tue, 9 Dec 2025 12:09:59 GMT Subject: RFR: 8367028: compiler/c2/irTests/TestFloat16ScalarOperations.java failing intermittently because of constant folding In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 14:43:57 GMT, Emanuel Peter wrote: > The test uses random constants.
But we forgot to exclude special values such as `zero`, otherwise the operations can be folded (idealized) and the IR tests fail. For example `x + 0.42` would not be folded, but `x + 0` would be folded to `x`. > > Solution: restrict the range we sample from. Used to be `[0, 1)`, now I just do `[0.1, 0.9)`. Marked as reviewed by syan (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28678#pullrequestreview-3557072306 From rcastanedalo at openjdk.org Tue Dec 9 12:39:11 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 9 Dec 2025 12:39:11 GMT Subject: RFR: 8367028: compiler/c2/irTests/TestFloat16ScalarOperations.java failing intermittently because of constant folding In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 14:43:57 GMT, Emanuel Peter wrote: > The test uses random constants. But we forgot to exclude special values such as `zero`, otherwise the operations can be folded (idealized) and the IR tests fail. For example `x + 0.42` would not be folded, but `x + 0` would be folded to `x`. > > Solution: restrict the range we sample from. Used to be `[0, 1)`, now I just do `[0.1, 0.9)`. Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28678#pullrequestreview-3557187661 From epeter at openjdk.org Tue Dec 9 12:48:12 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Dec 2025 12:48:12 GMT Subject: RFR: 8367028: compiler/c2/irTests/TestFloat16ScalarOperations.java failing intermittently because of constant folding In-Reply-To: References: Message-ID: On Tue, 9 Dec 2025 07:49:45 GMT, Christian Hagedorn wrote: >> The test uses random constants. But we forgot to exclude special values such as `zero`, otherwise the operations can be folded (idealized) and the IR tests fail. For example `x + 0.42` would not be folded, but `x + 0` would be folded to `x`. >> >> Solution: restrict the range we sample from. 
Used to be `[0, 1)`, now I just do `[0.1, 0.9)`. > > Looks good, thanks! Thanks @chhagedorn @robcasloz @sendaoYan thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28678#issuecomment-3632078974 From epeter at openjdk.org Tue Dec 9 12:48:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Dec 2025 12:48:14 GMT Subject: Integrated: 8367028: compiler/c2/irTests/TestFloat16ScalarOperations.java failing intermittently because of constant folding In-Reply-To: References: Message-ID: <3mvNCkbwJfCwkMj01RxnFUyBqCzuF4mhpclfJA0szUc=.bbed8e43-b3ca-4b82-90df-f28f5fb1f62f@github.com> On Fri, 5 Dec 2025 14:43:57 GMT, Emanuel Peter wrote: > The test uses random constants. But we forgot to exclude special values such as `zero`, otherwise the operations can be folded (idealized) and the IR tests fail. For example `x + 0.42` would not be folded, but `x + 0` would be folded to `x`. > > Solution: restrict the range we sample from. Used to be `[0, 1)`, now I just do `[0.1, 0.9)`. This pull request has now been integrated. Changeset: a4eb57c5 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/a4eb57c5ec6254e59e486042015dd00457284ef2 Stats: 17 lines in 1 file changed: 9 ins; 0 del; 8 mod 8367028: compiler/c2/irTests/TestFloat16ScalarOperations.java failing intermittently because of constant folding Reviewed-by: chagedorn, syan, rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/28678 From haosun at openjdk.org Tue Dec 9 12:58:06 2025 From: haosun at openjdk.org (Hao Sun) Date: Tue, 9 Dec 2025 12:58:06 GMT Subject: RFR: 8372941: Rework compiler/intrinsics/sha tests to use intrinsic availability [v2] In-Reply-To: References: Message-ID: On Tue, 9 Dec 2025 11:35:31 GMT, Ramkumar Sunderbabu wrote: > > e error log > > could you please share the entire log? I want to see what the predicate returned. 
It should be either `Running compiler.intrinsics.sha.cli.testcases.GenericTestCaseForUnsupportedCPU` or `Skipping compiler.intrinsics.sha.cli.testcases.GenericTestCaseForUnsupportedCPU` Check here https://bugs.openjdk.org/secure/attachment/117808/TestUseSHA3IntrinsicsOptionOnUnsupportedCPU-fail-8982a05.log This log is generated via `make test TEST=test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnUnsupportedCPU.java >/tmp/aaa.log` Code: commit 8982a05 (the latest commit in this pull request) Hardware: AArch64 with sha3 feature OS: ubuntu 24.04 Note that this case would pass on x86 or AArch64 without sha3 feature. Feel free to let me know if more information is needed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28634#issuecomment-3632130816 From fjiang at openjdk.org Tue Dec 9 13:51:56 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 9 Dec 2025 13:51:56 GMT Subject: RFR: 8371968: RISC-V: implement AES CBC intrinsics [v5] In-Reply-To: References: Message-ID: On Tue, 9 Dec 2025 02:59:15 GMT, Anjian Wen wrote: >> Support AES CBC intrinsic on RISCV, Already passed the tests in >> test/hotspot/jtreg/compiler/codegen/aes/ >> test/jdk/com/sun/crypto > > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > modify code format and register name Marked as reviewed by fjiang (Committer). Looks good, thanks! ------------- PR Review: https://git.openjdk.org/jdk/pull/28320#pullrequestreview-3557568702 Marked as reviewed by fjiang (Committer).
PR Review: https://git.openjdk.org/jdk/pull/28320#pullrequestreview-3557570140 From rcastanedalo at openjdk.org Tue Dec 9 14:08:39 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 9 Dec 2025 14:08:39 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis In-Reply-To: References: Message-ID: <_oUy5ZPqiqz05sYchjgUEtf_L4I077g3XKK0o8DoF8Q=.565b5e4c-eeea-475b-8d53-69d564b92a15@github.com> On Fri, 5 Dec 2025 11:49:29 GMT, Roland Westrelin wrote: > The crash occurs because verification code expects the inner and outer > loop of a loop strip mining nest to have the same number of phis but, > in this case, the inner loop has one more memory phis than the outer > loop. > > 1) After `OuterStripMinedLoopNode::adjust_strip_mined_loop`, inner and > outer loops have the same number of phis, as expected. > > > 309 MergeMem === _ 1 306 1 1 284 [[ 429 ]] { - - N284:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=205 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 ]] > 429 Phi === 248 309 205 [[ 93 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 249 CountedLoop === 249 248 197 [[ 249 119 96 93 94 ]] inner stride: 1 strip mined !orig=[223],[91] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 93 Phi === 249 429 205 [[ 117 97 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 94 Phi === 249 430 121 [[ 97 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, 
name=l, idx=4; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > > 2) Then `PhiNode::Ideal` runs for 429 and pushed the `MergeMem` 309 > through the outer loop phi: > > > 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 444 446 ]] > 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 444 Phi === 248 306 121 [[ 445 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 446 Phi === 248 284 170 [[ 445 ]] #memory Memory: @instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow], name=detailMessage, idx=5; !orig=444,429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 445 MergeMem === _ 1 444 1 1 446 [[ 93 ]] { - - N446:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=[429],93 !jvms: TestMismatchedMemoryPhis::mainTe... I just started some internal testing, will come back with results in a day or two, and hopefully also start reviewing this soon. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28677#issuecomment-3632443249 From chagedorn at openjdk.org Tue Dec 9 14:10:55 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 9 Dec 2025 14:10:55 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v10] In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 14:05:06 GMT, Roland Westrelin wrote: >> This is a variant of 8332827. In 8332827, an array access becomes >> dependent on a range check `CastII` for another array access. When, >> after loop opts are over, that RC `CastII` was removed, the array >> access could float and an out of bound access happened. With the fix >> for 8332827, RC `CastII`s are no longer removed. 
>> >> With this one what happens is that some transformations applied after >> loop opts are over widen the type of the RC `CastII`. As a result, the >> type of the RC `CastII` is no longer narrower than that of its input, >> the `CastII` is removed and the dependency is lost. >> >> There are 2 transformations that cause this to happen: >> >> - after loop opts are over, the type of the `CastII` nodes are widen >> so nodes that have the same inputs but a slightly different type can >> common. >> >> - When pushing a `CastII` through an `Add`, if of the type both inputs >> of the `Add`s are non constant, then we end up widening the type >> (the resulting `Add` has a type that's wider than that of the >> initial `CastII`). >> >> There are already 3 types of `Cast` nodes depending on the >> optimizations that are allowed. Either the `Cast` is floating >> (`depends_only_test()` returns `true`) or pinned. Either the `Cast` >> can be removed if it no longer narrows the type of its input or >> not. We already have variants of the `CastII`: >> >> - if the Cast can float and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can't be removed when it doesn't narrow >> the type of its input. >> >> What we need here, I think, is the 4th combination: >> >> - if the Cast can float and can't be removed when it doesn't narrow >> the type of its input. >> >> Anyway, things are becoming confusing with all these different >> variants named in ways that don't always help figure out what >> constraints one of them operate under. So I refactored this and that's >> the biggest part of this change. The fix consists in marking `Cast` >> nodes when their type is widen in a way that prevents them from being >> optimized out. >> >> Tobias ran performance testing with a slightly different version of >> this change and there was no regression. 
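>> The two properties described above (floating vs. pinned, removable vs. not removable once the type stops narrowing) can be pictured with a small model. This is only a sketch under assumptions: `CastDependency`, `IntRange`, `narrows`, `may_remove` and `after_widening` are hypothetical names chosen for illustration and are not the identifiers used in HotSpot or in this patch.

```cpp
#include <cassert>

// Hypothetical model of the two independent Cast properties; not HotSpot code.
struct CastDependency {
    bool floating;           // true: depends only on the test; false: pinned
    bool removable_if_wide;  // true: may be dropped once it no longer narrows its input
};

// Closed integer range standing in for an int type.
struct IntRange { int lo; int hi; };

// A cast narrows its input when its range cuts off part of the input range.
inline bool narrows(IntRange cast_type, IntRange input_type) {
    return cast_type.lo > input_type.lo || cast_type.hi < input_type.hi;
}

// Removal is allowed only for removable casts that have stopped narrowing.
inline bool may_remove(CastDependency dep, IntRange cast_type, IntRange input_type) {
    return dep.removable_if_wide && !narrows(cast_type, input_type);
}

// Sketch of the idea behind the fix: widening a cast's type marks it as
// non-removable, while leaving the floating/pinned property untouched.
inline CastDependency after_widening(CastDependency dep) {
    dep.removable_if_wide = false;
    return dep;
}
```

>> In this model, a floating, removable cast whose type has been widened to match its input would be dropped and the dependency lost; passing it through `after_widening` first yields the fourth combination (floating but not removable).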
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review I had a look and it seems that the internal test is relying on a `CastII` node to be removed after loop opts, when we widen `CastII` nodes, to trigger an ideal optimization. That is no longer the case with this patch because we keep the `CastII` node in the graph. The fix would be to improve the ideal optimization to look through cast nodes. However, this feels out of scope, especially since this PR is a bug fix for JDK 26. I therefore propose to fix the internal test before integrating this PR and then follow up with an RFE to fix the ideal optimization. I can take care of this and let you know once this is done. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24575#issuecomment-3632449257 From epeter at openjdk.org Tue Dec 9 14:16:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Dec 2025 14:16:08 GMT Subject: RFR: 8371920: [TEST] Enable CMove tests on other platforms [v2] In-Reply-To: References: Message-ID: On Tue, 9 Dec 2025 09:25:35 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> >> [JDK-8357551](https://bugs.openjdk.org/browse/JDK-8357551) added support for CMoveF/D vectorization; at the same time it also added some tests for scalar CMove on riscv. >> It's good to enable these tests on other platforms, like x86/aarch64 or maybe others. >> >> At the same time, this PR also moves these tests under `compiler/c2/cmove`, as suggested here https://github.com/openjdk/jdk/pull/28309#discussion_r2598664764. >> >> Thanks! >> >> ## Test >> In progress... (I'm using github CI to run the tests.) > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > riscv + aarch64 @Hamlin-Li Thanks for publishing this so quickly, and for considering other platforms, much appreciated. The patch looks good to me. I'll run some internal testing now.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28702#issuecomment-3632474530 From liach at openjdk.org Tue Dec 9 14:42:22 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 9 Dec 2025 14:42:22 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v8] In-Reply-To: References: Message-ID: On Tue, 9 Dec 2025 11:33:31 GMT, ExE Boss wrote: >> I mean a filter method handle may keep other classes alive in addition to just its types. This is not possible from just checking the types. The vform method type is sufficient, because the types from the filter method type are always in one of the indirect layers. > > Also, it's possible for intermediate `MethodHandle`s used as part of `MethodHandle` combinators to refer to types from different class loaders. It is, but after all, there can be similar resource leak risks from calling LambdaMetafactory or other APIs too. We expect AccessDescriptor construction to be mostly limited to specific sites.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2602958025 From epeter at openjdk.org Tue Dec 9 14:43:50 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Dec 2025 14:43:50 GMT Subject: [jdk26] RFR: 8367028: compiler/c2/irTests/TestFloat16ScalarOperations.java failing intermittently because of constant folding Message-ID: Clean backport of https://github.com/openjdk/jdk/pull/28678 ------------- Commit messages: - Backport a4eb57c5ec6254e59e486042015dd00457284ef2 Changes: https://git.openjdk.org/jdk/pull/28720/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28720&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8367028 Stats: 17 lines in 1 file changed: 9 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/28720.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28720/head:pull/28720 PR: https://git.openjdk.org/jdk/pull/28720 From roland at openjdk.org Tue Dec 9 15:03:07 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 9 Dec 2025 15:03:07 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v10] In-Reply-To: References: Message-ID: On Tue, 9 Dec 2025 14:07:22 GMT, Christian Hagedorn wrote: > I therefore propose to fix the internal test before integrating this PR and then follow up with an RFE to fix the ideal optimization. I can take care of this and let you know once this is done. That sounds good to me. Should I take care of the ideal transformation? Let me know when the internal test is fixed so I can proceed with the integration.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24575#issuecomment-3632713131 From rsunderbabu at openjdk.org Tue Dec 9 15:03:56 2025 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Tue, 9 Dec 2025 15:03:56 GMT Subject: RFR: 8372941: Rework compiler/intrinsics/sha tests to use intrinsic availability [v2] In-Reply-To: References: Message-ID: On Tue, 9 Dec 2025 12:55:41 GMT, Hao Sun wrote: > are: AArch64 with sha3 fea From the log: `Running compiler.intrinsics.sha.cli.testcases.GenericTestCaseForUnsupportedCPU` For a host where the SHA3 intrinsic is supported, the test case must be skipped. IntrinsicPredicates.isSHA3IntrinsicAvailable() should have returned true, but it is returning false. I will check if there are any logs that can help us. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28634#issuecomment-3632722978 From fyang at openjdk.org Tue Dec 9 15:19:13 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 9 Dec 2025 15:19:13 GMT Subject: RFR: 8371968: RISC-V: implement AES CBC intrinsics [v5] In-Reply-To: References: Message-ID: On Tue, 9 Dec 2025 02:59:15 GMT, Anjian Wen wrote: >> Support AES CBC intrinsic on RISCV, Already passed the tests in >> test/hotspot/jtreg/compiler/codegen/aes/ >> test/jdk/com/sun/crypto > > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > modify code format and register name Latest version looks good. Thanks. ------------- Marked as reviewed by fyang (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/28320#pullrequestreview-3558004375 From thartmann at openjdk.org Tue Dec 9 15:19:14 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 9 Dec 2025 15:19:14 GMT Subject: [jdk26] RFR: 8367028: compiler/c2/irTests/TestFloat16ScalarOperations.java failing intermittently because of constant folding In-Reply-To: References: Message-ID: On Tue, 9 Dec 2025 14:32:26 GMT, Emanuel Peter wrote: > Clean backport of https://github.com/openjdk/jdk/pull/28678 Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28720#pullrequestreview-3558004277 From chagedorn at openjdk.org Tue Dec 9 15:27:11 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 9 Dec 2025 15:27:11 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v10] In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 14:05:06 GMT, Roland Westrelin wrote: >> This is a variant of 8332827. In 8332827, an array access becomes >> dependent on a range check `CastII` for another array access. When, >> after loop opts are over, that RC `CastII` was removed, the array >> access could float and an out of bound access happened. With the fix >> for 8332827, RC `CastII`s are no longer removed. >> >> With this one what happens is that some transformations applied after >> loop opts are over widen the type of the RC `CastII`. As a result, the >> type of the RC `CastII` is no longer narrower than that of its input, >> the `CastII` is removed and the dependency is lost. >> >> There are 2 transformations that cause this to happen: >> >> - after loop opts are over, the type of the `CastII` nodes are widen >> so nodes that have the same inputs but a slightly different type can >> common. 
>> >> - When pushing a `CastII` through an `Add`, if of the type both inputs >> of the `Add`s are non constant, then we end up widening the type >> (the resulting `Add` has a type that's wider than that of the >> initial `CastII`). >> >> There are already 3 types of `Cast` nodes depending on the >> optimizations that are allowed. Either the `Cast` is floating >> (`depends_only_test()` returns `true`) or pinned. Either the `Cast` >> can be removed if it no longer narrows the type of its input or >> not. We already have variants of the `CastII`: >> >> - if the Cast can float and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can't be removed when it doesn't narrow >> the type of its input. >> >> What we need here, I think, is the 4th combination: >> >> - if the Cast can float and can't be removed when it doesn't narrow >> the type of its input. >> >> Anyway, things are becoming confusing with all these different >> variants named in ways that don't always help figure out what >> constraints one of them operate under. So I refactored this and that's >> the biggest part of this change. The fix consists in marking `Cast` >> nodes when their type is widen in a way that prevents them from being >> optimized out. >> >> Tobias ran performance testing with a slightly different version of >> this change and there was no regression. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Thanks Roland! I'll let you know and file a follow-up RFE and assign it to you. I will dump all the relevant information in there with a test case. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24575#issuecomment-3632843345 From rcastanedalo at openjdk.org Tue Dec 9 17:03:54 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 9 Dec 2025 17:03:54 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 11:49:29 GMT, Roland Westrelin wrote: > The crash occurs because verification code expects the inner and outer > loop of a loop strip mining nest to have the same number of phis but, > in this case, the inner loop has one more memory phis than the outer > loop. > > 1) After `OuterStripMinedLoopNode::adjust_strip_mined_loop`, inner and > outer loops have the same number of phis, as expected. > > > 309 MergeMem === _ 1 306 1 1 284 [[ 429 ]] { - - N284:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=205 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 ]] > 429 Phi === 248 309 205 [[ 93 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 249 CountedLoop === 249 248 197 [[ 249 119 96 93 94 ]] inner stride: 1 strip mined !orig=[223],[91] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 93 Phi === 249 429 205 [[ 117 97 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 94 Phi === 249 430 121 [[ 97 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > > 2) Then 
`PhiNode::Ideal` runs for 429 and pushed the `MergeMem` 309 > through the outer loop phi: > > > 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 444 446 ]] > 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 444 Phi === 248 306 121 [[ 445 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 446 Phi === 248 284 170 [[ 445 ]] #memory Memory: @instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow], name=detailMessage, idx=5; !orig=444,429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 445 MergeMem === _ 1 444 1 1 446 [[ 93 ]] { - - N446:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=[429],93 !jvms: TestMismatchedMemoryPhis::mainTe... Changes requested by rcastanedalo (Reviewer). 
test/hotspot/jtreg/compiler/loopstripmining/TestMismatchedMemoryPhis.java line 28: > 26: * @bug 8370200 > 27: * @summary Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis > 28: * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -XX:StressSeed=36200582 -XX:CompileCommand=quiet Suggestion: * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:StressSeed=36200582 -XX:CompileCommand=quiet test/hotspot/jtreg/compiler/loopstripmining/TestMismatchedMemoryPhis.java line 31: > 29: * -XX:CompileCommand=compileonly,*TestMismatchedMemoryPhis*::mainTest -XX:-TieredCompilation > 30: * -Xcomp -XX:+StressIGVN -XX:+StressLoopPeeling -XX:PerMethodTrapLimit=0 TestMismatchedMemoryPhis > 31: * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -XX:CompileCommand=quiet Suggestion: * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=quiet ------------- PR Review: https://git.openjdk.org/jdk/pull/28677#pullrequestreview-3558546288 PR Review Comment: https://git.openjdk.org/jdk/pull/28677#discussion_r2603519906 PR Review Comment: https://git.openjdk.org/jdk/pull/28677#discussion_r2603520959 From chagedorn at openjdk.org Tue Dec 9 17:59:20 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 9 Dec 2025 17:59:20 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v10] In-Reply-To: References: Message-ID: <7ZPdHr7IzEoj0yh45zEt-8ogQ8-2q435PPXieqqZKJU=.4366191f-729e-4e38-84f3-628f0d83cb33@github.com> On Fri, 5 Dec 2025 14:05:06 GMT, Roland Westrelin wrote: >> This is a variant of 8332827. In 8332827, an array access becomes >> dependent on a range check `CastII` for another array access. When, >> after loop opts are over, that RC `CastII` was removed, the array >> access could float and an out of bound access happened. 
With the fix >> for 8332827, RC `CastII`s are no longer removed. >> >> With this one what happens is that some transformations applied after >> loop opts are over widen the type of the RC `CastII`. As a result, the >> type of the RC `CastII` is no longer narrower than that of its input, >> the `CastII` is removed and the dependency is lost. >> >> There are 2 transformations that cause this to happen: >> >> - after loop opts are over, the type of the `CastII` nodes are widen >> so nodes that have the same inputs but a slightly different type can >> common. >> >> - When pushing a `CastII` through an `Add`, if of the type both inputs >> of the `Add`s are non constant, then we end up widening the type >> (the resulting `Add` has a type that's wider than that of the >> initial `CastII`). >> >> There are already 3 types of `Cast` nodes depending on the >> optimizations that are allowed. Either the `Cast` is floating >> (`depends_only_test()` returns `true`) or pinned. Either the `Cast` >> can be removed if it no longer narrows the type of its input or >> not. We already have variants of the `CastII`: >> >> - if the Cast can float and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can't be removed when it doesn't narrow >> the type of its input. >> >> What we need here, I think, is the 4th combination: >> >> - if the Cast can float and can't be removed when it doesn't narrow >> the type of its input. >> >> Anyway, things are becoming confusing with all these different >> variants named in ways that don't always help figure out what >> constraints one of them operate under. So I refactored this and that's >> the biggest part of this change. The fix consists in marking `Cast` >> nodes when their type is widen in a way that prevents them from being >> optimized out. 
>> >> Tobias ran performance testing with a slightly different version of >> this change and there was no regression. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review The internal test is fixed and sanity testing passed - you can move forward with integrating this PR :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24575#issuecomment-3633535520 From aseoane at openjdk.org Tue Dec 9 21:18:15 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Tue, 9 Dec 2025 21:18:15 GMT Subject: RFR: 8364490: Fatal error on large SpecTrapLimitExtraEntries value [v3] In-Reply-To: <2DNZFggnLVM-qT_BaYB2JsKRfsE4JlbJOaUWnbbnR9Q=.33648143-4989-412e-a3a6-74cdd5622932@github.com> References: <2DNZFggnLVM-qT_BaYB2JsKRfsE4JlbJOaUWnbbnR9Q=.33648143-4989-412e-a3a6-74cdd5622932@github.com> Message-ID: On Tue, 25 Nov 2025 17:37:27 GMT, Anton Seoane Ampudia wrote: >> This PR addresses VM crashes on very large values for `SpecTrapLimitExtraEntries`. >> >> The experimental `SpecTrapLimitExtraEntries` allows for a user-specified number of extra method data trap entries for speculation. Currently, this number is implemented with an `int`, which means that users can specify very large values that will translate into huge `MethodData` objects that cannot be allocated in Metaspace. >> >> An `int` range of values should not be allowed, as negative `SpecTrapLimitExtraEntries` do not make any sense, and very high values (such as the ones that cause this crash) are equally nonsensical. This changeset adds a range to the flag values to address these issues. >> >> `SpecTrapLimitExtraEntries` is `MAX`ed with HotSpot's computed heuristic, which means that in any case it can only serve as a buffer above the heuristic. Based on benchmarks where I logged heuristic-derived values for extra `DataLayout` cells, even a value of 50 for `SpecTrapLimitExtraEntries` is more than sufficient. 
To provide some headroom and keep things simple, I have set the upper limit to 100. >> >> **Testing:** passes tiers 1-2 > > Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: > > Review comments Pinging @rwestrel, as he introduced the flag originally. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28451#issuecomment-3633698974 From jrose at openjdk.org Wed Dec 10 00:55:53 2025 From: jrose at openjdk.org (John R Rose) Date: Wed, 10 Dec 2025 00:55:53 GMT Subject: RFR: 8360557: CTW: Inline cold methods to reach more code [v6] In-Reply-To: <1o0uydFw-zaax7mVXkqJL15Cto0okbjQtkoqs6ADUyU=.001fda73-be80-41bd-9d2f-9258889117e3@github.com> References: <1o0uydFw-zaax7mVXkqJL15Cto0okbjQtkoqs6ADUyU=.001fda73-be80-41bd-9d2f-9258889117e3@github.com> Message-ID: On Tue, 2 Dec 2025 16:04:12 GMT, Aleksey Shipilev wrote: >> We use CTW testing for making sure compilers behave well. But we compile the code that is not executed at all, and since our inlining heuristics often look back at profiles, we end up not actually inlining all too much! This means CTW testing likely misses lots of bugs that normal code is exposed to, especially e.g. in loop optimizations. >> >> There is an intrinsic tradeoff with accepting more inlined methods in CTW: the compilation time gets significantly worse. With just accepting the cold methods we have reasonable CTW times, eating the improvements we have committed in mainline recently. And it still finds bugs. See the RFE for sample data. >> >> After this lands and CTW starts to compile cold methods, one can greatly expand the scope of the CTW testing by overriding the static inlining limits. Doing e.g. `TEST_VM_OPTS="-XX:MaxInlineSize=70 -XX:C1MaxInlineSize=70"` finds even more bugs. Unfortunately, the compilation times suffer so much that they are impractical to run in standard configurations, see data in RFE. We will enable some of that testing in special testing pipelines.
>> >> Pre-empting the question: "Well, why not use -Xcomp then, and make sure it inlines well?" The answer is in RFE as well: Xcomp causes _a lot_ of stray compilations for JDK and CTW infra itself. For small JARs in large corpus this eats precious testing time that we would instead like to spend on deeper inlining in the actual JAR code. This also does not force us to look into how CTW works in Xcomp at all; I expect some surprises there. Feather-touching the inlining heuristic paths to just accept methods without looking at profiles looks better. >> >> Tobias had an idea to implement the stress randomized inlining that would expand the scope of inlining. This improvement stacks well with it. This improvement provides the base case of inlining most reasonable methods, and then allows stress infra to inline some more on top of that. >> >> Additional testing: >> - [x] GHA >> - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` >> - [x] Linux x86_64 server fastdebug, large CTW corpus (now failing in interesting ways) > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Merge branch 'master' into JDK-8360557-ctw-inlining > - Enable more testing > - Merge branch 'master' into JDK-8360557-ctw-inlining > - Merge branch 'master' into JDK-8360557-ctw-inlining > - Merge branch 'master' into JDK-8360557-ctw-inlining > - Merge branch 'master' into JDK-8360557-ctw-inlining > - Update src/hotspot/share/compiler/compiler_globals.hpp > > Co-authored-by: Tobias Hartmann > - Revert separate patch > - Final > - Proper option name and bump the limits > - ... and 1 more: https://git.openjdk.org/jdk/compare/250a9c45...2d02b713 This is a good bug-hunting tool. It's also a promise of stronger tools, as we escalate and/or randomize the inlining heuristics.
------------- PR Comment: https://git.openjdk.org/jdk/pull/26068#issuecomment-3634898358 From xgong at openjdk.org Wed Dec 10 02:12:44 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 10 Dec 2025 02:12:44 GMT Subject: Integrated: 8371603: C2: Missing Ideal optimizations for load and store vectors on SVE In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 01:41:19 GMT, Xiaohong Gong wrote: > **Problem:** > > This issue occurs on a 256-bit SVE machine, caused by the following problematic pattern in `LoadVectorNode::Ideal()`: > > > Node* LoadVectorNode::Ideal(PhaseGVN* phase, bool can_reshape) { > const TypeVect* vt = vect_type(); > if (Matcher::vector_needs_partial_operations(this, vt)) { > return VectorNode::try_to_gen_masked_vector(phase, this, vt); > } > return LoadNode::Ideal(phase, can_reshape); > } > > > The condition `Matcher::vector_needs_partial_operations(this, vt)` returns true for `LoadVectorNode` with 256-bit vector size even when the vector size equals the maximum vector size on SVE. In such cases, when `VectorNode::try_to_gen_masked_vector()` returns `nullptr`, the method exits early without calling `LoadNode::Ideal()`. This results in missing crucial optimizations that would normally be applied by the superclass. > > This code was introduced by https://bugs.openjdk.org/browse/JDK-8286941 to generate vector masks for partial vector operations, but it failed to ensure that the superclass `Ideal()` method is always invoked when no transformation is applied. > > **Solution:** > > This patch addresses the issue through two changes: > > 1. Refine `Matcher::vector_needs_partial_operations()` to return true only when the vector node genuinely represents a partial vector operation that requires masking. > 2. Modify `VectorNode::try_to_gen_masked_vector()` to never return `nullptr`, ensuring the superclass `Ideal()` method is always invoked when no transformation is applied. 
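> The control-flow hazard described in the Problem section can be reproduced in isolation. The following is a minimal sketch, not the HotSpot code and not the actual patch: `Node`, `VectorLoadSketch` and its members are hypothetical stand-ins, with `needs_partial_ops` playing the role of `Matcher::vector_needs_partial_operations()` and `try_masked()` the role of `VectorNode::try_to_gen_masked_vector()`. It shows one way to guarantee the fall-through to the superclass; the patch itself achieves this by refining the predicate and by making the helper never return `nullptr`.

```cpp
#include <cassert>

// Hypothetical stand-in for an IR node; not a HotSpot type.
struct Node { int id; };

struct VectorLoadSketch {
    bool needs_partial_ops;  // what the matcher predicate reports
    bool can_build_masked;   // whether a masked replacement can actually be produced
    Node masked;             // result of the masked transformation
    Node base;               // result of the superclass Ideal()

    VectorLoadSketch(bool partial, bool maskable)
        : needs_partial_ops(partial), can_build_masked(maskable),
          masked{1}, base{2} {}

    Node* try_masked() { return can_build_masked ? &masked : nullptr; }
    Node* base_ideal() { return &base; }

    // Problematic shape: returns early even when no masked node was produced,
    // so the superclass optimizations are never attempted.
    Node* ideal_early_return() {
        if (needs_partial_ops) {
            return try_masked();
        }
        return base_ideal();
    }

    // Safer shape: only return early when a transformation really happened,
    // otherwise fall through to the superclass.
    Node* ideal_fall_through() {
        if (needs_partial_ops) {
            if (Node* n = try_masked()) {
                return n;
            }
        }
        return base_ideal();
    }
};
```

> With `needs_partial_ops` true and no masked node available (the full-width 256-bit case in this model), `ideal_early_return()` yields `nullptr` while `ideal_fall_through()` still reaches the base optimization.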
> > **Testing:** > > - Verified on different SVE platforms with different vector sizes (128|256|512 bits). > - Verified on X86 platforms with different avx options (-XX:UseAVX=1|2|3). > - Added two new IR tests to verify 1) previously missing optimizations for `LoadVector/StoreVector` are now applied, and 2) that mask and the correct IR patterns are generated for partial vector operations. This pull request has now been integrated. Changeset: b6732d60 Author: Xiaohong Gong URL: https://git.openjdk.org/jdk/commit/b6732d6048259de68a3dd5b4f66ac82f87270404 Stats: 638 lines in 8 files changed: 582 ins; 19 del; 37 mod 8371603: C2: Missing Ideal optimizations for load and store vectors on SVE Co-authored-by: Emanuel Peter Reviewed-by: epeter, erfang, haosun ------------- PR: https://git.openjdk.org/jdk/pull/28651 From wenanjian at openjdk.org Wed Dec 10 02:19:28 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Wed, 10 Dec 2025 02:19:28 GMT Subject: RFR: 8371968: RISC-V: implement AES CBC intrinsics [v5] In-Reply-To: References: Message-ID: On Tue, 9 Dec 2025 15:15:59 GMT, Fei Yang wrote: >> Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: >> >> modify code format and register name > > Latest version looks good. Thanks. @RealFYang @feilongjiang Thanks for the review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28320#issuecomment-3635063703 From duke at openjdk.org Wed Dec 10 02:19:29 2025 From: duke at openjdk.org (duke) Date: Wed, 10 Dec 2025 02:19:29 GMT Subject: RFR: 8371968: RISC-V: implement AES CBC intrinsics [v5] In-Reply-To: References: Message-ID: On Tue, 9 Dec 2025 02:59:15 GMT, Anjian Wen wrote: >> Support AES CBC intrinsic on RISC-V. Already passed the tests in >> test/hotspot/jtreg/compiler/codegen/aes/ >> test/jdk/com/sun/crypto > > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > modify code format and register name @Anjian-Wen Your change (at version 6828d5e5b4be79414ef3e2fb84ffef1ff994f01a) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28320#issuecomment-3635064781 From wenanjian at openjdk.org Wed Dec 10 02:24:04 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Wed, 10 Dec 2025 02:24:04 GMT Subject: RFR: 8373069: RISC-V: implement GHASH intrinsic Message-ID: Support the GHASH intrinsic for crypto GCM, which needs the zvkg extension.
Passed the tests in test/hotspot/jtreg/compiler/codegen/aes/ test/jdk/com/sun/crypto ------------- Commit messages: - Add some flag - RISC-V: implement GHASH intrinsic Changes: https://git.openjdk.org/jdk/pull/28548/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28548&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373069 Stats: 82 lines in 6 files changed: 81 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28548.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28548/head:pull/28548 PR: https://git.openjdk.org/jdk/pull/28548 From wenanjian at openjdk.org Wed Dec 10 02:38:02 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Wed, 10 Dec 2025 02:38:02 GMT Subject: Integrated: 8371968: RISC-V: implement AES CBC intrinsics In-Reply-To: References: Message-ID: <4pmWS5SED5BEbB41uzzXAI0cQbRlNq9smvLUct8Dfzs=.6e80e6a3-820b-44c9-8cfd-02e83e9d2679@github.com> On Fri, 14 Nov 2025 11:30:41 GMT, Anjian Wen wrote: > Support AES CBC intrinsic on RISC-V. Already passed the tests in > test/hotspot/jtreg/compiler/codegen/aes/ > test/jdk/com/sun/crypto This pull request has now been integrated. Changeset: a5968f93 Author: Anjian Wen Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/a5968f936462741a7edea5bbbe73cb067af3d34f Stats: 182 lines in 1 file changed: 181 ins; 1 del; 0 mod 8371968: RISC-V: implement AES CBC intrinsics Reviewed-by: fyang, fjiang ------------- PR: https://git.openjdk.org/jdk/pull/28320 From fyang at openjdk.org Wed Dec 10 02:59:23 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 10 Dec 2025 02:59:23 GMT Subject: RFR: 8371920: [TEST] Enable CMove tests on other platforms [v2] In-Reply-To: References: Message-ID: <974SIxPSx8nU0vFHAVxYyyw9UiTFtYWspmvNfOD1cOQ=.e3ce2b34-0e42-42c8-8a3c-49463203f21c@github.com> On Tue, 9 Dec 2025 09:25:35 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch?
>> >> [JDK-8357551](https://bugs.openjdk.org/browse/JDK-8357551) added support for CMoveF/D vectorization and, at the same time, added some tests for scalar CMove on RISC-V. >> It would be good to enable these tests on other platforms, such as x86/aarch64 or maybe others. >> >> This PR also moves these tests under `compiler/c2/cmove`, as suggested here https://github.com/openjdk/jdk/pull/28309#discussion_r2598664764. >> >> Thanks! >> >> ## Test >> In progress... (I'm using GitHub CI to run the tests.) > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > riscv + aarch64 test/hotspot/jtreg/compiler/c2/cmove/TestFPComparison2.java line 34: > 32: * @summary The test is to trigger code path of BoolTest::ge/gt in C2_MacroAssembler::enc_cmove_cmp_fp > 33: * @requires vm.debug > 34: * @requires os.arch == "riscv64" | os.arch=="aarch64" One minor question: Could this work for `x64` as well? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28702#discussion_r2605010685 From xgong at openjdk.org Wed Dec 10 03:56:52 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 10 Dec 2025 03:56:52 GMT Subject: [jdk26] RFR: 8373383: C2: Missing Ideal optimizations for load and store vectors on SVE Message-ID: Hi all, This pull request contains a backport of commit [b6732d60](https://github.com/openjdk/jdk/commit/b6732d6048259de68a3dd5b4f66ac82f87270404) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Xiaohong Gong on 10 Dec 2025 and was reviewed by Emanuel Peter, Eric Fang and Hao Sun. Thanks!
------------- Commit messages: - Backport b6732d6048259de68a3dd5b4f66ac82f87270404 Changes: https://git.openjdk.org/jdk/pull/28732/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28732&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373383 Stats: 638 lines in 8 files changed: 582 ins; 19 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/28732.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28732/head:pull/28732 PR: https://git.openjdk.org/jdk/pull/28732 From chagedorn at openjdk.org Wed Dec 10 06:33:24 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 10 Dec 2025 06:33:24 GMT Subject: RFR: 8364490: Fatal error on large SpecTrapLimitExtraEntries value [v3] In-Reply-To: <2DNZFggnLVM-qT_BaYB2JsKRfsE4JlbJOaUWnbbnR9Q=.33648143-4989-412e-a3a6-74cdd5622932@github.com> References: <2DNZFggnLVM-qT_BaYB2JsKRfsE4JlbJOaUWnbbnR9Q=.33648143-4989-412e-a3a6-74cdd5622932@github.com> Message-ID: <1tfmt8dIK01Jrwn7rQCYsxm2drWg9qmIHuZFOPrVqzc=.877eb34e-0cc3-4a83-a4ae-92cd9c6e9f97@github.com> On Tue, 25 Nov 2025 17:37:27 GMT, Anton Seoane Ampudia wrote: >> This PR addresses VM crashes on very large values for `SpecTrapLimitExtraEntries`. >> >> The experimental `SpecTrapLimitExtraEntries` allows for a user-specified number of extra method data trap entries for speculation. Currently, this number is implemented with an `int`, which means that users can specify very large values that will translate into huge `MethodData` objects that cannot be allocated in Metaspace. >> >> An `int` range of values should not be allowed, as negative `SpecTrapLimitExtraEntries` do not make any sense, and very high values (such as the ones that cause this crash) are equally nonsensical. This changeset adds a range to the flag values to address these issues. >> >> `SpecTrapLimitExtraEntries` is `MAX`ed with HotSpot's computed heuristic, which means that in any case it can only serve as a buffer above the heuristic. 
Based on benchmarks where I logged heuristic-derived values for extra `DataLayout` cells, even a value of 50 for `SpecTrapLimitExtraEntries` is more than sufficient. To provide some headroom and keep things simple, I have set the upper limit to 100. >> >> **Testing:** passes tiers 1-2 > > Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: > > Review comments Looks good, thanks for the update! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28451#pullrequestreview-3560956504 From duke at openjdk.org Wed Dec 10 07:10:34 2025 From: duke at openjdk.org (duke) Date: Wed, 10 Dec 2025 07:10:34 GMT Subject: Withdrawn: 8364407: [REDO] Consolidate Identity of self-inverse operations In-Reply-To: References: Message-ID: <-ejI8TKGWdaXXzY9dTeJ0gLx2aY8pEFLS_Us636ZDHQ=.571ec609-e46e-4842-a472-902fd6983b49@github.com> On Mon, 18 Aug 2025 15:49:48 GMT, Hannes Greule wrote: > The previous approach was flawed for `short` and `char` as these are int-subtypes and truncate the result (see the backout issue https://bugs.openjdk.org/browse/JDK-8364409 for a reproducer). > > This change now first ensures that the input type is small enough so no truncation gets lost when dropping the operations. > > The previous implementation also used an `InvolutionNode` superclass with one `Identity(...)` implementation, but there were some reservations whether this is the right way to go. As we now have a `ReverseBytesNode`, there is also less benefit in having the supertype, as this covers 4 in 1 already. > > I also added test cases on top of the original ones that ensure the nodes stay when we can't prove the input type is small enough. This pull request has been closed without being integrated.
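As a reading aid for the truncation problem described in the withdrawn PR above, here is a small standalone C++ sketch. It is not HotSpot code: `reverse_bytes_short`, `fits_in_short` and `fold_double_reverse` are hypothetical names, and the model simply assumes a 16-bit byte swap applied to a value carried in a 32-bit int, with the result sign-extended. It shows that applying the operation twice only gives back the input when the input already fits in 16 bits, which is why the fold needs a check on the input type.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a 16-bit byte swap on a value carried in a 32-bit int:
// only the low 16 bits take part, and the result is sign-extended.
int32_t reverse_bytes_short(int32_t x) {
  uint16_t lo = static_cast<uint16_t>(x);
  uint16_t swapped = static_cast<uint16_t>((lo << 8) | (lo >> 8));
  return static_cast<int16_t>(swapped);
}

// True if x is representable as a signed 16-bit value.
bool fits_in_short(int32_t x) {
  return x >= INT16_MIN && x <= INT16_MAX;
}

// Hypothetical fold: replace op(op(x)) by x only when no truncation
// would be hidden by dropping the two operations.
int32_t fold_double_reverse(int32_t x) {
  if (fits_in_short(x)) {
    return x;  // safe: the double swap is a true identity here
  }
  return reverse_bytes_short(reverse_bytes_short(x));  // keep the operations
}
```

For an input such as `0x12345678`, the double swap yields `0x5678` rather than the original value, so dropping both operations unconditionally would change the result.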
------------- PR: https://git.openjdk.org/jdk/pull/26823 From mhaessig at openjdk.org Wed Dec 10 07:33:24 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 10 Dec 2025 07:33:24 GMT Subject: [jdk26] RFR: 8367028: compiler/c2/irTests/TestFloat16ScalarOperations.java failing intermittently because of constant folding In-Reply-To: References: Message-ID: On Tue, 9 Dec 2025 14:32:26 GMT, Emanuel Peter wrote: > Clean backport of https://github.com/openjdk/jdk/pull/28678 Good. ------------- Marked as reviewed by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/28720#pullrequestreview-3561128864 From epeter at openjdk.org Wed Dec 10 07:38:33 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 Dec 2025 07:38:33 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v9] In-Reply-To: <4YumBpbA2k8DC13H1s808_5OJx-1FMxD9CbIUfRTb8Q=.742f90c9-0d93-43b7-abe7-76422a0c8359@github.com> References: <4YumBpbA2k8DC13H1s808_5OJx-1FMxD9CbIUfRTb8Q=.742f90c9-0d93-43b7-abe7-76422a0c8359@github.com> Message-ID: On Mon, 8 Dec 2025 19:10:48 GMT, Chen Liang wrote: >> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 17 additional commits since the last revision: > > - Review > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache > - Bugs and verify loader leak > - Try to avoid loader leak > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache > - Revert void special case removal due to C2 shortage causing TestZGCBarrierElision::testAtomicThenAtomicAnotherField failure > - Test from Jorn > - Copyright years > - Fix problem identified by Jorn > - Rollback getAndAdd for now > - ... and 7 more: https://git.openjdk.org/jdk/compare/edba0887...d734e8a6 Was asked on slack to review IR tests, so that's all I'm doing here - won't review the whole patch ;) test/hotspot/jtreg/compiler/c2/irTests/constantFold/VarHandleMismatchedTypeFold.java line 2: > 1: /* > 2: * Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. We want to eventually migrate all tests from `c2/irTests` to more "topic based" directories, so it would be great if you already migrated them now, since you probably would know a better name/place now already ;) test/hotspot/jtreg/compiler/c2/irTests/constantFold/VarHandleMismatchedTypeFold.java line 41: > 39: * @summary Verify constant folding is possible for mismatched VarHandle access > 40: * @library /test/lib / > 41: * @requires vm.compiler2.enabled What would happen if you removed this? Is the test expected to fail anywhere? IR rules are executed only if C2 is available anyway, so probably you don't need this restriction. ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28585#pullrequestreview-3561137086 PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2605521507 PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2605526968 From epeter at openjdk.org Wed Dec 10 07:42:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 Dec 2025 07:42:29 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v5] In-Reply-To: References: Message-ID: On Mon, 8 Dec 2025 21:14:24 GMT, Chen Liang wrote: >> Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. >> >> They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. >> >> We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. >> >> Paging @minborg who requested Optional folding for review. >> >> I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 11 additional commits since the last revision: > > - Jorn review > - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-final-trusting > - bracket styles > - Doc tweaks > - Essay > - Spurious change > - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-final-trusting > - Issue number and test update > - Fixed optional and unit test > - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-final-trusting > - ... and 1 more: https://git.openjdk.org/jdk/compare/f4fe3397...b20b7f5b Drive-by comments about IR test only ;) test/hotspot/jtreg/compiler/c2/irTests/constantFold/TestOptional.java line 2: > 1: /* > 2: * Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. We would like to move all tests away from `irTests`. It would be better to sort tests by topic, rather than by the method we test them. test/hotspot/jtreg/compiler/c2/irTests/constantFold/TestOptional.java line 39: > 37: * @summary Verify constant folding for Optional > 38: * @library /test/lib / > 39: * @requires vm.compiler2.enabled I doubt you actually need the C2 restriction here. IR tests could still run verification for results without C2. IR rules only run if C2 is available, otherwise the test can still pass, just no IR rules are run. test/hotspot/jtreg/compiler/c2/irTests/constantFold/TestOptional.java line 40: > 38: * @library /test/lib / > 39: * @requires vm.compiler2.enabled > 40: * @run driver compiler.c2.irTests.constantFold.TestOptional Suggestion: * @run driver ${test.main.class} Since this is now possible, we should use this templated approach now, to avoid invoking the wrong test classes by accident. ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28540#pullrequestreview-3561154527 PR Review Comment: https://git.openjdk.org/jdk/pull/28540#discussion_r2605537189 PR Review Comment: https://git.openjdk.org/jdk/pull/28540#discussion_r2605543019 PR Review Comment: https://git.openjdk.org/jdk/pull/28540#discussion_r2605539633 From chagedorn at openjdk.org Wed Dec 10 07:55:30 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 10 Dec 2025 07:55:30 GMT Subject: RFR: 8372589: VM crashes on init when NonNMethodCodeHeapSize is set too small and UseTransparentHugePages is enabled In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 12:59:03 GMT, David Briemann wrote: > Aligning upwards instead of downwards not only solves the crash in large huge page scenarios but also ensures that the cache sizes are at least as big as they were set. Looks good to me, too. Thanks for fixing this! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28658#pullrequestreview-3561200546 From epeter at openjdk.org Wed Dec 10 08:07:28 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 Dec 2025 08:07:28 GMT Subject: RFR: 8371920: [TEST] Enable CMove tests on other platforms [v2] In-Reply-To: <974SIxPSx8nU0vFHAVxYyyw9UiTFtYWspmvNfOD1cOQ=.e3ce2b34-0e42-42c8-8a3c-49463203f21c@github.com> References: <974SIxPSx8nU0vFHAVxYyyw9UiTFtYWspmvNfOD1cOQ=.e3ce2b34-0e42-42c8-8a3c-49463203f21c@github.com> Message-ID: On Wed, 10 Dec 2025 02:56:42 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> riscv + aarch64 > > test/hotspot/jtreg/compiler/c2/cmove/TestFPComparison2.java line 34: > >> 32: * @summary The test is to trigger code path of BoolTest::ge/gt in C2_MacroAssembler::enc_cmove_cmp_fp >> 33: * @requires vm.debug >> 34: * @requires os.arch == "riscv64" | os.arch=="aarch64" > > One minor question: Could this work for `x64` as well? Nice catch! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28702#discussion_r2605607648 From roland at openjdk.org Wed Dec 10 08:08:28 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 10 Dec 2025 08:08:28 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v10] In-Reply-To: <7ZPdHr7IzEoj0yh45zEt-8ogQ8-2q435PPXieqqZKJU=.4366191f-729e-4e38-84f3-628f0d83cb33@github.com> References: <7ZPdHr7IzEoj0yh45zEt-8ogQ8-2q435PPXieqqZKJU=.4366191f-729e-4e38-84f3-628f0d83cb33@github.com> Message-ID: On Tue, 9 Dec 2025 17:56:32 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > The internal test is fixed and sanity testing passed - you can move forward with integrating this PR :-) @chhagedorn @eme64 @merykitty thanks for the reviews and testing ------------- PR Comment: https://git.openjdk.org/jdk/pull/24575#issuecomment-3635860312 From chagedorn at openjdk.org Wed Dec 10 08:09:27 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 10 Dec 2025 08:09:27 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v9] In-Reply-To: <4YumBpbA2k8DC13H1s808_5OJx-1FMxD9CbIUfRTb8Q=.742f90c9-0d93-43b7-abe7-76422a0c8359@github.com> References: <4YumBpbA2k8DC13H1s808_5OJx-1FMxD9CbIUfRTb8Q=.742f90c9-0d93-43b7-abe7-76422a0c8359@github.com> Message-ID: <7__4QGMz-5idIedYYk-0HrQnnMfTa4d-Gl75gvajV0A=.ecba821d-731d-420f-a3f6-9feeb869b20b@github.com> On Mon, 8 Dec 2025 19:10:48 GMT, Chen Liang wrote: >> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. 
Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision: > > - Review > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache > - Bugs and verify loader leak > - Try to avoid loader leak > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache > - Revert void special case removal due to C2 shortage causing TestZGCBarrierElision::testAtomicThenAtomicAnotherField failure > - Test from Jorn > - Copyright years > - Fix problem identified by Jorn > - Rollback getAndAdd for now > - ... and 7 more: https://git.openjdk.org/jdk/compare/45fb0e07...d734e8a6 test/hotspot/jtreg/compiler/c2/irTests/constantFold/VarHandleMismatchedTypeFold.java line 70: > 68: > 69: @Check(test = "testSum") > 70: public void runTestSum() { Drive-by comment: From the method name, it seems that you want to have a runner method. For that, you need to switch to `@Run`. But you can also do result verification with `@Check`. In the latter case, you do not need to call `testSum()` again but you can just add a `long result` as parameter. The IR framework will then call this `@Check` method with the result of `testSum()`. 
Example: https://github.com/openjdk/jdk/blob/1bbbce75c5e68429c2a32519eb3c36d964dcdf57/test/hotspot/jtreg/testlibrary_tests/ir_framework/examples/CheckedTestExample.java#L94-L102 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2605606400 From roland at openjdk.org Wed Dec 10 08:28:28 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 10 Dec 2025 08:28:28 GMT Subject: RFR: 8364490: Fatal error on large SpecTrapLimitExtraEntries value [v3] In-Reply-To: <2DNZFggnLVM-qT_BaYB2JsKRfsE4JlbJOaUWnbbnR9Q=.33648143-4989-412e-a3a6-74cdd5622932@github.com> References: <2DNZFggnLVM-qT_BaYB2JsKRfsE4JlbJOaUWnbbnR9Q=.33648143-4989-412e-a3a6-74cdd5622932@github.com> Message-ID: On Tue, 25 Nov 2025 17:37:27 GMT, Anton Seoane Ampudia wrote: >> This PR addresses VM crashes on very large values for `SpecTrapLimitExtraEntries`. >> >> The experimental `SpecTrapLimitExtraEntries` allows for a user-specified number of extra method data trap entries for speculation. Currently, this number is implemented with an `int`, which means that users can specify very large values that will translate into huge `MethodData` objects that cannot be allocated in Metaspace. >> >> An `int` range of values should not be allowed, as negative `SpecTrapLimitExtraEntries` do not make any sense, and very high values (such as the ones that cause this crash) are equally nonsensical. This changeset adds a range to the flag values to address these issues. >> >> `SpecTrapLimitExtraEntries` is `MAX`ed with HotSpot's computed heuristic, which means that in any case it can only serve as a buffer above the heuristic. Based on benchmarks where I logged heuristic-derived values for extra `DataLayout` cells, even a value of 50 for `SpecTrapLimitExtraEntries` is more than sufficient. To provide some headroom and keep things simple, I have set the upper limit to 100. 
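The flag handling described in the quoted text above can be sketched in standalone C++ as follows. This is an illustration only, not the HotSpot flag machinery: the function names and the lower bound of 0 are assumptions (the description only states that negative values make no sense and that the upper limit is 100), and `std::max` stands in for the `MAX` with the computed heuristic.

```cpp
#include <algorithm>
#include <cassert>

// Assumed bounds: 100 is the upper limit named in the description;
// 0 as the lower bound is an assumption based on "negative values
// do not make any sense".
constexpr long long kMinExtraEntries = 0;
constexpr long long kMaxExtraEntries = 100;

// Hypothetical range check applied to the user-supplied flag value.
bool extra_entries_in_range(long long value) {
  return value >= kMinExtraEntries && value <= kMaxExtraEntries;
}

// The flag is combined with the heuristic via a maximum, so it can only
// raise the number of extra entries above what the heuristic computed.
int effective_extra_entries(int flag_value, int heuristic_value) {
  return std::max(flag_value, heuristic_value);
}
```

Under these assumptions a value of 50 passes the range check and wins over a smaller heuristic value, while a very large value is rejected up front instead of leading to an oversized `MethodData` allocation.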
>> >> **Testing:** passes tiers 1-2 > > Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: > > Review comments Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28451#pullrequestreview-3561310749 From shade at openjdk.org Wed Dec 10 08:33:30 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 10 Dec 2025 08:33:30 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v9] In-Reply-To: References: Message-ID: <8N5F01Ve6BLQjlsJ7nD53xn70wa_NrRMqeuSb1Grp6M=.f45dd3b4-b282-41a8-a245-b7673bf65261@github.com> > See the bug for discussion what issues current machinery has. > > This PR executes the plan outlined in the bug: > 1. Common the receiver type profiling code in interpreter and C1 > 2. Rewrite receiver type profiling code to only do atomic receiver slot installations > 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed > > This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/` > - [x] Linux x86_64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 22 commits: - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls - More comments - Tighten up the comments - Simplify third case: no need to loop, just restart the search - Actually have a second "fast" case: receiver is not found in the table, and the table is full - Pushing/popping for rare CAS path is counter-productive - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls - Tighten up some more - Offset is always rscratch1, no need to save it - ... and 12 more: https://git.openjdk.org/jdk/compare/1bbbce75...c28810e3 ------------- Changes: https://git.openjdk.org/jdk/pull/25305/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25305&range=08 Stats: 418 lines in 8 files changed: 202 ins; 197 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/25305.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25305/head:pull/25305 PR: https://git.openjdk.org/jdk/pull/25305 From shade at openjdk.org Wed Dec 10 08:33:32 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 10 Dec 2025 08:33:32 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v8] In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 19:14:43 GMT, Vladimir Ivanov wrote: > I'm curious how performance-sensitive that part of code is. Does it make sense to try to further optimize it? This is about 5-th-ish version of this code, so I don't think there is more juice to squeeze out of it. The current version is more or less optimal. The stratification into three cases looks the best performing overall. > fast path can be further optimized for no nulls case by offloading more work on found_null slow path [1] Yeah, but putting checks for both installed receiver and nullptr slot turns out hurting performance; this is bad even without extra control flow. Two separate loops are more efficient, even for small number of iterations. 
It also helpfully optimizes for the best case, when profile is smaller than `TypeProfileWidth`, which is what we want. > 2 slots is the most common case; any benefits from optimizing specifically for it (e.g., unroll the loops)? I don't think it is worth the extra complexity, honestly. The loop-y code in current version is still a significant code density win over the decision-tree (unrolled, effectively) approach we are doing currently. Keeping this thing simple means more reliability and less testing surface, plus much less headache to port to other architectures. Note that the goal for this work is to _improve profiling reliability_ without hopefully ceding too much ground in code density and performance. When I started out, it was not clear if it is doable, given the need for atomics; but it looks doable indeed. So I think we should call this thing done and move on to solving the actual performance problem in this code: the contention on counter updates. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3635936502 From chagedorn at openjdk.org Wed Dec 10 08:43:36 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 10 Dec 2025 08:43:36 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v10] In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 14:05:06 GMT, Roland Westrelin wrote: >> This is a variant of 8332827. In 8332827, an array access becomes >> dependent on a range check `CastII` for another array access. When, >> after loop opts are over, that RC `CastII` was removed, the array >> access could float and an out of bound access happened. With the fix >> for 8332827, RC `CastII`s are no longer removed. >> >> With this one what happens is that some transformations applied after >> loop opts are over widen the type of the RC `CastII`. 
As a result, the >> type of the RC `CastII` is no longer narrower than that of its input, >> the `CastII` is removed and the dependency is lost. >> >> There are 2 transformations that cause this to happen: >> >> - after loop opts are over, the types of the `CastII` nodes are widened >> so nodes that have the same inputs but a slightly different type can >> be commoned. >> >> - When pushing a `CastII` through an `Add`, if the types of both inputs >> of the `Add` are non-constant, then we end up widening the type >> (the resulting `Add` has a type that's wider than that of the >> initial `CastII`). >> >> There are already 3 types of `Cast` nodes depending on the >> optimizations that are allowed. Either the `Cast` is floating >> (`depends_only_test()` returns `true`) or pinned. Either the `Cast` >> can be removed if it no longer narrows the type of its input or >> not. We already have variants of the `CastII`: >> >> - if the Cast can float and can be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can't be removed when it doesn't narrow >> the type of its input. >> >> What we need here, I think, is the 4th combination: >> >> - if the Cast can float and can't be removed when it doesn't narrow >> the type of its input. >> >> Anyway, things are becoming confusing with all these different >> variants named in ways that don't always help figure out what >> constraints each of them operates under. So I refactored this and that's >> the biggest part of this change. The fix consists in marking `Cast` >> nodes when their type is widened in a way that prevents them from being >> optimized out. >> >> Tobias ran performance testing with a slightly different version of >> this change and there was no regression.
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24575#pullrequestreview-3561363531 From aseoane at openjdk.org Wed Dec 10 08:44:31 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Wed, 10 Dec 2025 08:44:31 GMT Subject: RFR: 8364490: Fatal error on large SpecTrapLimitExtraEntries value [v3] In-Reply-To: <2DNZFggnLVM-qT_BaYB2JsKRfsE4JlbJOaUWnbbnR9Q=.33648143-4989-412e-a3a6-74cdd5622932@github.com> References: <2DNZFggnLVM-qT_BaYB2JsKRfsE4JlbJOaUWnbbnR9Q=.33648143-4989-412e-a3a6-74cdd5622932@github.com> Message-ID: On Tue, 25 Nov 2025 17:37:27 GMT, Anton Seoane Ampudia wrote: >> This PR addresses VM crashes on very large values for `SpecTrapLimitExtraEntries`. >> >> The experimental `SpecTrapLimitExtraEntries` allows for a user-specified number of extra method data trap entries for speculation. Currently, this number is implemented with an `int`, which means that users can specify very large values that will translate into huge `MethodData` objects that cannot be allocated in Metaspace. >> >> An `int` range of values should not be allowed, as negative `SpecTrapLimitExtraEntries` do not make any sense, and very high values (such as the ones that cause this crash) are equally nonsensical. This changeset adds a range to the flag values to address these issues. >> >> `SpecTrapLimitExtraEntries` is `MAX`ed with HotSpot's computed heuristic, which means that in any case it can only serve as a buffer above the heuristic. Based on benchmarks where I logged heuristic-derived values for extra `DataLayout` cells, even a value of 50 for `SpecTrapLimitExtraEntries` is more than sufficient. To provide some headroom and keep things simple, I have set the upper limit to 100. 
>> >> **Testing:** passes tiers 1-2 > > Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: > > Review comments Thanks all for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28451#issuecomment-3635977705 From duke at openjdk.org Wed Dec 10 08:44:33 2025 From: duke at openjdk.org (duke) Date: Wed, 10 Dec 2025 08:44:33 GMT Subject: RFR: 8364490: Fatal error on large SpecTrapLimitExtraEntries value [v3] In-Reply-To: <2DNZFggnLVM-qT_BaYB2JsKRfsE4JlbJOaUWnbbnR9Q=.33648143-4989-412e-a3a6-74cdd5622932@github.com> References: <2DNZFggnLVM-qT_BaYB2JsKRfsE4JlbJOaUWnbbnR9Q=.33648143-4989-412e-a3a6-74cdd5622932@github.com> Message-ID: On Tue, 25 Nov 2025 17:37:27 GMT, Anton Seoane Ampudia wrote: >> This PR addresses VM crashes on very large values for `SpecTrapLimitExtraEntries`. >> >> The experimental `SpecTrapLimitExtraEntries` allows for a user-specified number of extra method data trap entries for speculation. Currently, this number is implemented with an `int`, which means that users can specify very large values that will translate into huge `MethodData` objects that cannot be allocated in Metaspace. >> >> An `int` range of values should not be allowed, as negative `SpecTrapLimitExtraEntries` do not make any sense, and very high values (such as the ones that cause this crash) are equally nonsensical. This changeset adds a range to the flag values to address these issues. >> >> `SpecTrapLimitExtraEntries` is `MAX`ed with HotSpot's computed heuristic, which means that in any case it can only serve as a buffer above the heuristic. Based on benchmarks where I logged heuristic-derived values for extra `DataLayout` cells, even a value of 50 for `SpecTrapLimitExtraEntries` is more than sufficient. To provide some headroom and keep things simple, I have set the upper limit to 100. 
>> >> **Testing:** passes tiers 1-2 > > Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision: > > Review comments @anton-seoane Your change (at version 49097207b3a70ed9f5a7cf46b694a68be9695b24) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28451#issuecomment-3635983743 From roland at openjdk.org Wed Dec 10 08:48:41 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 10 Dec 2025 08:48:41 GMT Subject: Integrated: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs In-Reply-To: References: Message-ID: <37GHpeFd6FKbfVuMmdhz9-YPcEQcC_fYBRjlLzrkRHg=.f865988e-1ca9-4731-921c-b73029c484cd@github.com> On Thu, 10 Apr 2025 15:15:54 GMT, Roland Westrelin wrote: > This is a variant of 8332827. In 8332827, an array access becomes > dependent on a range check `CastII` for another array access. When, > after loop opts are over, that RC `CastII` was removed, the array > access could float and an out-of-bounds access happened. With the fix > for 8332827, RC `CastII`s are no longer removed. > > With this one what happens is that some transformations applied after > loop opts are over widen the type of the RC `CastII`. As a result, the > type of the RC `CastII` is no longer narrower than that of its input, > the `CastII` is removed and the dependency is lost. > > There are 2 transformations that cause this to happen: > > - after loop opts are over, the types of the `CastII` nodes are widened > so nodes that have the same inputs but a slightly different type can > be commoned. > > - When pushing a `CastII` through an `Add`, if the types of both inputs > of the `Add` are non-constant, then we end up widening the type > (the resulting `Add` has a type that's wider than that of the > initial `CastII`). > > There are already 3 types of `Cast` nodes depending on the > optimizations that are allowed.
Either the `Cast` is floating > (`depends_only_test()` returns `true`) or pinned. Either the `Cast` > can be removed if it no longer narrows the type of its input or > not. We already have variants of the `CastII`: > > - if the Cast can float and can be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can't be removed when it doesn't narrow > the type of its input. > > What we need here, I think, is the 4th combination: > > - if the Cast can float and can't be removed when it doesn't narrow > the type of its input. > > Anyway, things are becoming confusing with all these different > variants named in ways that don't always help figure out what > constraints each of them operates under. So I refactored this and that's > the biggest part of this change. The fix consists of marking `Cast` > nodes when their type is widened in a way that prevents them from being > optimized out. > > Tobias ran performance testing with a slightly different version of > this change and there was no regression. This pull request has now been integrated. Changeset: 00068a80 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/00068a80304a809297d0df8698850861e9a1c5e9 Stats: 367 lines in 13 files changed: 266 ins; 27 del; 74 mod 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs Reviewed-by: chagedorn, qamai, galder, epeter ------------- PR: https://git.openjdk.org/jdk/pull/24575 From bmaillard at openjdk.org Wed Dec 10 08:55:23 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Wed, 10 Dec 2025 08:55:23 GMT Subject: RFR: 8372302: C2: IGVN verification fails because ModXNode::Ideal creates unused intermediate nodes Message-ID: This PR addresses a failure in IGVN verification with `ModI` and `ModL` nodes.
In `ModXNode::Ideal`, we have code to optimize a modulo expression by expressing it in terms of other operations. There are actually two distinct cases: one where the divisor is a constant of the form `2^k-1` for some integer `k`, and a more general case used when the other transformations do not succeed. Because these transformations involve creating several new nodes (sometimes in a loop) and calling `phase->transform(...)` on them, we want to avoid accidentally triggering optimizations on the "unfinished" state of the subgraph. For this, we create a temporary dummy node and add edges to the nodes being constructed. There are some execution paths where the node is not destroyed before `Ideal` returns, and this creates issues during IGVN verification, because the verification code checks whether the number of nodes has changed after calling `Ideal` on a given node and does not expect any change. The path in question is when we exit because the divisor is a constant and is the minimum value: https://github.com/openjdk/jdk/blob/c19b12927d2ac901ec8ccaa2de5897ee4c47af56/src/hotspot/share/opto/divnode.cpp#L1146-L1147 The zero case does not cause problems (this seems to be because it would hide behind a `div0_check` anyway). The fix is simply to create the temporary node only when it is needed, which avoids returning without destroying it. ### Testing - [x] [GitHub Actions](TODO) - [x] tier1-4, plus some internal testing Thank you for reviewing!
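For readers unfamiliar with the `2^k-1` case mentioned above, here is a minimal sketch of the arithmetic idea, written in plain Java rather than as C2 node construction. It is an illustration only, not the code in `divnode.cpp`: the class and method names are hypothetical, and it assumes a non-negative dividend. It relies on `2^k ≡ 1 (mod 2^k-1)`, so the `k`-bit "digits" of `x` can be summed repeatedly with shifts and adds instead of dividing.

```java
// Hypothetical illustration, not HotSpot code.
final class ModPow2Minus1Sketch {
    // Computes x % (2^k - 1) for x >= 0 and 1 <= k <= 30 using only shift, mask and add.
    static int modPow2Minus1(int x, int k) {
        if (x < 0 || k < 1 || k > 30) {
            throw new IllegalArgumentException("sketch only handles x >= 0 and 1 <= k <= 30");
        }
        int mask = (1 << k) - 1; // the divisor 2^k - 1
        // Since 2^k is congruent to 1 modulo mask, x = q * 2^k + r is congruent to q + r.
        // Each iteration strictly decreases x, so the loop terminates with x <= mask.
        while (x > mask) {
            x = (x >> k) + (x & mask);
        }
        // x is now in [0, mask]; the value mask itself is congruent to 0.
        return x == mask ? 0 : x;
    }
}
```

For example, `ModPow2Minus1Sketch.modPow2Minus1(100, 3)` computes the same value as `100 % 7`. In a compiler, each loop iteration would correspond to a handful of freshly created shift/and/add nodes, which is the kind of partially built subgraph the temporary node is meant to protect.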
------------- Commit messages: - Update bug number format - Fix style - Add reduced test and fix Changes: https://git.openjdk.org/jdk/pull/28488/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28488&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372302 Stats: 70 lines in 2 files changed: 63 ins; 4 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28488.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28488/head:pull/28488 PR: https://git.openjdk.org/jdk/pull/28488 From aseoane at openjdk.org Wed Dec 10 08:56:49 2025 From: aseoane at openjdk.org (Anton Seoane Ampudia) Date: Wed, 10 Dec 2025 08:56:49 GMT Subject: Integrated: 8364490: Fatal error on large SpecTrapLimitExtraEntries value In-Reply-To: References: Message-ID: On Fri, 21 Nov 2025 12:39:01 GMT, Anton Seoane Ampudia wrote: > This PR addresses VM crashes on very large values for `SpecTrapLimitExtraEntries`. > > The experimental `SpecTrapLimitExtraEntries` allows for a user-specified number of extra method data trap entries for speculation. Currently, this number is implemented with an `int`, which means that users can specify very large values that will translate into huge `MethodData` objects that cannot be allocated in Metaspace. > > An `int` range of values should not be allowed, as negative `SpecTrapLimitExtraEntries` do not make any sense, and very high values (such as the ones that cause this crash) are equally nonsensical. This changeset adds a range to the flag values to address these issues. > > `SpecTrapLimitExtraEntries` is `MAX`ed with HotSpot's computed heuristic, which means that in any case it can only serve as a buffer above the heuristic. Based on benchmarks where I logged heuristic-derived values for extra `DataLayout` cells, even a value of 50 for `SpecTrapLimitExtraEntries` is more than sufficient. To provide some headroom and keep things simple, I have set the upper limit to 100. > > **Testing:** passes tiers 1-2 This pull request has now been integrated. 
Changeset: b60ac710 Author: Anton Seoane Ampudia Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/b60ac710bebf195972436da324983e61b51484ef Stats: 41 lines in 2 files changed: 41 ins; 0 del; 0 mod 8364490: Fatal error on large SpecTrapLimitExtraEntries value Reviewed-by: chagedorn, roland ------------- PR: https://git.openjdk.org/jdk/pull/28451 From mli at openjdk.org Wed Dec 10 09:20:27 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 10 Dec 2025 09:20:27 GMT Subject: RFR: 8371920: [TEST] Enable CMove tests on other platforms [v2] In-Reply-To: References: <974SIxPSx8nU0vFHAVxYyyw9UiTFtYWspmvNfOD1cOQ=.e3ce2b34-0e42-42c8-8a3c-49463203f21c@github.com> Message-ID: On Wed, 10 Dec 2025 08:04:55 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/c2/cmove/TestFPComparison2.java line 34: >> >>> 32: * @summary The test is to trigger code path of BoolTest::ge/gt in C2_MacroAssembler::enc_cmove_cmp_fp >>> 33: * @requires vm.debug >>> 34: * @requires os.arch == "riscv64" | os.arch=="aarch64" >> >> One minor question: Could this work for `x64` as well? > > Nice catch! No, it does not. But I did not investigate further in this pr. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28702#discussion_r2605828181 From bmaillard at openjdk.org Wed Dec 10 09:45:25 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Wed, 10 Dec 2025 09:45:25 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v3] In-Reply-To: <_HueHgU8Ha0yoG9cckWMGfms8D0WC6zGWKykIQkCeZM=.3f929996-99ee-4535-8973-b23ccf6b291e@github.com> References: <_HueHgU8Ha0yoG9cckWMGfms8D0WC6zGWKykIQkCeZM=.3f929996-99ee-4535-8973-b23ccf6b291e@github.com> Message-ID: <2y_kifh8vJEM5lyJpXJQ-rXVAz7ThwswYO0q4-7At00=.9607b284-e8be-4b6b-bf22-3671ac936ad5@github.com> On Tue, 2 Dec 2025 11:16:50 GMT, Roland Westrelin wrote: >> test/hotspot/jtreg/compiler/c2/TestVerifyLoopOptimizationsHighMemUsage.java line 28: >> >>> 26: * @bug 8370519 >>> 27: * @summary C2: Hit MemLimit when running with +VerifyLoopOptimizations >>> 28: * @run main/othervm -XX:CompileCommand=compileonly,*TestVerifyLoopOptimizationsHighMemUsage*::* -XX:-TieredCompilation -Xbatch >> >> Out of curiosity, have you tried reducing the test with `creduce`? I fixed a similar issue in [JDK-8366990](https://bugs.openjdk.org/browse/JDK-8366990), and initially reviewers were concerned about the long compilation time. I was able to get decent results with `creduce` by using `-XX:CompileCommand=memlimit`. Not sure if it's worth doing here though. > > I don't have `creduce` set up. I tried minimizing the test case by hand but it was fairly time-consuming. It currently runs in 30s on a fairly fast machine. I am giving it a try on my setup, I'll let you know if something comes up.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2605912924 From bmaillard at openjdk.org Wed Dec 10 09:45:28 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Wed, 10 Dec 2025 09:45:28 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v3] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 11:21:05 GMT, Roland Westrelin wrote: >> For this failure memory stats are: >> >> >> Total Usage: 1095525816 >> --- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 --- >> Phase Total ra node comp type states reglive regsplit regmask superword cienv ha other >> none 5976032 331560 5402064 197512 33712 10200 0 0 984 0 0 0 0 >> parse 2716464 65456 1145480 196408 1112752 0 0 0 0 0 196368 0 0 >> optimizer 98184 0 32728 0 65456 0 0 0 0 0 0 0 0 >> connectionGraph 32728 0 0 32728 0 0 0 0 0 0 0 0 0 >> iterGVN 32728 0 32728 0 0 0 0 0 0 0 0 0 0 >> idealLoop 918189632 0 38687056 872824784 392776 0 0 0 0 0 6285016 0 0 >> idealLoopVerify 2228144 0 0 2228144 0 0 0 0 0 0 0 0 0 >> macroExpand 32728 0 32728 0 0 0 0 0 0 0 0 0 0 >> graphReshape 32728 0 32728 0 0 0 0 0 0 0 0 0 0 >> matcher 20135944 3369848 9033208 7536400 65456 131032 0 0 0 0 0 0 0 >> postselect_cleanup 294872 294872 0 0 0 0 0 0 0 0 0 0 0 >> scheduler 752944 196488 556456 0 0 0 0 0 0 0 0 0 0 >> regalloc 388736 388736 0 0 0 0 0 0 0 0 0 0 0 >> ... 
> > Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: > > - review > - review test/hotspot/jtreg/compiler/c2/TestVerifyLoopOptimizationsHighMemUsage.java line 30: > 28: * @summary C2: Hit MemLimit when running with +VerifyLoopOptimizations > 29: * @run main/othervm -XX:CompileCommand=compileonly,*TestVerifyLoopOptimizationsHighMemUsage*::* -XX:-TieredCompilation -Xbatch > 30: * -XX:+UnlockDiagnosticVMOptions -XX:+StressLoopPeeling -XX:+VerifyLoopOptimizations I think you'll also need `-XX:+IgnoreUnrecognizedVMOptions` (it is causing issues in Github actions). Suggestion: * -XX:+UnlockDiagnosticVMOptions -XX:+IgnoreUnrecognizedVMOptions * -XX:+StressLoopPeeling -XX:+VerifyLoopOptimizations ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2605877498 From dfenacci at openjdk.org Wed Dec 10 10:03:44 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 10 Dec 2025 10:03:44 GMT Subject: RFR: 8370489: Some compiler tests miss the @key randomness [v2] In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 11:24:19 GMT, Saranya Natarajan wrote: >> **Issue:** Some compiler tests uses randomization but does not have `@key randomness` in the jtreg header. >> >> **Fix:** The list of test cases that did not have `@key randomness` were listed using `grep -l "getRandomInstance" -r test/hotspot/jtreg/compiler/ | xargs grep -L "randomness"`. This PR adds `@key randomness` to these tests. >> >> **Note:** The following tests that are still listed with `grep -l "getRandomInstance" -r test/hotspot/jtreg/compiler/ | xargs grep -L "randomness"` after this PR are confirmed to be helper or support file for actual test. 
>> _test/hotspot/jtreg/compiler/codegen/aes/TestAESBase.java >> test/hotspot/jtreg/compiler/compilercontrol/jcmd/StressAddJcmdBase.java >> test/hotspot/jtreg/compiler/compilercontrol/parser/HugeDirectiveUtil.java >> test/hotspot/jtreg/compiler/compilercontrol/share/scenario/CommandGenerator.java >> test/hotspot/jtreg/compiler/lib/ir_framework/test/TestVM.java >> test/hotspot/jtreg/compiler/lib/ir_framework/test/ArgumentValue.java >> test/hotspot/jtreg/compiler/lib/ir_framework/AbstractInfo.java >> test/hotspot/jtreg/compiler/lib/ir_framework/CompLevel.java >> test/hotspot/jtreg/compiler/lib/generators/Generators.java >> test/hotspot/jtreg/compiler/lib/template_framework/library/PrimitiveType.java >> test/hotspot/jtreg/compiler/lib/template_framework/library/Expression.java >> test/hotspot/jtreg/compiler/lib/template_framework/NameSet.java >> test/hotspot/jtreg/compiler/intrinsics/mathexact/Verify.java >> test/hotspot/jtreg/compiler/intrinsics/bmi/BMITestRunner.java >> test/hotspot/jtreg/compiler/intrinsics/unsafe/ByteBufferTest.java >> test/hotspot/jtreg/compiler/arraycopy/stress/StressBooleanArrayCopy.java >> test/hotspot/jtreg/compiler/arraycopy/stress/StressIntArrayCopy.java >> test/hotspot/jtreg/compiler/arraycopy/stress/StressLongArrayCopy.java >> test/hotspot/jtreg/compiler/arraycopy/stress/StressCharArrayCopy.java >> test/hotspot/jtreg/compiler/arraycopy/stress/StressObjectArrayCopy.java >> test/hotspot/jtreg/compiler/arraycopy/stress/StressByteArrayCopy.java >> test/hotspot/jtreg/compiler/arraycopy/stress/StressFloatArrayCopy.java >> test/hotspot/jtreg/compiler/arraycopy/stress/StressShortArrayCopy.java >> test/hotspot/jtreg/compiler/arraycopy/stress/StressDoubleArrayCopy.java >> test/hotspot/jtreg/compiler/codecache/cli/codeheapsize/JVMStartupRunner.java >> test/hotspot/jtreg/compiler/vectorapi/reshape/utils/VectorReshapeHelper.java >> test/hotspot/jtreg/compiler/jvmci/compilerToVM/DummyClass.java_ > > Saranya Natarajan has updated the pull request 
incrementally with one additional commit since the last revision: > > addressing review comments - removing space and javadoc style Still looks good. Thanks @sarannat! ------------- Marked as reviewed by dfenacci (Committer). PR Review: https://git.openjdk.org/jdk/pull/28463#pullrequestreview-3561688239 From jbhateja at openjdk.org Wed Dec 10 10:07:28 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 10 Dec 2025 10:07:28 GMT Subject: RFR: 8368977: Provide clear naming for AVX10 identifiers [v2] In-Reply-To: References: <6XYgqaHA0PPZzvnfysKOP5XGP7e_RMkVFt9PV2OT3Gk=.e5f33072-a91a-4e57-99f3-81cc4ae4d844@github.com> Message-ID: On Mon, 8 Dec 2025 21:47:16 GMT, Mohamed Issa wrote: >> This is a simple change that renames all AVX10 identifiers to explicitly show which sub-versions are in use. Right now, AVX10.2 is the only case to worry about. The JTREG tests listed below were used to verify correctness with the recommended JVM options mentioned in corresponding source files. Each test included runs through emulation with AVX10.2 enabled and disabled to exercise all possible paths. All modifications and tests used [OpenJDK v26-b24](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B24) as the baseline build. >> >> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java` >> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java` >> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java` >> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java` >> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java` >> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java` >> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java` >> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java` >> 9. `jtreg:test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java` >> 10. `jtreg:test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java` >> 11. 
`jtreg:test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java` >> 12. `jtreg:test/jdk/jdk/incubator/vector/Double64VectorTests.java` >> 13. `jtreg:test/jdk/jdk/incubator/vector/Double128VectorTests.java` >> 14. `jtreg:test/jdk/jdk/incubator/vector/Double256VectorTests.java` >> 15. `jtreg:test/jdk/jdk/incubator/vector/Double512VectorTests.java` >> 16. `jtreg:test/jdk/jdk/incubator/vector/DoubleMaxVectorTests.java` >> 17. `jtreg:test/jdk/jdk/incubator/vector/Float64VectorTests.java` >> 18. `jtreg:test/jdk/jdk/incubator/vector/Float128VectorTests.java` >> 19. `jtreg:test/jdk/jdk/incubator/vector/Float256VectorTests.java` >> 20. `jtreg:test/jdk/jdk/incubator/vector/Float512VectorTests.java` >> 21. `jtreg:test/jdk/jdk/incubator/vector/FloatMaxVectorTests.java` >> 22. `jtreg:test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` >> 23. `jtreg:test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java` > > Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: > > Remove changes that affect functionality LGTM, naming now follows the AVX10 versioning scheme, where an instruction supported by a version is guaranteed to be supported by all the versions above it. ------------- Marked as reviewed by jbhateja (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28344#pullrequestreview-3561706798 From epeter at openjdk.org Wed Dec 10 10:14:36 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 Dec 2025 10:14:36 GMT Subject: RFR: 8370489: Some compiler tests miss the @key randomness [v2] In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 11:24:19 GMT, Saranya Natarajan wrote: >> **Issue:** Some compiler tests use randomization but do not have `@key randomness` in the jtreg header.
>> >> **Fix:** The list of test cases that did not have `@key randomness` were listed using `grep -l "getRandomInstance" -r test/hotspot/jtreg/compiler/ | xargs grep -L "randomness"`. This PR adds `@key randomness` to these tests. >> >> **Note:** The following tests that are still listed with `grep -l "getRandomInstance" -r test/hotspot/jtreg/compiler/ | xargs grep -L "randomness"` after this PR are confirmed to be helper or support file for actual test. >> _test/hotspot/jtreg/compiler/codegen/aes/TestAESBase.java >> test/hotspot/jtreg/compiler/compilercontrol/jcmd/StressAddJcmdBase.java >> test/hotspot/jtreg/compiler/compilercontrol/parser/HugeDirectiveUtil.java >> test/hotspot/jtreg/compiler/compilercontrol/share/scenario/CommandGenerator.java >> test/hotspot/jtreg/compiler/lib/ir_framework/test/TestVM.java >> test/hotspot/jtreg/compiler/lib/ir_framework/test/ArgumentValue.java >> test/hotspot/jtreg/compiler/lib/ir_framework/AbstractInfo.java >> test/hotspot/jtreg/compiler/lib/ir_framework/CompLevel.java >> test/hotspot/jtreg/compiler/lib/generators/Generators.java >> test/hotspot/jtreg/compiler/lib/template_framework/library/PrimitiveType.java >> test/hotspot/jtreg/compiler/lib/template_framework/library/Expression.java >> test/hotspot/jtreg/compiler/lib/template_framework/NameSet.java >> test/hotspot/jtreg/compiler/intrinsics/mathexact/Verify.java >> test/hotspot/jtreg/compiler/intrinsics/bmi/BMITestRunner.java >> test/hotspot/jtreg/compiler/intrinsics/unsafe/ByteBufferTest.java >> test/hotspot/jtreg/compiler/arraycopy/stress/StressBooleanArrayCopy.java >> test/hotspot/jtreg/compiler/arraycopy/stress/StressIntArrayCopy.java >> test/hotspot/jtreg/compiler/arraycopy/stress/StressLongArrayCopy.java >> test/hotspot/jtreg/compiler/arraycopy/stress/StressCharArrayCopy.java >> test/hotspot/jtreg/compiler/arraycopy/stress/StressObjectArrayCopy.java >> test/hotspot/jtreg/compiler/arraycopy/stress/StressByteArrayCopy.java >> 
test/hotspot/jtreg/compiler/arraycopy/stress/StressFloatArrayCopy.java >> test/hotspot/jtreg/compiler/arraycopy/stress/StressShortArrayCopy.java >> test/hotspot/jtreg/compiler/arraycopy/stress/StressDoubleArrayCopy.java >> test/hotspot/jtreg/compiler/codecache/cli/codeheapsize/JVMStartupRunner.java >> test/hotspot/jtreg/compiler/vectorapi/reshape/utils/VectorReshapeHelper.java >> test/hotspot/jtreg/compiler/jvmci/compilerToVM/DummyClass.java_ > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > addressing review comments - removing space and javadoc style Still good, thanks for the updates! ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28463#pullrequestreview-3561735478 From dbriemann at openjdk.org Wed Dec 10 10:21:47 2025 From: dbriemann at openjdk.org (David Briemann) Date: Wed, 10 Dec 2025 10:21:47 GMT Subject: RFR: 8372589: VM crashes on init when NonNMethodCodeHeapSize is set too small and UseTransparentHugePages is enabled In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 12:59:03 GMT, David Briemann wrote: > Aligning upwards instead of downwards not only solves the crash in large huge page scenarios but also ensures that the cache sizes are at least as big as they were set. Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28658#issuecomment-3636357357 From jbhateja at openjdk.org Wed Dec 10 10:23:30 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 10 Dec 2025 10:23:30 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v7] In-Reply-To: References: Message-ID: > Add a new Float16lVector type and corresponding concrete vector classes, in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. > - Add necessary inline expander support. 
> - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. > - Use existing Float16 vector IR and backend support. > - Extended the existing VectorAPI JTREG test suite for the newly added Float16Vector operations. > > The idea here is to first be at par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reduction, etc). > > The following are the performance numbers for some of the selected Float16Vector benchmarking kernels compared to equivalent auto-vectorized Float16OperationsBenchmark kernels. > > image > > Initial RFP[1] was floated on the panama-dev mailing list. > > Kindly review the draft PR and share your feedback. > > Best Regards, > Jatin > > [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 20 commits: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 - Optimizing tail handling - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 - Cleanups - Fix failing jtreg test in CI - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 - Cleanups - Adding support for custom basic type T_FLOAT16, passing BasicType lane types to inline expander entries - Cleaning up interface as per review suggestions - Some cleanups - ... 
and 10 more: https://git.openjdk.org/jdk/compare/b60ac710...44ac727d ------------- Changes: https://git.openjdk.org/jdk/pull/28002/files Webrev: Webrev is not available because diff is too large Stats: 515190 lines in 231 files changed: 284426 ins; 229037 del; 1727 mod Patch: https://git.openjdk.org/jdk/pull/28002.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28002/head:pull/28002 PR: https://git.openjdk.org/jdk/pull/28002 From dbriemann at openjdk.org Wed Dec 10 10:25:01 2025 From: dbriemann at openjdk.org (David Briemann) Date: Wed, 10 Dec 2025 10:25:01 GMT Subject: Integrated: 8372589: VM crashes on init when NonNMethodCodeHeapSize is set too small and UseTransparentHugePages is enabled In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 12:59:03 GMT, David Briemann wrote: > Aligning upwards instead of downwards not only solves the crash in large huge page scenarios but also ensures that the cache sizes are at least as big as they were set. This pull request has now been integrated. Changeset: 8eaeb699 Author: David Briemann URL: https://git.openjdk.org/jdk/commit/8eaeb6990b85ac8717f4fc4ce883f674017b91f3 Stats: 10 lines in 1 file changed: 0 ins; 6 del; 4 mod 8372589: VM crashes on init when NonNMethodCodeHeapSize is set too small and UseTransparentHugePages is enabled Reviewed-by: mdoerr, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/28658 From dfenacci at openjdk.org Wed Dec 10 10:25:49 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 10 Dec 2025 10:25:49 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 11:49:29 GMT, Roland Westrelin wrote: > The crash occurs because verification code expects the inner and outer > loop of a loop strip mining nest to have the same number of phis but, > in this case, the inner loop has one more memory phis than the outer > loop. 
> > 1) After `OuterStripMinedLoopNode::adjust_strip_mined_loop`, inner and > outer loops have the same number of phis, as expected. > > > 309 MergeMem === _ 1 306 1 1 284 [[ 429 ]] { - - N284:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=205 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 ]] > 429 Phi === 248 309 205 [[ 93 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 249 CountedLoop === 249 248 197 [[ 249 119 96 93 94 ]] inner stride: 1 strip mined !orig=[223],[91] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 93 Phi === 249 429 205 [[ 117 97 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 94 Phi === 249 430 121 [[ 97 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > > 2) Then `PhiNode::Ideal` runs for 429 and pushed the `MergeMem` 309 > through the outer loop phi: > > > 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 444 446 ]] > 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 444 Phi === 248 306 121 [[ 445 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 446 Phi === 248 284 170 [[ 445 ]] #memory Memory: @instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow], name=detailMessage, idx=5; !orig=444,429,93 !jvms: 
TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 445 MergeMem === _ 1 444 1 1 446 [[ 93 ]] { - - N446:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=[429],93 !jvms: TestMismatchedMemoryPhis::mainTe... > Right, this is what I propose to fix the present issue and it seems cleaner to me (we let Identity handle the identity transformations). I doubt there'll be a measurable compilation time difference. It seems to be slightly simpler as well and a bit more inline with what `Ideal` should do. That said, I'm not too sure of what the "guidelines" are. So, really I have no strong opinions either. src/hotspot/share/opto/cfgnode.cpp line 2753: > 2751: > 2752: bool PhiNode::can_be_replaced_by(const PhiNode* other) const { > 2753: return type() == Type::MEMORY && other->type() == Type::MEMORY && adr_type() != TypePtr::BOTTOM && I think I might miss something but I was wondering if we strictly need to check for `adr_type() != TypePtr::BOTTOM` ------------- PR Review: https://git.openjdk.org/jdk/pull/28677#pullrequestreview-3561738181 PR Review Comment: https://git.openjdk.org/jdk/pull/28677#discussion_r2606016558 From bkilambi at openjdk.org Wed Dec 10 11:24:28 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 10 Dec 2025 11:24:28 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v7] In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 10:23:30 GMT, Jatin Bhateja wrote: >> Add a new Float16lVector type and corresponding concrete vector classes, in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. >> - Add necessary inline expander support. >> - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. >> - Use existing Float16 vector IR and backend support. 
>> - Extended the existing VectorAPI JTREG test suite for the newly added Float16Vector operations. >> >> The idea here is to first be at par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reduction, etc). >> >> The following are the performance numbers for some of the selected Float16Vector benchmarking kernels compared to equivalent auto-vectorized Float16OperationsBenchmark kernels. >> >> image >> >> Initial RFP[1] was floated on the panama-dev mailing list. >> >> Kindly review the draft PR and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 20 commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 > - Optimizing tail handling > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 > - Cleanups > - Fix failing jtreg test in CI > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 > - Cleanups > - Adding support for custom basic type T_FLOAT16, passing BasicType lane types to inline expander entries > - Cleaning up interface as per review suggestions > - Some cleanups > - ... and 10 more: https://git.openjdk.org/jdk/compare/b60ac710...44ac727d This patch results in two of the JTREG tests failing on aarch64 machines- jdk/incubator/vector/Float16Vector512Tests.java compiler/vectorapi/TestFloat16VectorOperations.java which is due to an issue in the `aarch64.ad` file. Fixed the failures and also added aarch64 specific IR rules which were missing for some of the tests in the `compiler/vectorapi/TestFloat16VectorOperations.java` test. @jatin-bhateja Could you please add the attached fix to this patch? Thanks! 
[fix.patch](https://github.com/user-attachments/files/24076067/fix.patch) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3636610190 From epeter at openjdk.org Wed Dec 10 11:37:35 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 Dec 2025 11:37:35 GMT Subject: [jdk26] RFR: 8367028: compiler/c2/irTests/TestFloat16ScalarOperations.java failing intermittently because of constant folding In-Reply-To: References: Message-ID: <5TR12XVgAGWFDnSHqLeX1Hp3s-Ax2lZHpwziWC53nuc=.8d5ac1b2-9369-42f1-87c9-4d703dcae92f@github.com> On Tue, 9 Dec 2025 15:15:57 GMT, Tobias Hartmann wrote: >> Clean backport of https://github.com/openjdk/jdk/pull/28678 > > Looks good. @TobiHartmann @mhaessig thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28720#issuecomment-3636655772 From epeter at openjdk.org Wed Dec 10 11:37:36 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 Dec 2025 11:37:36 GMT Subject: [jdk26] Integrated: 8367028: compiler/c2/irTests/TestFloat16ScalarOperations.java failing intermittently because of constant folding In-Reply-To: References: Message-ID: On Tue, 9 Dec 2025 14:32:26 GMT, Emanuel Peter wrote: > Clean backport of https://github.com/openjdk/jdk/pull/28678 This pull request has now been integrated. 
Changeset: 42fc4fe7
Author: Emanuel Peter
URL: https://git.openjdk.org/jdk/commit/42fc4fe7b1b1add3113747509caca1284df5b619
Stats: 17 lines in 1 file changed: 9 ins; 0 del; 8 mod

8367028: compiler/c2/irTests/TestFloat16ScalarOperations.java failing intermittently because of constant folding

Reviewed-by: thartmann, mhaessig
Backport-of: a4eb57c5ec6254e59e486042015dd00457284ef2

-------------

PR: https://git.openjdk.org/jdk/pull/28720

From epeter at openjdk.org Wed Dec 10 11:41:42 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 10 Dec 2025 11:41:42 GMT
Subject: RFR: 8370922: Template Framework Library: Float16 type and operations
In-Reply-To:
References:
Message-ID:

On Fri, 31 Oct 2025 22:23:28 GMT, Emanuel Peter wrote:

> We should test `Float16` with Template Framework Tests. For this, I'm now implementing:
>
> - Template Framework Library: add `Float16Type` that represents `Float16`. Extend `Operations.java` with `Float16` operations.
> - `Verify.java`: add verification for `Float16`, and corresponding tests in `TestVerifyIncubatorVector.java`. We could have done this separately, but it is not much code and completes the pipeline from code generation through execution and finally result verification in the following two tests.
> - Adding `Float16` to `ExpressionFuzzer.java` and `TestExpressions.java`.

Can someone please review this?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28095#issuecomment-3636672372

From epeter at openjdk.org Wed Dec 10 11:47:39 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 10 Dec 2025 11:47:39 GMT
Subject: RFR: 8367627: C2: Missed Ideal() optimization opportunity with MemBar [v3]
In-Reply-To:
References:
Message-ID: <4JnkUEx4O7We0dTeMUzQa3pFdRaSJoWHrXlhB3wFc0M=.07e9b81b-187d-4e72-b7e3-492c5cda0e0c@github.com>

On Mon, 24 Nov 2025 09:56:10 GMT, Benoît Maillard wrote:

>> This PR addresses a missed optimization in `PhaseIterGVN` for `MemBarAcquire` nodes caused by a missing notification during parsing.
>>
>> The missed optimization in question is the removal of the `in(MemBarNode::Precedent)` edge for
>> `MemBarAcquire` nodes when the `MemBar` is the only user of its input. This was initially
>> introduced to get rid of unused `Load` nodes that were only kept alive by such an edge.
>>
>> https://github.com/openjdk/jdk/blob/eeb7c3f2e8e645938d9db0cf61c1d98d751f2845/src/hotspot/share/opto/memnode.cpp#L4254-L4259
>>
>> In our case, it happens that the `Load` node gets folded to a constant during the initial
>> `_gvn.transform` call in `GraphKit::make_load`. Because the value is converted before being
>> returned, we end up with two constant nodes: one `ConL` and one `ConI`. The `ConL` only
>> has one usage, and this triggers the optimization during verification.
>>
>>
>> static int test0() {
>>     var c = new MyClass();
>>     // the conversion ensures that the ConL node only has one use
>>     // in the end, which triggers the optimization
>>     return (int) c.l;
>> }
>>
>>
>> The optimization is not triggered earlier, when we apply `_gvn.transform` on the membar,
>> because it requires `can_reshape`, which is set to `false` when we call `apply_ideal` in
>> `PhaseGVN::transform`.
>>
>> For this reason, we should call `record_for_igvn(membar)` after the `MemBar` is created
>> and transformed in `GraphKit::insert_mem_bar` to make sure it gets an `Ideal` pass with
>> `can_reshape` later.
>>
>>
>> This issue was initially filed for Valhalla, because a condition in `GraphKit::make_load`
>> prevents it from occurring when boxing elimination is enabled. Boxing elimination is
>> disabled temporarily in Valhalla (see [JDK-8328675](https://bugs.openjdk.org/browse/JDK-8328675)),
>> which caused the issue to appear, but by using `-XX:-EliminateAutoBox` it became clear
>> that the issue was on mainline.
>>
>> ### Testing
>> - [x] [GitHub Actions](TODO)
>> - [x] tier1-3, plus some internal testing
>>
>> Thank you for reviewing!
>
> Benoît Maillard has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:
>
> - Merge branch 'master' into JDK-8367627
> - Add notification in Node::has_special_unique_user
> - Add run with -XX:+AlwaysIncrementalInline, and add intermediate run for -XX:-DoEscapeAnalysis
> - Record in GraphKit::insert_mem_bar_volatile for consistency
> - Improve test and fix
> - Add test

@benoitmaillard Thanks for looking at this! The explanation and fix look good to me :)

test/hotspot/jtreg/compiler/c2/TestMissingOptMemBarRemovePrecedentEdge.java line 2:

> 1: /*
> 2: * Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved.

Should the test go into an `igvn` directory? Or something else a bit more specific?

-------------

Marked as reviewed by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/28732#pullrequestreview-3562153023 From epeter at openjdk.org Wed Dec 10 11:56:22 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 Dec 2025 11:56:22 GMT Subject: RFR: 8351844: C2 x64 AVX2 vpminmax assertion failure with equivalent inputs In-Reply-To: <4jqjXLkV2LwlS1HRlb2fFIJhO-jU6C2_yVWyiB9z2ZI=.208e3745-23b8-466e-9ea9-df42e49119a4@github.com> References: <4jqjXLkV2LwlS1HRlb2fFIJhO-jU6C2_yVWyiB9z2ZI=.208e3745-23b8-466e-9ea9-df42e49119a4@github.com> Message-ID: <3CEJE_-SExQQKYkK3Vx45chu6bkOKneCa_Tf63N03YQ=.d358edeb-0577-4b5c-a133-f8217498cb09@github.com> On Wed, 3 Dec 2025 11:12:01 GMT, Aleksey Shipilev wrote: >> Bug fix PR fixes an incorrect register equivalence in macro assembler. MaxV/MinV IR with equivalent inputs should ideally be removed from ideal graph before reaching to macro assembler. [JDK-8372797](https://bugs.openjdk.org/browse/JDK-8372797) is filed to add relevant identity transformations. >> >> Best Regards, >> Jatin > > test/hotspot/jtreg/compiler/vectorapi/TestVectorMinMaxSameInputs.java line 44: > >> 42: >> 43: public static void main(String[] args) { >> 44: TestFramework.runWithFlags("--add-modules=jdk.incubator.vector", "-ea", "-XX:+IgnoreUnrecognizedVMOptions", "-XX:UseAVX=2"); > > I understand `-XX:UseAVX=2` is here to hit the path where the assert is on. But for a generic test like this, it would seem unwise to limit the test configuration only to AVX=2. I would expect we instead run the tests with `TEST_VM_OPTS=-XX:UseAVX=2` to confirm they work with AVX=2 even on AVX-512 machines. You could always have 2 runs, if AVX2 is super important for reproducing. One with and one without the flag. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28600#discussion_r2606366103 From epeter at openjdk.org Wed Dec 10 13:05:33 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 Dec 2025 13:05:33 GMT Subject: RFR: 8371920: [TEST] Enable CMove tests on other platforms [v2] In-Reply-To: References: <974SIxPSx8nU0vFHAVxYyyw9UiTFtYWspmvNfOD1cOQ=.e3ce2b34-0e42-42c8-8a3c-49463203f21c@github.com> Message-ID: On Wed, 10 Dec 2025 09:18:04 GMT, Hamlin Li wrote: >> Nice catch! > > No, it does not. But I did not investigate further in this pr. I suppose adding `aarch64` is already a step in the right direction. But even better would be if we could remove the `@requires` completely, and just add restrictions to the `IR` rules. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28702#discussion_r2606584703 From epeter at openjdk.org Wed Dec 10 13:07:34 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 Dec 2025 13:07:34 GMT Subject: RFR: 8350208: CTW: GraphKit::add_safepoint_edges asserts "not enough operands for reexecution" In-Reply-To: References: Message-ID: <98EVpJFhkgWbAayPsHQB3j1TC3pcedou1EcHBdepAew=.f3560f11-b69c-4e4b-b50e-127a029598e8@github.com> On Tue, 2 Dec 2025 10:30:46 GMT, Quan Anh Mai wrote: > Hi, > > This PR fixes the issue of the compiler crashing with "not enough operands for reexecution". The issue here is that during `Parse::catch_inline_exceptions`, the old stack is gone, and we cannot reexecute the current bytecode anymore. However, there are some places where we try to insert safepoints into the graph, such as if the handler is a backward jump, or if one of the exceptions in the handlers is not loaded. Since the `_reexecute` state of the current jvms is "undefined", it is inferred automatically that it should reexecute for some bytecodes such as `putfield`. The solution then is to explicitly set `_reexecute` to false. 
> > I can manage to write a unit test for the case of a backward handler, for the other cases, since the exceptions that can be thrown for a bytecode that is inferred to reexecute are `NullPointerException`, `ArrayIndexOutOfBoundsException`, and `ArrayStoreException`. I find it hard to construct such a test in which one of them is not loaded. > > Please kindly review, thanks a lot. test/hotspot/jtreg/compiler/exceptions/TestDebugDuringExceptionCatching.java line 45: > 43: * @build test.java.lang.invoke.lib.InstructionHelper > 44: * > 45: * @run main/othervm compiler.exceptions.TestDebugDuringExceptionCatching Suggestion: * @run main/othervm ${test.main.class} Drive-by comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28597#discussion_r2606588199 From epeter at openjdk.org Wed Dec 10 13:12:34 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 Dec 2025 13:12:34 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v3] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 23:25:29 GMT, Chen Liang wrote: >> Folding identity hash as constant if the incoming argument is constant would be useful for quick map lookups, such as for the [Classifier proposal](https://openjdk.org/jeps/8357674). Currently, identity hash is not constant because it loads the object header/mark word. We can add an explicit bypass to load an existing hash eagerly instead. > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Typo Drive-by comments test/hotspot/jtreg/compiler/c2/irTests/constantFold/IdentityHashCodeFold.java line 37: > 35: * @library /test/lib / > 36: * @requires vm.compiler2.enabled > 37: * @run driver compiler.c2.irTests.constantFold.IdentityHashCodeFold Suggestion: * @run driver ${test.main.class} Is the C2 requirement really necessary? 
test/hotspot/jtreg/compiler/c2/irTests/constantFold/IdentityHashCodeFold.java line 51: > 49: public int testSum() { > 50: return a.hashCode() + System.identityHashCode(b); > 51: } This does not test correctness of the result. How confident are we that this patch is sufficiently tested? How can we test that the compiled and interpreter hashcode are equivalent? ------------- PR Review: https://git.openjdk.org/jdk/pull/28589#pullrequestreview-3562463539 PR Review Comment: https://git.openjdk.org/jdk/pull/28589#discussion_r2606597900 PR Review Comment: https://git.openjdk.org/jdk/pull/28589#discussion_r2606602984 From epeter at openjdk.org Wed Dec 10 13:18:31 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 Dec 2025 13:18:31 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v3] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 11:21:05 GMT, Roland Westrelin wrote: >> For this failure memory stats are: >> >> >> Total Usage: 1095525816 >> --- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 --- >> Phase Total ra node comp type states reglive regsplit regmask superword cienv ha other >> none 5976032 331560 5402064 197512 33712 10200 0 0 984 0 0 0 0 >> parse 2716464 65456 1145480 196408 1112752 0 0 0 0 0 196368 0 0 >> optimizer 98184 0 32728 0 65456 0 0 0 0 0 0 0 0 >> connectionGraph 32728 0 0 32728 0 0 0 0 0 0 0 0 0 >> iterGVN 32728 0 32728 0 0 0 0 0 0 0 0 0 0 >> idealLoop 918189632 0 38687056 872824784 392776 0 0 0 0 0 6285016 0 0 >> idealLoopVerify 2228144 0 0 2228144 0 0 0 0 0 0 0 0 0 >> macroExpand 32728 0 32728 0 0 0 0 0 0 0 0 0 0 >> graphReshape 32728 0 32728 0 0 0 0 0 0 0 0 0 0 >> matcher 20135944 3369848 9033208 7536400 65456 131032 0 0 0 0 0 0 0 >> postselect_cleanup 294872 294872 0 0 0 0 0 0 0 0 0 0 0 >> scheduler 752944 196488 556456 0 0 0 0 0 0 0 0 0 0 >> regalloc 388736 388736 0 0 0 0 0 0 0 0 0 0 0 >> ... 
> > Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: > > - review > - review Changes requested by epeter (Reviewer). src/hotspot/share/opto/loopnode.hpp line 672: > 670: bool _allow_optimizations; // Allow loop optimizations > 671: > 672: IdealLoopTree( PhaseIdealLoop* phase, Node *head, Node *tail ); Suggestion: IdealLoopTree(PhaseIdealLoop* phase, Node* head, Node* tail); src/hotspot/share/opto/loopnode.hpp line 1217: > 1215: PhaseTransform(Ideal_Loop), > 1216: _arena(mtCompiler, Arena::Tag::tag_idealloop), > 1217: _loop_or_ctrl(&_arena), How about some of the other data structures? For example `_idom`? test/hotspot/jtreg/compiler/c2/TestVerifyLoopOptimizationsHighMemUsage.java line 32: > 30: * -XX:+UnlockDiagnosticVMOptions -XX:+StressLoopPeeling -XX:+VerifyLoopOptimizations > 31: * -XX:StressSeed=3106998670 TestVerifyLoopOptimizationsHighMemUsage > 32: * @run main TestVerifyLoopOptimizationsHighMemUsage Suggestion: * @run main ${test.main.class} Also: your test does not have a `package` declaration. ------------- PR Review: https://git.openjdk.org/jdk/pull/28581#pullrequestreview-3562480981 PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2606610438 PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2606618710 PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2606620987 From chagedorn at openjdk.org Wed Dec 10 13:25:39 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 10 Dec 2025 13:25:39 GMT Subject: RFR: 8373420: C2: Add true/false_proj*() methods for IfNode as a replacement for proj_out*(true/false) Message-ID: There are a lot of places in the code where we call `proj_out*(true/false)` on an `IfNode`. In some cases, we then cast the returned `ProjNode` back to `IfProjNode` or `IfTrueNode/IfFalseNode`. I often visit such code and now decided to clean this up. 
The patch proposes new `IfNode::true/false_proj*()` methods that return `IfTrueNode/IfFalseNode` directly. I walked through all `proj_out*()` calls and replaced those that used a direct `true/false` or `1/0` as argument.

There are still more things to clean up in this area, for example, when we return `ProjNode` even though it should be an `IfProjNode` which requires more casting. But let's do that step by step in follow-up clean ups.

Thanks,
Christian

-------------

Commit messages:
 - Fix after merge
 - Merge branch 'master' into JDK-8373420
 - 8373420: C2: Add true/false_proj*() methods for IfNode as a replacement for proj_out*(true/false)

Changes: https://git.openjdk.org/jdk/pull/28745/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28745&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8373420
Stats: 66 lines in 11 files changed: 20 ins; 4 del; 42 mod
Patch: https://git.openjdk.org/jdk/pull/28745.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/28745/head:pull/28745

PR: https://git.openjdk.org/jdk/pull/28745

From epeter at openjdk.org Wed Dec 10 13:27:30 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 10 Dec 2025 13:27:30 GMT
Subject: RFR: 8372302: C2: IGVN verification fails because ModXNode::Ideal creates unused intermediate nodes
In-Reply-To:
References:
Message-ID:

On Tue, 25 Nov 2025 10:35:36 GMT, Benoît Maillard wrote:

> This PR addresses a failure in IGVN verification with `ModI` and `ModL` nodes.
>
> In `ModXNode::Ideal`, we have code to optimize a modulo expression by expressing it in terms of other operations. There are actually two distinct cases, one where the divisor is a constant of the form `2^k-1` for some integer `k`, and a more general case where other transformations do not succeed.
Because these transformations involve creating several new nodes (sometimes in a loop) and calling `phase->transform(...)` on them, we want to avoid accidentally triggering optimizations on the "unfinished" state of the subgraph. For this, we create a temporary dummy node and add edges to the nodes being constructed. > > There are some execution paths where the node is not destroyed before `Ideal` returns, and this creates issues during IGVN verification, as the verification code checks if the number of nodes has changed after having called `Ideal` on a given node and not expecting changes. > > The path in question is when we exit because the divisor is a constant and is the minimum value: > https://github.com/openjdk/jdk/blob/c19b12927d2ac901ec8ccaa2de5897ee4c47af56/src/hotspot/share/opto/divnode.cpp#L1146-L1147 > > The zero case does not cause problems (this seems to be because it would hide behind a `div0_check` anyway). > > The fix is simply to only create the temporary node when it is needed, and thus avoiding returning without destroying it. > > ### Testing > - [x] [GitHub Actions](TODO) > - [x] tier1-4, plus some internal testing > > Thank you for reviewing! @benoitmaillard Thanks for fixing this! Fix looks good, I just have a few nits below ;) test/hotspot/jtreg/compiler/c2/TestModIdealCreatesUselessNode.java line 24: > 22: */ > 23: > 24: package compiler.c2; Could we find some more specific igvn directory? 
test/hotspot/jtreg/compiler/c2/TestModIdealCreatesUselessNode.java line 39:

> 37: * -XX:VerifyIterativeGVN=1110
> 38: * compiler.c2.TestModIdealCreatesUselessNode
> 39: * @run main compiler.c2.TestModIdealCreatesUselessNode

Suggestion:

 * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions
 * -Xcomp -XX:-TieredCompilation
 * -XX:CompileCommand=compileonly,${test.main.class}::test*
 * -XX:VerifyIterativeGVN=1110
 * ${test.main.class}
 * @run main ${test.main.class}

You would have to verify if the template works also in the `CompileCommand`, but I think it should.

-------------

Marked as reviewed by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/28488#pullrequestreview-3562528839
PR Review Comment: https://git.openjdk.org/jdk/pull/28488#discussion_r2606650901
PR Review Comment: https://git.openjdk.org/jdk/pull/28488#discussion_r2606646816

From fgao at openjdk.org Wed Dec 10 13:28:39 2025
From: fgao at openjdk.org (Fei Gao)
Date: Wed, 10 Dec 2025 13:28:39 GMT
Subject: RFR: 8307084: C2: Vectorized drain loop is not executed for some small trip counts [v3]
In-Reply-To:
References: <3upl3uiPM5gnO1HCV7vb1C7CFyV3HQ2ztGXVJkss-AM=.09da8cb2-e384-420a-91d1-f3bb8d8cfc6a@github.com>
Message-ID:

On Mon, 8 Dec 2025 15:31:24 GMT, Emanuel Peter wrote:

> Sorry, I dropped the ball on this one. A lot going on with JDK26 and other larger PRs.
>

No worries. Thanks for taking the time to look at it!

> Ah I see. You are indeed doing some special warmup here. That should be better documented. I wonder also if you want to make this a parameter, so we can see the performance with and without it?
>

Sure. I'll update it with clearer comments and add a parameter in the next commit.

> At some point I need to check out your patch and see what effect it has on the benchmarks I'm presenting here: #27315
>

Thanks for benchmarking it!

> Do you think it would really not be measurable for small sizes?
If not, we would have to find other methods to make a difference for small iteration counts.

I don't think small-iteration loops are unmeasurable in terms of performance; it really depends on what scenarios we want to benchmark. If the goal is to measure how C2 performs on small fixed-iteration loops, then benchmarking microcases with a fixed small trip count makes sense. But if we want to measure how auto-vectorized C2 code behaves when executing small-iteration loops, then a special warm-up phase is needed to ensure vectorization actually occurs. I'm not sure I fully understood your question. Could you clarify which scenario you would like to benchmark for small iteration loops?

>
> > It may decide not to auto-vectorize, or even remove the loop entirely and keep only some scalar nodes.
>
> It could be worth creating some IR tests to see what exactly happens here.

Yes, that makes sense. I expect the IR tests may vary across machines with different vector lengths. We could handle this in a separate PR, since it's to observe the behavior of the existing C2 implementation.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22629#discussion_r2606649903

From mli at openjdk.org Wed Dec 10 13:35:25 2025
From: mli at openjdk.org (Hamlin Li)
Date: Wed, 10 Dec 2025 13:35:25 GMT
Subject: RFR: 8371920: [TEST] Enable CMove tests on other platforms [v2]
In-Reply-To:
References: <974SIxPSx8nU0vFHAVxYyyw9UiTFtYWspmvNfOD1cOQ=.e3ce2b34-0e42-42c8-8a3c-49463203f21c@github.com>
Message-ID:

On Wed, 10 Dec 2025 13:03:12 GMT, Emanuel Peter wrote:

>> No, it does not. But I did not investigate further in this pr.
>
> I suppose adding `aarch64` is already a step in the right direction. But even better would be if we could remove the `@requires` completely, and just add restrictions to the `IR` rules.
The first version of this PR did not have this `@requires`, but CI showed that IR verification failed, so I added this `@requires` for both riscv and aarch64 (check https://github.com/openjdk/jdk/pull/28702/commits/7848a1bf263ad9f498287f7b243601731b67b269)
Yes, it's good to have it run on x86, but I think it's worth another PR to investigate it. What do you think?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28702#discussion_r2606680365

From rcastanedalo at openjdk.org Wed Dec 10 13:55:55 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Wed, 10 Dec 2025 13:55:55 GMT
Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis
In-Reply-To: <_oUy5ZPqiqz05sYchjgUEtf_L4I077g3XKK0o8DoF8Q=.565b5e4c-eeea-475b-8d53-69d564b92a15@github.com>
References: <_oUy5ZPqiqz05sYchjgUEtf_L4I077g3XKK0o8DoF8Q=.565b5e4c-eeea-475b-8d53-69d564b92a15@github.com>
Message-ID:

On Tue, 9 Dec 2025 14:05:57 GMT, Roberto Castañeda Lozano wrote:

> I just started some internal testing, will come back with results in a day or two, and hopefully also start reviewing this soon.

Test results look good except for trivial failures in product runs of the new tests, due to missing `-XX:+UnlockDiagnosticVMOptions` for the stress options.

> Right, this is what I propose to fix the present issue and it seems cleaner to me (we let `Identity` handle the identity transformations). I doubt there'll be a measurable compilation time difference. I don't have a strong opinion though, so we can go with what you propose as well. Let's see what other reviewers think before we make a decision!

Like Daniel and Damon, I also have a slight preference towards enqueuing the node and letting `PhiNode::Identity` perform the change.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28677#issuecomment-3637196898 From epeter at openjdk.org Wed Dec 10 14:04:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 Dec 2025 14:04:52 GMT Subject: RFR: 8371920: [TEST] Enable CMove tests on other platforms [v2] In-Reply-To: References: <974SIxPSx8nU0vFHAVxYyyw9UiTFtYWspmvNfOD1cOQ=.e3ce2b34-0e42-42c8-8a3c-49463203f21c@github.com> Message-ID: On Wed, 10 Dec 2025 13:32:42 GMT, Hamlin Li wrote: >> I suppose adding `aarch64` is already a step in the right direction. But even better would be if we could remove the `@requires` completely, and just add restrictions to the `IR` rules. > > The first version of this pr did not have this `@requires`, but CI shows that IR verification failed, so I add this `@requires` for both riscv and aarch64 (check https://github.com/openjdk/jdk/pull/28702/commits/7848a1bf263ad9f498287f7b243601731b67b269) > Yes, it's good to have it run on x86, but I think it's worth anothe pr to investigate it. How do you think about it? I would remove the `@requires`, but add `applyIf` for the CPU features to the IR rules. Then we can still add more CPU features in a follow-up RFE, I don't want to force you to do more there ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28702#discussion_r2606784829 From mli at openjdk.org Wed Dec 10 14:51:21 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 10 Dec 2025 14:51:21 GMT Subject: RFR: 8373428: Refine variables with the same name in nested scopes in PhaseChaitin::gather_lrg_masks Message-ID: <8WWg7y_W2PGKAkwrVUfN97dBZ56I2MRvbMuxowqmnZE=.4c238198-0b07-47da-8756-1485846f044f@github.com> Hi, Can you help to review this trivial patch? In PhaseChaitin::gather_lrg_masks, several variables have the same name in nested scopes, it looks like following code snippet. { A a; { A a; } } This is not helpful to code readability, in particular in a long method like `gather_lrg_masks`, better to rename them. Thanks! 
------------- Commit messages: - more - initial commit Changes: https://git.openjdk.org/jdk/pull/28748/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28748&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373428 Stats: 20 lines in 1 file changed: 0 ins; 0 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/28748.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28748/head:pull/28748 PR: https://git.openjdk.org/jdk/pull/28748 From mli at openjdk.org Wed Dec 10 14:59:22 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 10 Dec 2025 14:59:22 GMT Subject: RFR: 8371920: [TEST] Enable CMove tests on other platforms [v2] In-Reply-To: References: <974SIxPSx8nU0vFHAVxYyyw9UiTFtYWspmvNfOD1cOQ=.e3ce2b34-0e42-42c8-8a3c-49463203f21c@github.com> Message-ID: <6hrIWxpL339En38vZlHxp3AhNuYd3zJQ51vw1-TYJug=.0e252d08-c27a-4b86-bfbe-9eb47ba379ce@github.com> On Wed, 10 Dec 2025 14:02:20 GMT, Emanuel Peter wrote: >> The first version of this pr did not have this `@requires`, but CI shows that IR verification failed, so I add this `@requires` for both riscv and aarch64 (check https://github.com/openjdk/jdk/pull/28702/commits/7848a1bf263ad9f498287f7b243601731b67b269) >> Yes, it's good to have it run on x86, but I think it's worth anothe pr to investigate it. How do you think about it? > > I would remove the `@requires`, but add `applyIf` for the CPU features to the IR rules. Then we can still add more CPU features in a follow-up RFE, I don't want to force you to do more there ? OK, use the `applyIf` instead. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28702#discussion_r2607005741 From mli at openjdk.org Wed Dec 10 14:59:20 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 10 Dec 2025 14:59:20 GMT Subject: RFR: 8371920: [TEST] Enable CMove tests on other platforms [v3] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? 
> > [JDK-8357551](https://bugs.openjdk.org/browse/JDK-8357551) add support of CMoveF/D vectorization, at the same time it also adds some tests for scalar CMove on riscv. > It's good to enable these tests on other platforms, like x86/aarch64 or maybe others. > > At the same time, this pr also move these tests under `compiler/c2/cmove`, as suggested here https://github.com/openjdk/jdk/pull/28309#discussion_r2598664764. > > Thanks! > > ## Test > In progress... (I'm using github CI to run the tests.) Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: applyIf ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28702/files - new: https://git.openjdk.org/jdk/pull/28702/files/7848a1bf..48c99a81 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28702&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28702&range=01-02 Stats: 145 lines in 1 file changed: 72 ins; 1 del; 72 mod Patch: https://git.openjdk.org/jdk/pull/28702.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28702/head:pull/28702 PR: https://git.openjdk.org/jdk/pull/28702 From fyang at openjdk.org Wed Dec 10 15:29:09 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 10 Dec 2025 15:29:09 GMT Subject: RFR: 8371920: [TEST] Enable CMove tests on other platforms [v3] In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 14:59:20 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> >> [JDK-8357551](https://bugs.openjdk.org/browse/JDK-8357551) add support of CMoveF/D vectorization, at the same time it also adds some tests for scalar CMove on riscv. >> It's good to enable these tests on other platforms, like x86/aarch64 or maybe others. >> >> At the same time, this pr also move these tests under `compiler/c2/cmove`, as suggested here https://github.com/openjdk/jdk/pull/28309#discussion_r2598664764. >> >> Thanks! >> >> ## Test >> In progress... (I'm using github CI to run the tests.) 
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
> applyIf

Latest version LGTM.

-------------

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/28702#pullrequestreview-3563128511

From liach at openjdk.org Wed Dec 10 15:39:53 2025
From: liach at openjdk.org (Chen Liang)
Date: Wed, 10 Dec 2025 15:39:53 GMT
Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v3]
In-Reply-To:
References:
Message-ID:

On Wed, 10 Dec 2025 13:07:20 GMT, Emanuel Peter wrote:

>> Chen Liang has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Typo
>
> test/hotspot/jtreg/compiler/c2/irTests/constantFold/IdentityHashCodeFold.java line 37:
>
>> 35: * @library /test/lib /
>> 36: * @requires vm.compiler2.enabled
>> 37: * @run driver compiler.c2.irTests.constantFold.IdentityHashCodeFold
>
> Suggestion:
>
> * @run driver ${test.main.class}
>
> Is the C2 requirement really necessary?

The C2 requirement is effective if a build configuration disables the compiler2 feature. I don't know if we run tests in such a build. I copied this from `compiler/c2/irTests/TestEnumFinalFold.java` in particular.

> test/hotspot/jtreg/compiler/c2/irTests/constantFold/IdentityHashCodeFold.java line 51:
>
>> 49: public int testSum() {
>> 50: return a.hashCode() + System.identityHashCode(b);
>> 51: }
>
> This does not test correctness of the result. How confident are we that this patch is sufficiently tested?
> How can we test that the compiled and interpreter hashcode are equivalent?

I can't find a way to access the identity hash code without compilation. Would something like a method that calls System.identityHashCode but is not inlined work?
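
To make that idea concrete, here is a rough sketch in plain Java. It is not taken from the PR: `IdentityHashCheck`, `foldable`, `viaParameter` and `pathsAgree` are hypothetical names, and plain Java cannot actually prevent inlining, so a real jtreg test would presumably need something like the IR framework's `@DontInline` on the second method (an assumption, not something verified against this patch).

```java
public class IdentityHashCheck {
    // A static final field is a typical candidate for constant folding.
    static final Object CONSTANT = new Object();

    // Path a JIT could fold: the argument is a compile-time constant.
    static int foldable() {
        return System.identityHashCode(CONSTANT);
    }

    // Path where the object arrives as a parameter. Assumption: in a real
    // test this method would have to be excluded from inlining; nothing in
    // plain Java enforces that here.
    static int viaParameter(Object o) {
        return System.identityHashCode(o);
    }

    // Returns true if both paths report the same hash on every iteration.
    static boolean pathsAgree(int iterations) {
        for (int i = 0; i < iterations; i++) {
            if (foldable() != viaParameter(CONSTANT)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Many iterations make JIT compilation likely, but do not guarantee it.
        System.out.println(pathsAgree(100_000) ? "consistent" : "MISMATCH");
    }
}
```

The sketch only checks that the two call paths agree; whether the folded path was really compiled and folded would still have to be asserted separately, for example with an IR rule.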
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28589#discussion_r2607156541 PR Review Comment: https://git.openjdk.org/jdk/pull/28589#discussion_r2607151871 From liach at openjdk.org Wed Dec 10 15:41:05 2025 From: liach at openjdk.org (Chen Liang) Date: Wed, 10 Dec 2025 15:41:05 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v9] In-Reply-To: References: <4YumBpbA2k8DC13H1s808_5OJx-1FMxD9CbIUfRTb8Q=.742f90c9-0d93-43b7-abe7-76422a0c8359@github.com> Message-ID: On Wed, 10 Dec 2025 07:32:59 GMT, Emanuel Peter wrote: >> Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision: >> >> - Review >> - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache >> - Bugs and verify loader leak >> - Try to avoid loader leak >> - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache >> - Revert void special case removal due to C2 shortage causing TestZGCBarrierElision::testAtomicThenAtomicAnotherField failure >> - Test from Jorn >> - Copyright years >> - Fix problem identified by Jorn >> - Rollback getAndAdd for now >> - ... and 7 more: https://git.openjdk.org/jdk/compare/bdc0543f...d734e8a6 > > test/hotspot/jtreg/compiler/c2/irTests/constantFold/VarHandleMismatchedTypeFold.java line 2: > >> 1: /* >> 2: * Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. > > We want to eventually migrate all tests from `c2/irTests` to more "topic based" directories, so it would be great if you already migrated them now, since you probably would know a better name/place now already ;) What would be a good directory? I used `c2/irTests/constantFold` as a topic. Maybe you can find a more straightforward directory for constant folding verification. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2607162521 From dfenacci at openjdk.org Wed Dec 10 15:56:23 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 10 Dec 2025 15:56:23 GMT Subject: RFR: 8373420: C2: Add true/false_proj*() methods for IfNode as a replacement for proj_out*(true/false) In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 13:13:44 GMT, Christian Hagedorn wrote: > There are a lot of places in the code where we call `proj_out*(true/false)` on an `IfNode`. In some cases, we then cast the returned `ProjNode` back to `IfProjNode` or `IfTrueNode/IfFalseNode`. I often visit such code and now decided to clean this up. > > The patch proposes new `IfNode::true/false_proj*()` methods that return `IfTrueNode/IfFalseNode` directly. I walked through all `proj_out*()` calls and replaced those that used a direct `true/false` or `1/0` as argument. > > There are still more things to clean up in this area, for example, when we return `ProjNode` even though it should be an `IfProjNode` which requires more casting. But let's do that step by step in follow-up clean ups. > > Thanks, > Christian Nice cleanup. Thanks @chhagedorn. Looks good to me. ------------- Marked as reviewed by dfenacci (Committer). PR Review: https://git.openjdk.org/jdk/pull/28745#pullrequestreview-3563263362 From fyang at openjdk.org Wed Dec 10 16:12:28 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 10 Dec 2025 16:12:28 GMT Subject: RFR: 8373069: RISC-V: implement GHASH intrinsic In-Reply-To: References: Message-ID: On Fri, 28 Nov 2025 03:54:29 GMT, Anjian Wen wrote: > support GHASH intrinsic for crypt GCM, which need zvkg extension. > > passed the tests in > test/hotspot/jtreg/compiler/codegen/aes/ > test/jdk/com/sun/crypto src/hotspot/cpu/riscv/assembler_riscv.hpp line 1989: > 1987: > 1988: // Vector GHASH (Zvkg) Extension > 1989: INSN(vgmul_vv, 0b1110111, 0b010, 0b10001, 0b1, 0b101000); Seems not used anywhere? 
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2857: > 2855: VectorRegister partial_hash = v29; > 2856: VectorRegister hash_subkey = v30; > 2857: VectorRegister cipher_text = v31; Can we simply start from `v1` here? src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 7109: > 7107: } > 7108: > 7109: if (UseGHASHIntrinsics && UseZvbb) { Do we need to re-check `UseZvbb` here? `UseGHASHIntrinsics` will be disabled if we don't have `UseZvbb`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28548#discussion_r2607290431 PR Review Comment: https://git.openjdk.org/jdk/pull/28548#discussion_r2607281675 PR Review Comment: https://git.openjdk.org/jdk/pull/28548#discussion_r2607273308 From hgreule at openjdk.org Wed Dec 10 16:13:42 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Wed, 10 Dec 2025 16:13:42 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v9] In-Reply-To: References: <4YumBpbA2k8DC13H1s808_5OJx-1FMxD9CbIUfRTb8Q=.742f90c9-0d93-43b7-abe7-76422a0c8359@github.com> Message-ID: On Wed, 10 Dec 2025 15:38:12 GMT, Chen Liang wrote: >> test/hotspot/jtreg/compiler/c2/irTests/constantFold/VarHandleMismatchedTypeFold.java line 2: >> >>> 1: /* >>> 2: * Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. >> >> We want to eventually migrate all tests from `c2/irTests` to more "topic based" directories, so it would be great if you already migrated them now, since you probably would know a better name/place now already ;) > > What would be a good directory? I used `c2/irTests/constantFold` as a topic. Maybe you can find a more straightforward directory for constant folding verification. Maybe something like `c2/methodhandles` would be a good fit? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2607297537 From liach at openjdk.org Wed Dec 10 16:38:21 2025 From: liach at openjdk.org (Chen Liang) Date: Wed, 10 Dec 2025 16:38:21 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v9] In-Reply-To: References: <4YumBpbA2k8DC13H1s808_5OJx-1FMxD9CbIUfRTb8Q=.742f90c9-0d93-43b7-abe7-76422a0c8359@github.com> Message-ID: <7XWutFIFgBWASCLSHH0XdEurVGgfX14HwelxQttnbr8=.9d92b47a-3313-46c1-822d-79a55cfb7d04@github.com> On Wed, 10 Dec 2025 16:11:07 GMT, Hannes Greule wrote: >> What would be a good directory? I used `c2/irTests/constantFold` as a topic. Maybe you can find a more straightforward directory for constant folding verification. > > Maybe something like `c2/methodhandles` would be a good fit? There is `compiler/jsr292` directory, that might work I think. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2607392582 From epeter at openjdk.org Wed Dec 10 17:21:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 Dec 2025 17:21:05 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v3] In-Reply-To: References: Message-ID: <6ip4JrJ4WiYEe6d2FA_WQ5dDjxAk2RPaPbwth4jNeJM=.43d7879d-89a4-434c-80ea-371c92581686@github.com> On Wed, 10 Dec 2025 15:36:36 GMT, Chen Liang wrote: >> test/hotspot/jtreg/compiler/c2/irTests/constantFold/IdentityHashCodeFold.java line 37: >> >>> 35: * @library /test/lib / >>> 36: * @requires vm.compiler2.enabled >>> 37: * @run driver compiler.c2.irTests.constantFold.IdentityHashCodeFold >> >> Suggestion: >> >> * @run driver ${test.main.class} >> >> Is the C2 requirement really necessary? > > The C2 requirement is effective if a build configuration disables the compiler2 feature. I don't know if we run tests in such a build. I copied this from `compiler/c2/irTests/TestEnumFinalFold.java` in particular. You can for example run it with C1 or Xint only. 
That would disable the test. Or someone runs it with Graal. I would generally remove `@requires` from any test we can, to get more coverage. >> test/hotspot/jtreg/compiler/c2/irTests/constantFold/IdentityHashCodeFold.java line 51: >> >>> 49: public int testSum() { >>> 50: return a.hashCode() + System.identityHashCode(b); >>> 51: } >> >> This does not test correctness of the result. How confident are we that this patch is sufficiently tested? >> How can we test that the compiled and interpreter hashcode are equivalent? > > I can't find a way to access the identity hash code without compilation. Would something like a method that calls System.identityHashCode but is not inlined work? You could compute the result in the static initializer; it should then be computed in the interpreter. And then add a `@Check` method to compare the `testSum` value from the compiler. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28589#discussion_r2607542060 PR Review Comment: https://git.openjdk.org/jdk/pull/28589#discussion_r2607538782 From epeter at openjdk.org Wed Dec 10 17:50:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 Dec 2025 17:50:03 GMT Subject: RFR: 8367158: C2: create better fill and copy benchmarks, taking alignment into account [v2] In-Reply-To: References: <3PEmRtpnMH0sRwWGK0uWkItDuytAS-ErVfqYK5X7rDQ=.2d484c9a-c25a-4a60-a856-fcbd4e614914@github.com> Message-ID: On Wed, 3 Dec 2025 13:03:02 GMT, Emanuel Peter wrote: >> **Summary** >> >> I created some `fill` and `copy` style benchmarks, covering both `arrays` and `MemorySegment`s. >> Reasons for this benchmark: >> - I want to compare auto-vectorization with intrinsics (array assembly style intrinsics, and MemorySegment java level special implementations). This allows us to see if some are slower than others, and if we can manage to improve the slower versions somehow in the future.
>> - There are some known issues we can demonstrate well with this benchmark: >> - Super-Unrolling: unrolling the vectorized loop gets us extra performance, but the exact factor may not be optimal yet for auto-vectorization. >> - Small iteration count loops: auto-vectorization can lead to slowdowns. >> - Many benchmarks do not control for alignment. But that creates noise. I just go over all possible alignments, that should smooth out the noise. >> - Most benchmarks do not control for 4k aliasing (x86 effect in store buffer). I make sure that load/stores are not a multiple of 4k bytes apart, so we can avoid the noise of that effect. >> >> ---------------------------------------------------------------------- >> >> **Analysis based on this Benchmark** >> >> Analysis done in this PR: >> - Arrays: auto vectorization vs scalar loops performance >> - Arrays: auto vectorization loops vs intrinsics >> - MemorySegments: auto vectorization loops vs scalar loops vs `MemorySegment.fill/copy` >> >> Future work: >> - Investigate deeper, inspect assembly, etc. >> - Impact of `-XX:SuperWordAutomaticAlignment=0` on small iteration count loops. >> - Investigate effect of `-XX:-OptimizeFill`. It seems that the loops in this benchmark are not detected automatically, and so the array intrinsics are not used. Why? >> - Investigate impact of `CompactObjectHeaders`. Does enabling/disabling change any performance? >> - Investigate if adjusting the super-unrolling factor could improve performance for auto-vectorization: [JDK-8368061](https://bugs.openjdk.org/browse/JDK-8368061) >> - Performance comparison with Graal. >> >> ---------------------------------------------------------------------- >> >> **Array Benchmark: auto vectorization vs scalar** >> >> We can see that for arrays, auto vectorization leads to minor regressions for sizes 1-32, and then generally auto vectorization is faster for larger sizes. And this is true for both `fill` and `copy`.
>> >> Strange: `macosx_aarch64` with `copy_int`. The auto vectorized performance has a sudden drop around 150 iterations. Also for `fill_... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: > > - small modulo fix from review suggestion > - Merge branch 'master' into JDK-8367158-fill-and-copy-benchmarks > - more MS types > - fix MS fill > - more backing types > - object array benchmarks > - fix bm > - ms bm update > - clean up benchmark > - more types > - ... and 6 more: https://git.openjdk.org/jdk/compare/cf14be35...80378aea Can someone please review this? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27315#issuecomment-3636673353 From vlivanov at openjdk.org Wed Dec 10 19:49:07 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 10 Dec 2025 19:49:07 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v3] In-Reply-To: <6ip4JrJ4WiYEe6d2FA_WQ5dDjxAk2RPaPbwth4jNeJM=.43d7879d-89a4-434c-80ea-371c92581686@github.com> References: <6ip4JrJ4WiYEe6d2FA_WQ5dDjxAk2RPaPbwth4jNeJM=.43d7879d-89a4-434c-80ea-371c92581686@github.com> Message-ID: On Wed, 10 Dec 2025 17:18:15 GMT, Emanuel Peter wrote: >> The C2 requirement is effective if a build configuration disables the compiler2 feature. I don't know if we run tests in such a build. I copied this from `compiler/c2/irTests/TestEnumFinalFold.java` in particular. > > You can for example run it with C1 or Xint only. That would disable the test. Or someone runs it with Graal. > I would generally remove `@requires` from any test we can, to get more coverage. It's a test on C2 IR. What's the point in running it w/o C2?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28589#discussion_r2608004982 From vlivanov at openjdk.org Wed Dec 10 19:51:06 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 10 Dec 2025 19:51:06 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v9] In-Reply-To: <4YumBpbA2k8DC13H1s808_5OJx-1FMxD9CbIUfRTb8Q=.742f90c9-0d93-43b7-abe7-76422a0c8359@github.com> References: <4YumBpbA2k8DC13H1s808_5OJx-1FMxD9CbIUfRTb8Q=.742f90c9-0d93-43b7-abe7-76422a0c8359@github.com> Message-ID: On Mon, 8 Dec 2025 19:10:48 GMT, Chen Liang wrote: >> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision: > > - Review > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache > - Bugs and verify loader leak > - Try to avoid loader leak > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache > - Revert void special case removal due to C2 shortage causing TestZGCBarrierElision::testAtomicThenAtomicAnotherField failure > - Test from Jorn > - Copyright years > - Fix problem identified by Jorn > - Rollback getAndAdd for now > - ... 
and 7 more: https://git.openjdk.org/jdk/compare/8da6ec63...d734e8a6 make/jdk/src/classes/build/tools/methodhandle/VarHandleGuardMethodGenerator.java line 132: > 130: // TestZGCBarrierElision.testAtomicThenAtomicAnotherField fails > 131: // However, testArrayAtomicThenAtomic, testAtomicThenAtomic, and > 132: // testArrayAtomicThenAtomicAtUnknownIndices works It doesn't look right. What is the root cause of the failure? Can it be a test bug? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2608010297 From vlivanov at openjdk.org Wed Dec 10 19:57:03 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 10 Dec 2025 19:57:03 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v9] In-Reply-To: <8N5F01Ve6BLQjlsJ7nD53xn70wa_NrRMqeuSb1Grp6M=.f45dd3b4-b282-41a8-a245-b7673bf65261@github.com> References: <8N5F01Ve6BLQjlsJ7nD53xn70wa_NrRMqeuSb1Grp6M=.f45dd3b4-b282-41a8-a245-b7673bf65261@github.com> Message-ID: On Wed, 10 Dec 2025 08:33:30 GMT, Aleksey Shipilev wrote: >> See the bug for discussion what issues current machinery has. >> >> This PR executes the plan outlined in the bug: >> 1. Common the receiver type profiling code in interpreter and C1 >> 2. Rewrite receiver type profiling code to only do atomic receiver slot installations >> 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed >> >> This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 22 commits: > > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - More comments > - Tighten up the comments > - Simplify third case: no need to loop, just restart the search > - Actually have a second "fast" case: receiver is not found in the table, and the table is full > - Pushing/popping for rare CAS path is counter-productive > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - Tighten up some more > - Offset is always rscratch1, no need to save it > - ... and 12 more: https://git.openjdk.org/jdk/compare/1bbbce75...c28810e3 Looks good. Thanks for the clarifications, Aleksey. Just wanted to get a sense of how much performance we leave on the table and whether it is worth spending more time on it later. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25305#pullrequestreview-3564254380 PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3638717818 From kvn at openjdk.org Wed Dec 10 20:11:42 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 10 Dec 2025 20:11:42 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v9] In-Reply-To: <8N5F01Ve6BLQjlsJ7nD53xn70wa_NrRMqeuSb1Grp6M=.f45dd3b4-b282-41a8-a245-b7673bf65261@github.com> References: <8N5F01Ve6BLQjlsJ7nD53xn70wa_NrRMqeuSb1Grp6M=.f45dd3b4-b282-41a8-a245-b7673bf65261@github.com> Message-ID: On Wed, 10 Dec 2025 08:33:30 GMT, Aleksey Shipilev wrote: >> See the bug for discussion what issues current machinery has. >> >> This PR executes the plan outlined in the bug: >> 1. Common the receiver type profiling code in interpreter and C1 >> 2. Rewrite receiver type profiling code to only do atomic receiver slot installations >> 3.
Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed >> >> This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: > > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - More comments > - Tighten up the comments > - Simplify third case: no need to loop, just restart the search > - Actually have a second "fast" case: receiver is not found in the table, and the table is full > - Pushing/popping for rare CAS path is counter-productive > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - Tighten up some more > - Offset is always rscratch1, no need to save it > - ... and 12 more: https://git.openjdk.org/jdk/compare/1bbbce75...c28810e3 Yes, we can look at this later if we need to optimize it more. Thankfully it is in one place now. I don't need to retest it since you didn't change code after v07 and only merged from mainline. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25305#pullrequestreview-3564300007 From mli at openjdk.org Wed Dec 10 20:55:58 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 10 Dec 2025 20:55:58 GMT Subject: RFR: 8371920: [TEST] Enable CMove tests on other platforms [v3] In-Reply-To: References: Message-ID: <2UcIdFe_ACoawW4xHq18v91QzpPRwiK--U4dL4KVA9Y=.da88587a-d00b-47d7-a6ff-307ebb666556@github.com> On Wed, 10 Dec 2025 15:26:38 GMT, Fei Yang wrote: > Latest version LGTM. @RealFYang Thank you!
------------- PR Comment: https://git.openjdk.org/jdk/pull/28702#issuecomment-3638921538 From mli at openjdk.org Wed Dec 10 21:03:30 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 10 Dec 2025 21:03:30 GMT Subject: RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv) [v5] In-Reply-To: References: Message-ID: <8UNRuvPlqC7XfrAGCThuwc7RGE2q5RFlRg9LavNfTrM=.538e9b99-0256-47f9-b784-5053811aa8a0@github.com> > Hi, > > Can you help to review this patch? > > This patch enables the vectorization of statements like `op_1 bop op_2 ? res_f_d_1 : res_f_d_2` in a loop, where op_x's size is different from res_f_d_x's. > > To assist with code review, this PR contains only the shared code change and is split from https://github.com/openjdk/jdk/pull/28230, which enables and implements the riscv part. A similar optimization could be extended to other platforms. > > ## Some background > > Previously, this was https://github.com/openjdk/jdk/pull/25336, which was blocked by an unsigned comparison issue. The issue was recently resolved by https://github.com/openjdk/jdk/pull/27942, so I'm restarting work on this optimization. > > This PR only relaxes one of the constraints in https://github.com/openjdk/jdk/pull/25336, i.e. it transforms CMoveF/D to vector operations regardless of the size of the comparison's operands, but removes the transformation of CMoveI/L to vector operations, which I think needs more investigation. > > # Test > ## Jtreg > > in progress... > > ## Performance > > Check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv. > > Thanks Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 38 additional commits since the last revision: - enable riscv - tests - review comment - Merge branch 'master' into vectorize-CMove-Bool - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'master' into vectorize-CMove-Bool - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - fix typo - ... and 28 more: https://git.openjdk.org/jdk/compare/ff004e25...ecc84adc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28231/files - new: https://git.openjdk.org/jdk/pull/28231/files/8e84017f..ecc84adc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28231&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28231&range=03-04 Stats: 121854 lines in 1871 files changed: 79162 ins; 31077 del; 11615 mod Patch: https://git.openjdk.org/jdk/pull/28231.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28231/head:pull/28231 PR: https://git.openjdk.org/jdk/pull/28231 From duke at openjdk.org Wed Dec 10 21:19:42 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Wed, 10 Dec 2025 21:19:42 GMT Subject: RFR: 8326205: Grouping frequently called C2 nmethods in CodeCache Message-ID: ### Summary This PR implements [JDK-8326205](https://bugs.openjdk.org/browse/JDK-8326205), introducing experimental support for grouping hot code within the CodeCache. ### Description The feature works by periodically sampling the execution of C2-compiled methods to identify hot code, then relocating those methods into a dedicated `HotCodeHeap` section of the CodeCache. Sampling is performed by the `HotCodeSampler`, which runs on a new dedicated `HotCodeGrouper` thread. The thread wakes up every `HotCodeIntervalSeconds` (default 300s) and collects samples for a duration of `HotCodeSampleSeconds` (default 120s). 
During each sampling period, it iterates over all Java threads, inspects their last Java frame, obtains the current program counter (PC), and maps it to the corresponding nmethod. This allows the sampler to maintain a profile of the most frequently executed methods. The `HotCodeGrouper` uses the sampling data to select methods for grouping. Methods are ranked by sample count to form the candidate set. The grouper then relocates these methods (along with their callees, which has been shown to improve performance on AArch64 due to better branch prediction) into the `HotCodeHeap` in descending order of hotness, continuing until the fraction of samples attributable to hot methods exceeds `HotCodeSampleRatio` (default 0.8). The process continues to ensure that the hot-method ratio remains above the threshold. The `HotCodeHeap` is a new code heap segment with a default size of 20% of the non-profiled heap, though this can be overridden. This size was chosen based on the principle that roughly 20% of methods contribute to 80% of the work. Only C2-compiled nmethods are eligible for relocation, and the relocation process leverages existing infrastructure from [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). Relocation occurs entirely on the grouper thread and runs concurrently with the application. To maintain correctness, the thread acquires the `CodeCache_lock` and `Compile_lock` during relocation but releases these locks between individual relocations to avoid blocking GC safepoints. Removal of nmethods from the `HotCodeHeap` is handled by the GC. ### Performance Testing has shown up to a 20% latency reduction in an internal service with a large CodeCache (512 MB). Public benchmark results are forthcoming. ### Testing * CodeCache tests have been updated to cover the new `HotCodeHeap`. * Dedicated tests for the `HotCodeGrouper` will be added in follow-up work. ### Logging * New logging: `-Xlog:hotcodegrouper`. 
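
The selection step described above (rank methods by sample count, then take them in descending order of hotness until the share of samples they cover exceeds `HotCodeSampleRatio`) can be sketched as follows. This is only an illustration of the rule as stated in the description, not the code in the PR: `SampledMethod` and `select_hot` are hypothetical names, `std::vector` stands in for whatever container the grouper uses, and callee relocation and locking are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an nmethod together with its sample count.
struct SampledMethod {
  std::string name;
  std::size_t samples;
};

// Returns the hottest methods in descending order of sample count, adding
// methods until their combined share of all samples exceeds `ratio`
// (the role played by HotCodeSampleRatio in the description).
std::vector<SampledMethod> select_hot(std::vector<SampledMethod> methods, double ratio) {
  assert(ratio >= 0.0 && ratio <= 1.0);

  std::size_t total = 0;
  for (const SampledMethod& m : methods) {
    total += m.samples;
  }

  // Rank by sample count, hottest first.
  std::stable_sort(methods.begin(), methods.end(),
                   [](const SampledMethod& a, const SampledMethod& b) {
                     return a.samples > b.samples;
                   });

  std::vector<SampledMethod> hot;
  std::size_t covered = 0;
  for (const SampledMethod& m : methods) {
    // Stop once the selected methods already account for more than `ratio`
    // of all samples (or when there are no samples at all).
    if (total == 0 || static_cast<double>(covered) / total > ratio) {
      break;
    }
    hot.push_back(m);
    covered += m.samples;
  }
  return hot;
}
```

With sample counts 60, 25, 10 and 5 and a ratio of 0.8, the first two methods cover 85% of the samples, so only those two would be candidates for the `HotCodeHeap` under this reading of the rule.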
------------- Commit messages: - Update blob checks - Merge fix - Merge remote-tracking branch 'origin/master' into JDK-8326205 - Clean up - New implementation - Mark nmethods seen by profiler as maybe on stack - Rename option HotCodeGrouper to HotCodeHeap due to name conflict - Dont count nmethods from HotCodeHeap as new - Resize _samples if needed; Profile till steady nmethod count - Use existing Platform.java functions to determine if C2 is available - ... and 12 more: https://git.openjdk.org/jdk/compare/1ae4a6c4...c4c779a5 Changes: https://git.openjdk.org/jdk/pull/27858/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27858&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8326205 Stats: 1284 lines in 40 files changed: 1198 ins; 18 del; 68 mod Patch: https://git.openjdk.org/jdk/pull/27858.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27858/head:pull/27858 PR: https://git.openjdk.org/jdk/pull/27858 From kvn at openjdk.org Wed Dec 10 21:19:44 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 10 Dec 2025 21:19:44 GMT Subject: RFR: 8326205: Grouping frequently called C2 nmethods in CodeCache In-Reply-To: References: Message-ID: On Thu, 16 Oct 2025 23:06:35 GMT, Chad Rakoczy wrote: > ### Summary > This PR implements [JDK-8326205](https://bugs.openjdk.org/browse/JDK-8326205), introducing experimental support for grouping hot code within the CodeCache. > > ### Description > The feature works by periodically sampling the execution of C2-compiled methods to identify hot code, then relocating those methods into a dedicated `HotCodeHeap` section of the CodeCache. > > Sampling is performed by the `HotCodeSampler`, which runs on a new dedicated `HotCodeGrouper` thread. The thread wakes up every `HotCodeIntervalSeconds` (default 300s) and collects samples for a duration of `HotCodeSampleSeconds` (default 120s). 
During each sampling period, it iterates over all Java threads, inspects their last Java frame, obtains the current program counter (PC), and maps it to the corresponding nmethod. This allows the sampler to maintain a profile of the most frequently executed methods. > > The `HotCodeGrouper` uses the sampling data to select methods for grouping. Methods are ranked by sample count to form the candidate set. The grouper then relocates these methods (along with their callees, which has been shown to improve performance on AArch64 due to better branch prediction) into the `HotCodeHeap` in descending order of hotness, continuing until the fraction of samples attributable to hot methods exceeds `HotCodeSampleRatio` (default 0.8). The process continues to ensure that the hot-method ratio remains above the threshold. > > The `HotCodeHeap` is a new code heap segment with a default size of 20% of the non-profiled heap, though this can be overridden. This size was chosen based on the principle that roughly 20% of methods contribute to 80% of the work. Only C2-compiled nmethods are eligible for relocation, and the relocation process leverages existing infrastructure from [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). > > Relocation occurs entirely on the grouper thread and runs concurrently with the application. To maintain correctness, the thread acquires the `CodeCache_lock` and `Compile_lock` during relocation but releases these locks between individual relocations to avoid blocking GC safepoints. Removal of nmethods from the `HotCodeHeap` is handled by the GC. > > ### Performance > Testing has shown up to a 20% latency reduction in an internal service with a large CodeCache (512 MB). Public benchmark results are forthcoming. > > ### Testing > * CodeCache tests have been updated to cover the new `HotCodeHeap`. > * Dedicated tests for the `HotCodeGrouper` will be ... Consider making this feature C2 specific since you require presence of C2. 
Move flags into `opto/c2_globals.hpp` or `compiler/compiler_globals.hpp` and add `#ifdef COMPILER2` around new code. Especially new `hotCodeGrouper.*` files. We don't need this code in VM which is built without C2. You would still need to check `is_c2_enabled()` because of `StopAtTierLevel=n` flag. src/hotspot/share/compiler/compilerDefinitions.cpp line 346: > 344: vm_exit_during_initialization("HotCodeHeap requires C2 enabled", NULL); > 345: } > 346: } Put it under `#ifdef COMPILER2` ------------- PR Review: https://git.openjdk.org/jdk/pull/27858#pullrequestreview-3351380445 PR Review Comment: https://git.openjdk.org/jdk/pull/27858#discussion_r2440788365 From duke at openjdk.org Wed Dec 10 21:19:45 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Wed, 10 Dec 2025 21:19:45 GMT Subject: RFR: 8326205: Grouping frequently called C2 nmethods in CodeCache In-Reply-To: References: Message-ID: <5Yq39yXUeGDG7OjkvbvP1wcFb6Qz4AaIOOhmR8XlGMY=.5a60d563-073b-4bd6-b5c2-262bcbe8bee6@github.com> On Fri, 17 Oct 2025 18:10:16 GMT, Vladimir Kozlov wrote: >> ### Summary >> This PR implements [JDK-8326205](https://bugs.openjdk.org/browse/JDK-8326205), introducing experimental support for grouping hot code within the CodeCache. >> >> ### Description >> The feature works by periodically sampling the execution of C2-compiled methods to identify hot code, then relocating those methods into a dedicated `HotCodeHeap` section of the CodeCache. >> >> Sampling is performed by the `HotCodeSampler`, which runs on a new dedicated `HotCodeGrouper` thread. The thread wakes up every `HotCodeIntervalSeconds` (default 300s) and collects samples for a duration of `HotCodeSampleSeconds` (default 120s). During each sampling period, it iterates over all Java threads, inspects their last Java frame, obtains the current program counter (PC), and maps it to the corresponding nmethod. This allows the sampler to maintain a profile of the most frequently executed methods. 
>> >> The `HotCodeGrouper` uses the sampling data to select methods for grouping. Methods are ranked by sample count to form the candidate set. The grouper then relocates these methods (along with their callees, which has been shown to improve performance on AArch64 due to better branch prediction) into the `HotCodeHeap` in descending order of hotness, continuing until the fraction of samples attributable to hot methods exceeds `HotCodeSampleRatio` (default 0.8). The process continues to ensure that the hot-method ratio remains above the threshold. >> >> The `HotCodeHeap` is a new code heap segment with a default size of 20% of the non-profiled heap, though this can be overridden. This size was chosen based on the principle that roughly 20% of methods contribute to 80% of the work. Only C2-compiled nmethods are eligible for relocation, and the relocation process leverages existing infrastructure from [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). >> >> Relocation occurs entirely on the grouper thread and runs concurrently with the application. To maintain correctness, the thread acquires the `CodeCache_lock` and `Compile_lock` during relocation but releases these locks between individual relocations to avoid blocking GC safepoints. Removal of nmethods from the `HotCodeHeap` is handled by the GC. >> >> ### Performance >> Testing has shown up to a 20% latency reduction in an internal service with a large CodeCache (512 MB). Public benchmark results are forthcoming. >> >> ### Testing >> * CodeCache tests have been updated to cover the new `HotCodeHeap`. >> * Dedicated... > > src/hotspot/share/compiler/compilerDefinitions.cpp line 346: > >> 344: vm_exit_during_initialization("HotCodeHeap requires C2 enabled", NULL); >> 345: } >> 346: } > > Put it under `#ifdef COMPILER2` I moved most of the new code under `#ifdef COMPILER2` and into `opto/c2_globals.hpp` I'm unsure what to do about `HotCodeHeapSize`. 
I can also make that flag C2 only, which requires some updates to `codeCache.cpp`; overall I don't think it makes the code too messy. The biggest problem with that is what to do about `CodeBlobType` with the addition of the `MethodHot` type.

Before:

    enum class CodeBlobType {
      MethodNonProfiled = 0, // Execution level 1 and 4 (non-profiled) nmethods (including native nmethods)
      MethodProfiled = 1, // Execution level 2 and 3 (profiled) nmethods
      NonNMethod = 2, // Non-nmethods like Buffers, Adapters and Runtime Stubs
      All = 3, // All types (No code cache segmentation)
      NumTypes = 4 // Number of CodeBlobTypes
    };

After:

    enum class CodeBlobType {
      MethodNonProfiled = 0, // Execution level 1 and 4 (non-profiled) nmethods (including native nmethods)
      MethodProfiled = 1, // Execution level 2 and 3 (profiled) nmethods
      MethodHot = 2, // Nmethods predicted to be always hot
      NonNMethod = 3, // Non-nmethods like Buffers, Adapters and Runtime Stubs
      All = 4, // All types (No code cache segmentation)
      NumTypes = 5 // Number of CodeBlobTypes
    };

If we do

    #ifdef COMPILER2
    /* AFTER */
    #else
    /* BEFORE */
    #endif

things start getting weird if blob types have different values in different builds, particularly in tests such as [BlobType.java](https://github.com/openjdk/jdk/blob/master/test/lib/jdk/test/whitebox/code/BlobType.java). I think that we need to keep `CodeBlobType::MethodHot` regardless of whether C2 is present or not.

As for `HotCodeHeapSize`, I'm not sure of the best place for it. I think it makes sense to leave it with the other code heap flags, which also allows us to print an error if it is specified without C2. On the other hand, if it requires C2 it should be in `opto/c2_globals.hpp`.

@vnkozlov @eastig What do you think?
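The concern about enumerator values differing between builds can be made concrete with a small Java sketch. Everything in it is hypothetical and written only for illustration: the class `BlobTypeIds` and its two tables are not the contents of `BlobType.java` or of any JDK file. It numbers the enumerators the way the "Before" and "After" variants would and lists the names whose numeric value changes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not code from the JDK or its test library.
public class BlobTypeIds {
    // Assign consecutive values starting at 0, in declaration order.
    static Map<String, Integer> number(String... names) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        for (String name : names) {
            ids.put(name, ids.size());
        }
        return ids;
    }

    // Numbering without MethodHot (the "Before" variant).
    static final Map<String, Integer> WITHOUT_HOT =
        number("MethodNonProfiled", "MethodProfiled", "NonNMethod", "All", "NumTypes");

    // Numbering with MethodHot (the "After" variant).
    static final Map<String, Integer> WITH_HOT =
        number("MethodNonProfiled", "MethodProfiled", "MethodHot", "NonNMethod", "All", "NumTypes");

    // Names present in both variants whose numeric value differs between them.
    static List<String> shifted() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : WITHOUT_HOT.entrySet()) {
            Integer other = WITH_HOT.get(e.getKey());
            if (other != null && !other.equals(e.getValue())) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints: [NonNMethod, All, NumTypes]
        System.out.println(shifted());
    }
}
```

Any Java-side table that hard-codes one of the two numberings would disagree with a VM built with the other one for exactly these names, which is the argument for keeping a single numbering in all builds.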
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27858#discussion_r2475158385 From vlivanov at openjdk.org Wed Dec 10 21:27:22 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 10 Dec 2025 21:27:22 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v6] In-Reply-To: References: Message-ID: > Even though `instanceof` check (and reflective `Class.isInstance` call) narrows operand's type, sharpened type information is not explicitly materialized in the IR. > > There's a `SubTypeCheck` node present, but it is not a substitute for a `CheckCastPP` node with a proper type. > > The difference can be illustrated with the following simple cases: > > class A { void m() {} } > class B extends A { void m() {} } > > void testInstanceOf(A obj) { > if (obj instanceof B) { > obj.m(); > } > } > > InstanceOf::testInstanceOf (12 bytes) > @ 8 InstanceOf$A::m (0 bytes) failed to inline: virtual call > > vs > > void testInstanceOfCast(A obj) { > if (obj instanceof B) { > B b = (B)obj; > b.m(); > } > } > > InstanceOf::testInstanceOfCast (17 bytes) > @ 13 InstanceOf$B::m (1 bytes) inline (hot) > > > Proposed fix annotates operands of subtype checks with proper type information which reflects the effects of subtype check. Not-yet-canonicalized IR shape poses some challenges, but I decided to match it early so information is available right away, rather than waiting for IGVN pass and delay inlining to post-parse phase. > > FTR it is not a complete fix. It works for trivial cases, but for more complex conditions the IR shape becomes too complex during parsing (as illustrated by some test cases). I experimented with annotating subtype checks after initial parsing pass is over, but the crucial simplification step happens as part of split-if transformation which happens when no more inlining is possible. So, the only possible benefit (without forcing split-if optimization earlier) is virtual-to-direct call strength reduction. 
I plan to explore it separately. > > Testing: hs-tier1 - hs-tier5 Vladimir Ivanov has updated the pull request incrementally with two additional commits since the last revision: - Improve the test - Improve comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28517/files - new: https://git.openjdk.org/jdk/pull/28517/files/4e9f4624..cff165a3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28517&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28517&range=04-05 Stats: 85 lines in 2 files changed: 62 ins; 10 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/28517.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28517/head:pull/28517 PR: https://git.openjdk.org/jdk/pull/28517 From vlivanov at openjdk.org Wed Dec 10 21:27:24 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 10 Dec 2025 21:27:24 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v5] In-Reply-To: References: Message-ID: <32Aget4YkqVnhnaYdpT8bptd0FlgFcaHHUvnBjoBr2g=.fda3277d-4535-4513-a7ed-1af82ea0276f@github.com> On Tue, 9 Dec 2025 01:25:40 GMT, Vladimir Ivanov wrote: >> Even though `instanceof` check (and reflective `Class.isInstance` call) narrows operand's type, sharpened type information is not explicitly materialized in the IR. >> >> There's a `SubTypeCheck` node present, but it is not a substitute for a `CheckCastPP` node with a proper type. 
>> >> The difference can be illustrated with the following simple cases: >> >> class A { void m() {} } >> class B extends A { void m() {} } >> >> void testInstanceOf(A obj) { >> if (obj instanceof B) { >> obj.m(); >> } >> } >> >> InstanceOf::testInstanceOf (12 bytes) >> @ 8 InstanceOf$A::m (0 bytes) failed to inline: virtual call >> >> vs >> >> void testInstanceOfCast(A obj) { >> if (obj instanceof B) { >> B b = (B)obj; >> b.m(); >> } >> } >> >> InstanceOf::testInstanceOfCast (17 bytes) >> @ 13 InstanceOf$B::m (1 bytes) inline (hot) >> >> >> Proposed fix annotates operands of subtype checks with proper type information which reflects the effects of subtype check. Not-yet-canonicalized IR shape poses some challenges, but I decided to match it early so information is available right away, rather than waiting for IGVN pass and delay inlining to post-parse phase. >> >> FTR it is not a complete fix. It works for trivial cases, but for more complex conditions the IR shape becomes too complex during parsing (as illustrated by some test cases). I experimented with annotating subtype checks after initial parsing pass is over, but the crucial simplification step happens as part of split-if transformation which happens when no more inlining is possible. So, the only possible benefit (without forcing split-if optimization earlier) is virtual-to-direct call strength reduction. I plan to explore it separately. >> >> Testing: hs-tier1 - hs-tier5 > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Revert Compile::should_delay_inlining unification Updated comments and improved the test (improved robustness and added new test cases with nulls). Please, re-review. 
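For readers who want to try the two shapes from the quoted description, here is a minimal, self-contained sketch. The class name `InstanceOfShapes`, the return values, and the warm-up loop are assumptions added for illustration; only the two method shapes come from the description above.

```java
// Illustrative sketch; names and return values are not from the PR.
public class InstanceOfShapes {
    static class A { int m() { return 1; } }
    static class B extends A { @Override int m() { return 2; } }

    // Shape 1: the check succeeds only for B, but the receiver's static type stays A.
    static int viaCheckOnly(A obj) {
        if (obj instanceof B) {
            return obj.m();
        }
        return 0;
    }

    // Shape 2: an explicit cast after the check gives the receiver the static type B.
    static int viaCast(A obj) {
        if (obj instanceof B) {
            B b = (B) obj;
            return b.m();
        }
        return 0;
    }

    public static void main(String[] args) {
        A[] inputs = { new A(), new B() };
        long sum = 0;
        // Warm-up loop so that the JIT has a chance to compile both methods.
        for (int i = 0; i < 200_000; i++) {
            A obj = inputs[i & 1];
            sum += viaCheckOnly(obj) + viaCast(obj);
        }
        System.out.println(sum);
    }
}
```

Both methods return the same values at the Java level; the difference discussed in the PR is only in what C2 can inline. Running with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` should show inlining decisions of the kind quoted above, though the exact output depends on the JDK build and flags.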
------------- PR Comment: https://git.openjdk.org/jdk/pull/28517#issuecomment-3639002846 From vlivanov at openjdk.org Wed Dec 10 21:27:24 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 10 Dec 2025 21:27:24 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v6] In-Reply-To: <7tncm6HgyrCXyN7VAYAoo4e0igls2GofazYW-4PyzMg=.ce6e21b1-b2ac-4482-b661-b69cb3aa22f7@github.com> References: <4abJMXdHzqKGqU58EXHaXO7849B0a64NoShEvU110I4=.87a93e5c-b73e-4c6e-b85b-8797eea8814d@github.com> <7tncm6HgyrCXyN7VAYAoo4e0igls2GofazYW-4PyzMg=.ce6e21b1-b2ac-4482-b661-b69cb3aa22f7@github.com> Message-ID: On Sat, 6 Dec 2025 01:12:53 GMT, Vladimir Ivanov wrote: >> There are multiple ways without having to have yet another higher-level representation. The first one is that since `SubTypeCheck` does not accept `null` now, we can just choose one result for `null`. Choosing the `instanceof` approach may be a little more desirable, as it removes the need to perform this complicated match, and for `checkcast` we can manually insert a `CheckCastPP` anyway. Another solution is to have another input to `SubTypeCheck` which gives the result when the `obj` is `null`. On a whim, I kind of like this, as we can match both the `checkcast` and the `instanceof` pattern here, it also simplifies `GraphKit::gen_checkcast`, as we do not have to worry about "the cast that always succeeds will leave behind a null check". >> >> Just a suggestion, though. This PR is fine as it is to me. > > I agree it can be implemented without introducing new fancy IR nodes. The open question to me though is whether we can live without materializing null check until `SubTypeCheck` nodes are macro expanded. Otherwise, it'll turn into a gradual lowering through different representations. BTW null checks which have to be merged back break the matching of the IR shape. Something to improve as a follow-up change. Added new test cases to track it. 
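The null handling discussed in this thread comes from a difference at the Java level: `instanceof` is false for `null`, while a cast of `null` succeeds. A minimal sketch, with the class `NullVsSubtypeCheck` and its helper names invented for illustration:

```java
// Illustrative sketch of Java-level semantics; not code from the PR.
public class NullVsSubtypeCheck {
    static class A {}
    static class B extends A {}

    // instanceof: false for null, true only for instances of B.
    static boolean isB(A obj) {
        return obj instanceof B;
    }

    // checkcast: succeeds for null and for instances of B, throws otherwise.
    static boolean castSucceeds(A obj) {
        try {
            B b = (B) obj;
            return true;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isB(null) + " " + castSucceeds(null));       // false true
        System.out.println(isB(new A()) + " " + castSucceeds(new A())); // false false
        System.out.println(isB(new B()) + " " + castSucceeds(new B())); // true true
    }
}
```

The two operations agree for every non-null input and differ only for `null`, which is why a shared subtype-check representation has to decide what to do with the null case.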
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28517#discussion_r2608248637 From eastigeevich at openjdk.org Wed Dec 10 21:41:48 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 10 Dec 2025 21:41:48 GMT Subject: RFR: 8326205: Grouping frequently called C2 nmethods in CodeCache In-Reply-To: <5Yq39yXUeGDG7OjkvbvP1wcFb6Qz4AaIOOhmR8XlGMY=.5a60d563-073b-4bd6-b5c2-262bcbe8bee6@github.com> References: <5Yq39yXUeGDG7OjkvbvP1wcFb6Qz4AaIOOhmR8XlGMY=.5a60d563-073b-4bd6-b5c2-262bcbe8bee6@github.com> Message-ID: On Wed, 29 Oct 2025 19:51:42 GMT, Chad Rakoczy wrote: > I think that we need to leave `CodeBlobType::MethodHot` regardless if C2 is present or not. I agree. Another option might be to create a heap of `MethodNonProfiled` type which is hidden from `CodeCache::allocate`. `nmethod::relocate(CodeBlobType)` will be changed to `nmethod::relocate(CodeHeap*)`. This might simplify things, for example with putting `HotCodeHeapSize` in `opto/c2_globals.hpp`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27858#discussion_r2608299316 From liach at openjdk.org Wed Dec 10 22:11:49 2025 From: liach at openjdk.org (Chen Liang) Date: Wed, 10 Dec 2025 22:11:49 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v9] In-Reply-To: References: <4YumBpbA2k8DC13H1s808_5OJx-1FMxD9CbIUfRTb8Q=.742f90c9-0d93-43b7-abe7-76422a0c8359@github.com> Message-ID: On Wed, 10 Dec 2025 19:48:49 GMT, Vladimir Ivanov wrote: >> Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 17 additional commits since the last revision: >> >> - Review >> - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache >> - Bugs and verify loader leak >> - Try to avoid loader leak >> - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache >> - Revert void special case removal due to C2 shortage causing TestZGCBarrierElision::testAtomicThenAtomicAnotherField failure >> - Test from Jorn >> - Copyright years >> - Fix problem identified by Jorn >> - Rollback getAndAdd for now >> - ... and 7 more: https://git.openjdk.org/jdk/compare/f510a486...d734e8a6 > > make/jdk/src/classes/build/tools/methodhandle/VarHandleGuardMethodGenerator.java line 132: > >> 130: // TestZGCBarrierElision.testAtomicThenAtomicAnotherField fails >> 131: // However, testArrayAtomicThenAtomic, testAtomicThenAtomic, and >> 132: // testArrayAtomicThenAtomicAtUnknownIndices works > > It doesn't look right. What is the root cause of the failure? Can it be a test bug? I think that is when two different VarHandles are both invoked non-exactly in two call sites in one method, the 2nd one fails to be inlined, that the compare-and-exchange from the 2nd one is not present in the final IR. The deoptimization reason is either "unstable-if" or "too many null checks", I think I will try look into it in another effort. 
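As a rough illustration of the scenario described here, the sketch below invokes two different `VarHandle`s non-exactly from two call sites in one method. It is not the failing test: the class `NonExactVarHandles`, the field names, and the choice of `short` arguments for `int` fields are assumptions made for the example.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.lang.invoke.WrongMethodTypeException;

// Illustrative sketch; not the TestZGCBarrierElision test case.
public class NonExactVarHandles {
    static class Holder {
        volatile int first;
        volatile int second;
    }

    static final VarHandle FIRST;
    static final VarHandle SECOND;

    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            FIRST = lookup.findVarHandle(Holder.class, "first", int.class);
            SECOND = lookup.findVarHandle(Holder.class, "second", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Two non-exact call sites: the short arguments must be converted to int.
    static int twoNonExactAccesses(Holder h) {
        int a = (int) FIRST.compareAndExchange(h, (short) 0, (short) 5);
        int b = (int) SECOND.compareAndExchange(h, (short) 0, (short) 7);
        return a + b + h.first + h.second;
    }

    // With invoke-exact behavior (Java 16+), the same call shape is rejected.
    static boolean exactModeRejectsConversion(Holder h) {
        try {
            int ignored = (int) FIRST.withInvokeExactBehavior()
                                     .compareAndExchange(h, (short) 0, (short) 5);
            return false;
        } catch (WrongMethodTypeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(twoNonExactAccesses(new Holder()));        // 12
        System.out.println(exactModeRejectsConversion(new Holder())); // true
    }
}
```

Whether a shape like this reproduces the inlining failure depends on the JIT state and flags; the sketch only shows what "invoked non-exactly" means at the call site.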
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2608390544 From vlivanov at openjdk.org Wed Dec 10 22:16:22 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 10 Dec 2025 22:16:22 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v9] In-Reply-To: References: <4YumBpbA2k8DC13H1s808_5OJx-1FMxD9CbIUfRTb8Q=.742f90c9-0d93-43b7-abe7-76422a0c8359@github.com> Message-ID: On Wed, 10 Dec 2025 22:09:21 GMT, Chen Liang wrote: >> make/jdk/src/classes/build/tools/methodhandle/VarHandleGuardMethodGenerator.java line 132: >> >>> 130: // TestZGCBarrierElision.testAtomicThenAtomicAnotherField fails >>> 131: // However, testArrayAtomicThenAtomic, testAtomicThenAtomic, and >>> 132: // testArrayAtomicThenAtomicAtUnknownIndices works >> >> It doesn't look right. What is the root cause of the failure? Can it be a test bug? > > I think that is when two different VarHandles are both invoked non-exactly in two call sites in one method, the 2nd one fails to be inlined, that the compare-and-exchange from the 2nd one is not present in the final IR. The deoptimization reason is either "unstable-if" or "too many null checks", I think I will try look into it in another effort. If it's a test problem, then it's better to comment out the problematic test case instead. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2608398869 From vlivanov at openjdk.org Wed Dec 10 22:22:45 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 10 Dec 2025 22:22:45 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v5] In-Reply-To: <8uE-UIoLllpjPuICc7sjKwo2eEtbGPYcgFwDUtQ0QpM=.525a688d-2dfc-4dfb-9dd5-c8024d4bb74e@github.com> References: <8uE-UIoLllpjPuICc7sjKwo2eEtbGPYcgFwDUtQ0QpM=.525a688d-2dfc-4dfb-9dd5-c8024d4bb74e@github.com> Message-ID: <-_h21X5PWkjy5p_jC8nHr3sxeApZlHPEg3DuMUF89QI=.8c71347e-699a-4f56-a988-a36b68b6fe49@github.com> On Tue, 9 Dec 2025 10:04:02 GMT, Emanuel Peter wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert Compile::should_delay_inlining unification > > test/hotspot/jtreg/compiler/inlining/TestSubtypeCheckTypeInfo.java line 363: > >> 361: // Sample: >> 362: // 213 42 b compiler.inlining.TestSubtypeCheckTypeInfo::testIsInstanceCondLatePost (13 bytes) >> 363: static final Pattern TEST_CASE = Pattern.compile("^\\d+\\s+\\d+\\s+b\\s+" + TEST_CLASS_NAME + "::(\\w+) .*"); > > Drive by comment, no need to change things here now: > @iwanowww @chhagedorn Would it not be nice if we could do this kind of matching with the `TestFramework`? Instead of `IR` matching, just match the output of any compilation tracing / printing. Indeed, that would be a much better way. Also, `-XX:+LogCompilation` is a nice option since publishes all information in a structured way, but it would introduce a dependency on LogCompilation tool in the test library. 
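The kind of output matching discussed here can be sketched in isolation. The regular expression below follows the `TEST_CASE` pattern quoted from the test; the class `CompilationLineMatcher`, the value given to `TEST_CLASS_NAME`, the use of `Pattern.quote`, and the `trim()` call are assumptions added for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch; only the regex shape is taken from the quoted test.
public class CompilationLineMatcher {
    // Assumed value; the real test defines its own TEST_CLASS_NAME.
    static final String TEST_CLASS_NAME = "compiler.inlining.TestSubtypeCheckTypeInfo";

    // Pattern.quote is an addition here so that '.' in the name matches literally.
    static final Pattern TEST_CASE = Pattern.compile(
        "^\\d+\\s+\\d+\\s+b\\s+" + Pattern.quote(TEST_CLASS_NAME) + "::(\\w+) .*");

    // Returns the test method name from a compilation line, or null if the line does not match.
    static String testCaseName(String line) {
        Matcher m = TEST_CASE.matcher(line.trim());
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String sample = "    213   42    b  compiler.inlining.TestSubtypeCheckTypeInfo"
                      + "::testIsInstanceCondLatePost (13 bytes)";
        System.out.println(testCaseName(sample)); // testIsInstanceCondLatePost
        System.out.println(testCaseName("@ 13   InstanceOf$B::m (1 bytes)   inline (hot)")); // null
    }
}
```

A framework-level helper for this kind of matching would let tests state expectations about compilation tracing without each test carrying its own parser, which is the suggestion made in the comment above.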
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28517#discussion_r2608413144 From sparasa at openjdk.org Wed Dec 10 23:40:02 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 10 Dec 2025 23:40:02 GMT Subject: RFR: 8369020: fix test timeout issue for compiler/intrinsics/TestLongUnsignedDivMod.java by removing warmup counts Message-ID: The goal of this PR is to address the test timeout issue reported. This test currently has a warmup count of 10,000. By removing the mandatory warmup count, this test seeks to address the timeout issue. ------------- Commit messages: - 8369020: fix test timeout issue for compiler/intrinsics/TestLongUnsignedDivMod.java by removing warmup counts Changes: https://git.openjdk.org/jdk/pull/28756/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28756&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8369020 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28756.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28756/head:pull/28756 PR: https://git.openjdk.org/jdk/pull/28756 From dlong at openjdk.org Thu Dec 11 02:11:25 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 11 Dec 2025 02:11:25 GMT Subject: RFR: 8350208: CTW: GraphKit::add_safepoint_edges asserts "not enough operands for reexecution" In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 10:30:46 GMT, Quan Anh Mai wrote: > Hi, > > This PR fixes the issue of the compiler crashing with "not enough operands for reexecution". The issue here is that during `Parse::catch_inline_exceptions`, the old stack is gone, and we cannot reexecute the current bytecode anymore. However, there are some places where we try to insert safepoints into the graph, such as if the handler is a backward jump, or if one of the exceptions in the handlers is not loaded. 
Since the `_reexecute` state of the current jvms is "undefined", it is inferred automatically that it should reexecute for some bytecodes such as `putfield`. The solution then is to explicitly set `_reexecute` to false. > > I can manage to write a unit test for the case of a backward handler, for the other cases, since the exceptions that can be thrown for a bytecode that is inferred to reexecute are `NullPointerException`, `ArrayIndexOutOfBoundsException`, and `ArrayStoreException`. I find it hard to construct such a test in which one of them is not loaded. > > Please kindly review, thanks a lot. @merykitty , I tried solution 1) and it seems to work, but I think I prefer solution 2) because it aligns better with my idea from JDK-8372846 of canonicalized exception states. If you like, I can take over this bug. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28597#issuecomment-3639722831 From erfang at openjdk.org Thu Dec 11 02:35:23 2025 From: erfang at openjdk.org (Eric Fang) Date: Thu, 11 Dec 2025 02:35:23 GMT Subject: [jdk26] RFR: 8371603: C2: Missing Ideal optimizations for load and store vectors on SVE In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 03:47:31 GMT, Xiaohong Gong wrote: > Hi all, > > This pull request contains a backport of commit [b6732d60](https://github.com/openjdk/jdk/commit/b6732d6048259de68a3dd5b4f66ac82f87270404) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Xiaohong Gong on 10 Dec 2025 and was reviewed by Emanuel Peter, Eric Fang and Hao Sun. > > Thanks! Marked as reviewed by erfang (Author). 
------------- PR Review: https://git.openjdk.org/jdk/pull/28732#pullrequestreview-3565356586 From wenanjian at openjdk.org Thu Dec 11 02:36:01 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Thu, 11 Dec 2025 02:36:01 GMT Subject: RFR: 8373069: RISC-V: implement GHASH intrinsic [v2] In-Reply-To: References: Message-ID: > support GHASH intrinsic for crypt GCM, which need zvkg extension. > > passed the tests in > test/hotspot/jtreg/compiler/codegen/aes/ > test/jdk/com/sun/crypto Anjian Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge branch 'openjdk:master' into ghash_intrinsic - Add some flag - RISC-V: implement GHASH intrinsic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28548/files - new: https://git.openjdk.org/jdk/pull/28548/files/a0392dda..b59700b0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28548&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28548&range=00-01 Stats: 25244 lines in 213 files changed: 16836 ins; 7396 del; 1012 mod Patch: https://git.openjdk.org/jdk/pull/28548.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28548/head:pull/28548 PR: https://git.openjdk.org/jdk/pull/28548 From wenanjian at openjdk.org Thu Dec 11 02:46:09 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Thu, 11 Dec 2025 02:46:09 GMT Subject: RFR: 8373069: RISC-V: implement GHASH intrinsic [v3] In-Reply-To: References: Message-ID: <1CV70nG2aoOVmc0tRt0Nw-BnUXfjeqync-0RxuVfya0=.ff36c52a-b468-410e-a602-a472f125848c@github.com> > support GHASH intrinsic for crypt GCM, which need zvkg extension. 
> > passed the tests in > test/hotspot/jtreg/compiler/codegen/aes/ > test/jdk/com/sun/crypto Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: modify some format and change name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28548/files - new: https://git.openjdk.org/jdk/pull/28548/files/b59700b0..fc63cb77 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28548&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28548&range=01-02 Stats: 6 lines in 2 files changed: 0 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28548.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28548/head:pull/28548 PR: https://git.openjdk.org/jdk/pull/28548 From wenanjian at openjdk.org Thu Dec 11 02:46:09 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Thu, 11 Dec 2025 02:46:09 GMT Subject: RFR: 8373069: RISC-V: implement GHASH intrinsic [v3] In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 16:09:09 GMT, Fei Yang wrote: >> Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: >> >> modify some format and change name > > src/hotspot/cpu/riscv/assembler_riscv.hpp line 1989: > >> 1987: >> 1988: // Vector GHASH (Zvkg) Extension >> 1989: INSN(vgmul_vv, 0b1110111, 0b010, 0b10001, 0b1, 0b101000); > > Seems not used anywhere? yes, I don't use it in this patch, it is one of the zvkg ins, maybe we can enable it later when we need it and I have move it. > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2857: > >> 2855: VectorRegister partial_hash = v29; >> 2856: VectorRegister hash_subkey = v30; >> 2857: VectorRegister cipher_text = v31; > > Can we simply start from `v1` here? sure, we can use v1 here, done > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 7109: > >> 7107: } >> 7108: >> 7109: if (UseGHASHIntrinsics && UseZvbb) { > > Do we need to re-check `UseZvbb` here? 
`UseGHASHIntrinsics` will be disabled if we don't have `UseZvbb`. yes, fixed it ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28548#discussion_r2608913077 PR Review Comment: https://git.openjdk.org/jdk/pull/28548#discussion_r2608909648 PR Review Comment: https://git.openjdk.org/jdk/pull/28548#discussion_r2608910567 From xgong at openjdk.org Thu Dec 11 02:56:37 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 11 Dec 2025 02:56:37 GMT Subject: [jdk26] RFR: 8371603: C2: Missing Ideal optimizations for load and store vectors on SVE In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 03:47:31 GMT, Xiaohong Gong wrote: > Hi all, > > This pull request contains a backport of commit [b6732d60](https://github.com/openjdk/jdk/commit/b6732d6048259de68a3dd5b4f66ac82f87270404) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Xiaohong Gong on 10 Dec 2025 and was reviewed by Emanuel Peter, Eric Fang and Hao Sun. > > Thanks! Tested locally with different SVE vector length (128-bit, 256-bit, 512-bit), and all tests pass. Thanks for your review @eme64 and @erifan ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28732#issuecomment-3639824152 From xgong at openjdk.org Thu Dec 11 02:56:39 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 11 Dec 2025 02:56:39 GMT Subject: [jdk26] Integrated: 8371603: C2: Missing Ideal optimizations for load and store vectors on SVE In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 03:47:31 GMT, Xiaohong Gong wrote: > Hi all, > > This pull request contains a backport of commit [b6732d60](https://github.com/openjdk/jdk/commit/b6732d6048259de68a3dd5b4f66ac82f87270404) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Xiaohong Gong on 10 Dec 2025 and was reviewed by Emanuel Peter, Eric Fang and Hao Sun. > > Thanks! This pull request has now been integrated. 
Changeset: b0ad3318 Author: Xiaohong Gong URL: https://git.openjdk.org/jdk/commit/b0ad3318092bd1a109612d3ef14ae057bd667c50 Stats: 638 lines in 8 files changed: 582 ins; 19 del; 37 mod 8371603: C2: Missing Ideal optimizations for load and store vectors on SVE Reviewed-by: epeter, erfang Backport-of: b6732d6048259de68a3dd5b4f66ac82f87270404 ------------- PR: https://git.openjdk.org/jdk/pull/28732 From qamai at openjdk.org Thu Dec 11 03:33:25 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 11 Dec 2025 03:33:25 GMT Subject: RFR: 8350208: CTW: GraphKit::add_safepoint_edges asserts "not enough operands for reexecution" In-Reply-To: References: Message-ID: On Thu, 11 Dec 2025 02:09:08 GMT, Dean Long wrote: >> Hi, >> >> This PR fixes the issue of the compiler crashing with "not enough operands for reexecution". The issue here is that during `Parse::catch_inline_exceptions`, the old stack is gone, and we cannot reexecute the current bytecode anymore. However, there are some places where we try to insert safepoints into the graph, such as if the handler is a backward jump, or if one of the exceptions in the handlers is not loaded. Since the `_reexecute` state of the current jvms is "undefined", it is inferred automatically that it should reexecute for some bytecodes such as `putfield`. The solution then is to explicitly set `_reexecute` to false. >> >> I can manage to write a unit test for the case of a backward handler, for the other cases, since the exceptions that can be thrown for a bytecode that is inferred to reexecute are `NullPointerException`, `ArrayIndexOutOfBoundsException`, and `ArrayStoreException`. I find it hard to construct such a test in which one of them is not loaded. >> >> Please kindly review, thanks a lot. > > @merykitty , I tried solution 1) and it seems to work, but I think I prefer solution 2) because it aligns better with my idea from JDK-8372846 of canonicalized exception states. If you like, I can take over this bug. 
@dean-long Thanks a lot, please take over this bug then. > trim stack, throw exception (move to Thread) (reexecute=true) This requires extra unconditional overhead even though safepoint rarely happens. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28597#issuecomment-3639912845 From qamai at openjdk.org Thu Dec 11 05:40:30 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 11 Dec 2025 05:40:30 GMT Subject: RFR: 8367158: C2: create better fill and copy benchmarks, taking alignment into account [v2] In-Reply-To: References: <3PEmRtpnMH0sRwWGK0uWkItDuytAS-ErVfqYK5X7rDQ=.2d484c9a-c25a-4a60-a856-fcbd4e614914@github.com> Message-ID: On Wed, 3 Dec 2025 13:03:02 GMT, Emanuel Peter wrote: >> **Summary** >> >> I created some `fill` and `copy` style benchmarks, covering both `arrays` and `MemorySegment`s. >> Reasons for this benchmark: >> - I want to compare auto-vectorization with intrinsics (array assembly style intrinsics, and MemorySegment java level special implementations). This allows us to see if some are slower than others, and if we can manage to improve the slower versions somehow in the future. >> - There are some known issues we can demonstrate well with this benchmark: >> - Super-Unrolling: unrolling the vectorized loop gets us extra performance, but the exact factor may not be optimal yet for auto-vectorization. >> - Small iteration count loops: auto-vectorization can lead to slowdowns. >> - Many benchmarks do not control for alignment. But that creates noise. I just go over all possible alignments, that should smooth out the noise. >> - Most benchmarks do not control for 4k aliasing (x86 effect in store buffer). I make sure that load/stores are not a multiple of 4k bytes apart, so we can avoid the noise of that effect.
>> >> ---------------------------------------------------------------------- >> >> **Analysis based on this Benchmark** >> >> Analysis done in this PR: >> - Arrays: auto vectorization vs scalar loops performance >> - Arrays: auto vectorization loops vs intrinsics >> - MemorySegments: auto vectorization loops vs scalar loops vs `MemorySegment.fill/copy` >> >> Future work: >> - Investigate deeper, inspect assembly, etc. >> - Impact of `-XX:SuperWordAutomaticAlignment=0` on small iteration count loops. >> - Investigate effect of `-XX:-OptimizeFill`. It seems that the loops in this benchmark are not detected automatically, and so the array intrinsics are not used. Why? >> - Investigate impact of `CompactObjectHeaders`. Does enabling/disabling change any performance? >> - Investigate if adjusting the super-unrolling factor could improve performance for auto-vectorization: [JDK-8368061](https://bugs.openjdk.org/browse/JDK-8368061) >> - Performance comparison with Graal. >> >> ---------------------------------------------------------------------- >> >> **Array Benchmark: auto vectorization vs scalar** >> >> We can see that for arrays, auto vectorization leads to minor regressions for sizes 1-32, and then generally auto vectorization is faster for larger sizes. And this is true for both `fill` and `copy`. >> >> Strange: `macosx_aarch64` with `copy_int`. The auto vectorized performance has a sudden drop around 150 iterations. Also for `fill_... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: > > - small modulo fix from review suggestion > - Merge branch 'master' into JDK-8367158-fill-and-copy-benchmarks > - more MS types > - fix MS fill > - more backing types > - object array benchmarks > - fix bm > - ms bm update > - clean up benchmark > - more types > - ...
and 6 more: https://git.openjdk.org/jdk/compare/e6497e63...80378aea test/micro/org/openjdk/bench/vm/compiler/VectorBulkOperationsArray.java line 61: > 59: @Fork(value = 1) > 60: public class VectorBulkOperationsArray { > 61: @Param({ "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", How about larger values? test/micro/org/openjdk/bench/vm/compiler/VectorBulkOperationsArray.java line 114: > 112: public static final int REGION_2_OBJECT_OFFSET = REGION_2_BYTE_OFFSET / 8; > 113: > 114: // The arrays with the two regions each Is there a reason you don't want to have 2 arrays, one as `dst` and one as `src`? test/micro/org/openjdk/bench/vm/compiler/VectorBulkOperationsArray.java line 202: > 200: > 201: @Benchmark > 202: public void fill_zero_byte_loop() { Should these benchmarks be annotated with `@OperationsPerInvocation`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27315#discussion_r2609223430 PR Review Comment: https://git.openjdk.org/jdk/pull/27315#discussion_r2609214012 PR Review Comment: https://git.openjdk.org/jdk/pull/27315#discussion_r2609207019 From jbhateja at openjdk.org Thu Dec 11 06:12:29 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 11 Dec 2025 06:12:29 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v7] In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 10:23:30 GMT, Jatin Bhateja wrote: >> Add a new Float16lVector type and corresponding concrete vector classes, in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. >> - Add necessary inline expander support. >> - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. >> - Use existing Float16 vector IR and backend support. >> - Extended the existing VectorAPI JTREG test suite for the newly added Float16Vector operations. 
>> >> The idea here is to first be at par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reduction, etc). >> >> The following are the performance numbers for some of the selected Float16Vector benchmarking kernels compared to equivalent auto-vectorized Float16OperationsBenchmark kernels. >> >> image >> >> Initial RFP[1] was floated on the panama-dev mailing list. >> >> Kindly review the draft PR and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 20 commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 > - Optimizing tail handling > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 > - Cleanups > - Fix failing jtreg test in CI > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 > - Cleanups > - Adding support for custom basic type T_FLOAT16, passing BasicType lane types to inline expander entries > - Cleaning up interface as per review suggestions > - Some cleanups > - ... and 10 more: https://git.openjdk.org/jdk/compare/b60ac710...44ac727d > jdk/incubator/vector/Float16Vector512Tests.java > This patch results in two of the JTREG tests failing on aarch64 machines- > > ``` > jdk/incubator/vector/Float16Vector512Tests.java > compiler/vectorapi/TestFloat16VectorOperations.java > ``` > > which is due to an issue in the `aarch64.ad` file. Fixed the failures and also added aarch64 specific IR rules which were missing for some of the tests in the `compiler/vectorapi/TestFloat16VectorOperations.java` test. > > @jatin-bhateja Could you please add the attached fix to this patch? Thanks! 
[fix.patch](https://github.com/user-attachments/files/24076067/fix.patch) Hi @Bhavana-Kilambi, thanks for running this through your setup. I am also able to reproduce it on Google AXION. Your fix fills the gap in the instruction selector. I suggest creating a separate PR for it. You can either create a smaller standalone reproducer test case or refer to the tests that are part of this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3640370652 From chagedorn at openjdk.org Thu Dec 11 06:37:23 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 11 Dec 2025 06:37:23 GMT Subject: RFR: 8373420: C2: Add true/false_proj*() methods for IfNode as a replacement for proj_out*(true/false) In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 13:13:44 GMT, Christian Hagedorn wrote: > There are a lot of places in the code where we call `proj_out*(true/false)` on an `IfNode`. In some cases, we then cast the returned `ProjNode` back to `IfProjNode` or `IfTrueNode/IfFalseNode`. I often visit such code and now decided to clean this up. > > The patch proposes new `IfNode::true/false_proj*()` methods that return `IfTrueNode/IfFalseNode` directly. I walked through all `proj_out*()` calls and replaced those that used a direct `true/false` or `1/0` as argument. > > There are still more things to clean up in this area, for example, when we return `ProjNode` even though it should be an `IfProjNode` which requires more casting. But let's do that step by step in follow-up clean ups. > > Thanks, > Christian Thanks Damon for your review!
------------- PR Comment: https://git.openjdk.org/jdk/pull/28745#issuecomment-3640453270 From xgong at openjdk.org Thu Dec 11 06:51:26 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 11 Dec 2025 06:51:26 GMT Subject: RFR: 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations In-Reply-To: References: Message-ID: On Mon, 8 Dec 2025 03:29:03 GMT, Eric Fang wrote: > This patch adds intrinsic support for UMIN and UMAX reduction operations in the Vector API on AArch64, enabling direct hardware instruction mapping for better performance. > > Changes: > -------- > > 1. C2 mid-end: > - Added UMinReductionVNode and UMaxReductionVNode > > 2. AArch64 Backend: > - Added uminp/umaxp/sve_uminv/sve_umaxv instructions > - Updated match rules for all vector sizes and element types > - Both NEON and SVE implementation are supported > > 3. Test: > - Added UMIN_REDUCTION_V and UMAX_REDUCTION_V to IRNode.java > - Added assembly tests in aarch64-asmtest.py for new instructions > - Added a JTReg test file VectorUMinMaxReductionTest.java > > Different configurations were tested on aarch64 and x86 machines, and all tests passed. 
>
> Test results of JMH benchmarks from the panama-vector project:
> --------
>
> On a Nvidia Grace machine with 128-bit SVE:
>
> Benchmark                       Unit      Before    Error     After    Error  Uplift
> Byte128Vector.UMAXLanes         ops/ms    411.60    42.18  25226.51    33.92   61.29
> Byte128Vector.UMAXMaskedLanes   ops/ms    558.56    85.12  25182.90    28.74   45.09
> Byte128Vector.UMINLanes         ops/ms    645.58   780.76  28396.29   103.11   43.99
> Byte128Vector.UMINMaskedLanes   ops/ms    621.09   718.27  26122.62    42.68   42.06
> Byte64Vector.UMAXLanes          ops/ms    296.33    34.44  14357.74    15.95   48.45
> Byte64Vector.UMAXMaskedLanes    ops/ms    376.54    44.01  14269.24    21.41   37.90
> Byte64Vector.UMINLanes          ops/ms    373.45   426.51  15425.36    66.20   41.31
> Byte64Vector.UMINMaskedLanes    ops/ms    353.32   346.87  14201.37    13.79   40.19
> Int128Vector.UMAXLanes          ops/ms    174.79   192.51   9906.07   286.93   56.67
> Int128Vector.UMAXMaskedLanes    ops/ms    157.23   206.68  10246.77    11.44   65.17
> Int64Vector.UMAXLanes           ops/ms     95.30   126.49   4719.30    98.57   49.52
> Int64Vector.UMAXMaskedLanes     ops/ms     88.19    87.44   4693.18    19.76   53.22
> Long128Vector.UMAXLanes         ops/ms     80.62    97.82   5064.01    35.52   62.82
> Long128Vector.UMAXMaskedLanes   ops/ms     78.15   102.91   5028.24     8.74   64.34
> Long64Vector.UMAXLanes          ops/ms     47.56    62.01     46.76    52.28    0.98
> Long64Vector.UMAXMaskedLanes    ops/ms     45.44    46.76     45.79    42.91    1.01
> Short128Vector.UMAXLanes        ops/ms    316.65   410.30  14814.82    23.65   46.79
> Short128Vector.UMAXMaskedLanes  ops/ms    308.90   351.78  15155.26    31.03   49.06
> Sh...

Nice work. Thanks for your support! I noticed that this PR contains the same commit of https://github.com/openjdk/jdk/pull/28692. Could you please split the change from this PR? If this PR depends on https://github.com/openjdk/jdk/pull/28692, I wonder whether we can change the target merge branch to `pr/28692` instead of `master` please?
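As a scalar reference for what the UMIN and UMAX reductions mentioned above compute over `int` lanes, a minimal plain-Java sketch may help. It does not use the Vector API or any of the new C2 nodes; the class and method names are hypothetical, and the only JDK call relied on is `Integer.compareUnsigned`.

```java
final class UnsignedReductionSketch {
    // Unsigned maximum over all lanes. 0 is the identity value,
    // because it is the smallest value in unsigned order.
    static int umaxReduce(int[] lanes) {
        int acc = 0;
        for (int v : lanes) {
            if (Integer.compareUnsigned(v, acc) > 0) {
                acc = v;
            }
        }
        return acc;
    }

    // Unsigned minimum over all lanes. -1 (0xFFFFFFFF) is the identity value,
    // because it is the largest value in unsigned order.
    static int uminReduce(int[] lanes) {
        int acc = -1;
        for (int v : lanes) {
            if (Integer.compareUnsigned(v, acc) < 0) {
                acc = v;
            }
        }
        return acc;
    }
}
```

For the lanes `{1, -1, 5}` the signed maximum is 5, while the unsigned maximum is -1, because -1 is the bit pattern 0xFFFFFFFF.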
------------- PR Comment: https://git.openjdk.org/jdk/pull/28693#issuecomment-3640490083 From epeter at openjdk.org Thu Dec 11 07:22:34 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 11 Dec 2025 07:22:34 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v3] In-Reply-To: References: <6ip4JrJ4WiYEe6d2FA_WQ5dDjxAk2RPaPbwth4jNeJM=.43d7879d-89a4-434c-80ea-371c92581686@github.com> Message-ID: <0b81mH1_Y6r905N2HmehXBbSFdzLpJIfuXHNfijpHBs=.c870b13e-a52f-4c00-b771-91cf9205cb4a@github.com> On Wed, 10 Dec 2025 19:46:43 GMT, Vladimir Ivanov wrote: >> You can for example run it with C1 or Xint only. That would disable the test. Or someone runs it with Graal. >> I would generally remove `@requires` from any test we can, to get more coverage. > > It's a test on C2 IR. What's the point in running it w/o C2? You can always do more than just C2 IR verification. For example, we could also do result verification. That would give us coverage for C1 for example. I think it is just good practice not to have a restriction if it is not absolutely necessary. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28589#discussion_r2609462695 From jbhateja at openjdk.org Thu Dec 11 07:32:58 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 11 Dec 2025 07:32:58 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v8] In-Reply-To: References: Message-ID: > Add a new Float16Vector type and corresponding concrete vector classes, in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. > - Add necessary inline expander support. > - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. > - Use existing Float16 vector IR and backend support. > - Extended the existing VectorAPI JTREG test suite for the newly added Float16Vector operations.
> > The idea here is to first be at par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reduction, etc). > > The following are the performance numbers for some of the selected Float16Vector benchmarking kernels compared to equivalent auto-vectorized Float16OperationsBenchmark kernels. > > image > > Initial RFP[1] was floated on the panama-dev mailing list. > > Kindly review the draft PR and share your feedback. > > Best Regards, > Jatin > > [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Including test changes from Bhavana Kilambi (ARM) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28002/files - new: https://git.openjdk.org/jdk/pull/28002/files/44ac727d..7da5d147 Webrevs: - full: Webrev is not available because diff is too large - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28002&range=06-07 Stats: 16 lines in 1 file changed: 10 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/28002.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28002/head:pull/28002 PR: https://git.openjdk.org/jdk/pull/28002 From epeter at openjdk.org Thu Dec 11 07:37:38 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 11 Dec 2025 07:37:38 GMT Subject: RFR: 8367158: C2: create better fill and copy benchmarks, taking alignment into account [v2] In-Reply-To: References: <3PEmRtpnMH0sRwWGK0uWkItDuytAS-ErVfqYK5X7rDQ=.2d484c9a-c25a-4a60-a856-fcbd4e614914@github.com> Message-ID: <9xqERVcTmfisdnrzirysJEkNNI-bdtiZonQ_ewUqypA=.4d920054-1d82-4366-887c-5bc992df9a0e@github.com> On Thu, 11 Dec 2025 05:37:48 GMT, Quan Anh Mai wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 16 additional commits since the last revision: >> >> - small modulo fix from review suggestion >> - Merge branch 'master' into JDK-8367158-fill-and-copy-benchmarks >> - more MS types >> - fix MS fill >> - more backing types >> - object array benchmarks >> - fix bm >> - ms bm update >> - clean up benchmark >> - more types >> - ... and 6 more: https://git.openjdk.org/jdk/compare/f0212361...80378aea > > test/micro/org/openjdk/bench/vm/compiler/VectorBulkOperationsArray.java line 61: > >> 59: @Fork(value = 1) >> 60: public class VectorBulkOperationsArray { >> 61: @Param({ "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", > > How about larger values? I am currently more interested in small iteration counts. That's where we see very interesting patterns. But you can always set the parameter from the outside when you run the JMH. These are just some reasonable defaults ;) > test/micro/org/openjdk/bench/vm/compiler/VectorBulkOperationsArray.java line 114: > >> 112: public static final int REGION_2_OBJECT_OFFSET = REGION_2_BYTE_OFFSET / 8; >> 113: >> 114: // The arrays with the two regions each > > Is there a reason you don't want to have 2 arrays, one as `dst` and one as `src`? Maybe I need to add some extra comments. The issue is that if you have 2 arrays, you don't know their relative alignment. You also don't have control over their distance modulo 4k. And I would like to avoid 4k aliasing effects that you get on x86 machines because of the store-to-load-forwarding only checking 12bits of the address. > test/micro/org/openjdk/bench/vm/compiler/VectorBulkOperationsArray.java line 202: > >> 200: >> 201: @Benchmark >> 202: public void fill_zero_byte_loop() { > > Should these benchmarks be annotated with `@OperationsPerInvocation`? Ah, I did not know about this. Neat idea! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27315#discussion_r2609499810 PR Review Comment: https://git.openjdk.org/jdk/pull/27315#discussion_r2609497670 PR Review Comment: https://git.openjdk.org/jdk/pull/27315#discussion_r2609494361 From epeter at openjdk.org Thu Dec 11 07:42:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 11 Dec 2025 07:42:30 GMT Subject: RFR: 8367158: C2: create better fill and copy benchmarks, taking alignment into account [v2] In-Reply-To: <9xqERVcTmfisdnrzirysJEkNNI-bdtiZonQ_ewUqypA=.4d920054-1d82-4366-887c-5bc992df9a0e@github.com> References: <3PEmRtpnMH0sRwWGK0uWkItDuytAS-ErVfqYK5X7rDQ=.2d484c9a-c25a-4a60-a856-fcbd4e614914@github.com> <9xqERVcTmfisdnrzirysJEkNNI-bdtiZonQ_ewUqypA=.4d920054-1d82-4366-887c-5bc992df9a0e@github.com> Message-ID: On Thu, 11 Dec 2025 07:33:31 GMT, Emanuel Peter wrote: >> test/micro/org/openjdk/bench/vm/compiler/VectorBulkOperationsArray.java line 114: >> >>> 112: public static final int REGION_2_OBJECT_OFFSET = REGION_2_BYTE_OFFSET / 8; >>> 113: >>> 114: // The arrays with the two regions each >> >> Is there a reason you don't want to have 2 arrays, one as `dst` and one as `src`? > > Maybe I need to add some extra comments. The issue is that if you have 2 arrays, you don't know their relative alignment. You also don't have control over their distance modulo 4k. And I would like to avoid the 4k aliasing effects that you get on x86 machines because store-to-load forwarding only checks 12 bits of the address. I am already writing something about 4k aliasing, but I can expand the explanation.
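To make the single-array argument above concrete, here is a minimal sketch of placing two regions inside one backing `byte[]` so that their distance is fixed and is never a multiple of 4096 bytes. This is not the layout used in `VectorBulkOperationsArray.java`; the class name, the helper `region2Offset`, and the choice of a distance of 2048 modulo 4096 are assumptions made for illustration only.

```java
final class TwoRegionLayoutSketch {
    // The "4k" from the discussion: addresses that agree in their low 12 bits
    // are exactly a multiple of 4096 bytes apart.
    static final int ALIAS_PERIOD = 4096;

    // Hypothetical helper: returns a byte offset for region 2 such that
    // (region2 - region1) % 4096 == 2048, i.e. as far from a multiple of 4k
    // as possible, while keeping the two regions from overlapping.
    static int region2Offset(int region1Offset, int regionBytes) {
        int distance = regionBytes; // at least regionBytes, so no overlap
        int rem = distance % ALIAS_PERIOD;
        distance += (ALIAS_PERIOD / 2 - rem + ALIAS_PERIOD) % ALIAS_PERIOD;
        return region1Offset + distance;
    }

    // Scalar copy loop between two regions of the same backing array.
    static void copyLoop(byte[] backing, int srcOffset, int dstOffset, int count) {
        for (int i = 0; i < count; i++) {
            backing[dstOffset + i] = backing[srcOffset + i];
        }
    }
}
```

Because both regions live in the same array, their relative distance stays the same wherever the array ends up in memory, which is what two separately allocated arrays cannot guarantee.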
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27315#discussion_r2609512851 From epeter at openjdk.org Thu Dec 11 07:59:50 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 11 Dec 2025 07:59:50 GMT Subject: RFR: 8367158: C2: create better fill and copy benchmarks, taking alignment into account [v3] In-Reply-To: <3PEmRtpnMH0sRwWGK0uWkItDuytAS-ErVfqYK5X7rDQ=.2d484c9a-c25a-4a60-a856-fcbd4e614914@github.com> References: <3PEmRtpnMH0sRwWGK0uWkItDuytAS-ErVfqYK5X7rDQ=.2d484c9a-c25a-4a60-a856-fcbd4e614914@github.com> Message-ID: <5mMYhBudpEt7JDkC-EkGba0GGZR-kZ9LH-jh5m-W7OY=.f3967aa3-b847-42db-99b1-4492b0d78c7c@github.com> > **Summary** > > I created some `fill` and `copy` style benchmarks, covering both `arrays` and `MemorySegment`s. > Reasons for this benchmark: > - I want to compare auto-vectorization with intrinsics (array assembly style intrinsics, and MemorySegment java level special implementations). This allows us to see if some are slower than others, and if we can manage to improve the slower versions somehow in the future. > - There are some known issues we can demonstrate well with this benchmark: > - Super-Unrolling: unrolling the vectorized loop gets us extra performance, but the exact factor may not be optimal yet for auto-vectorization. > - Small iteration count loops: auto-vectorization can lead to slowdowns. > - Many benchmarks do not control for alignment. But that creates noise. I just go over all possible alignments, that should smooth out the noise. > - Most benchmarks do not control for 4k aliasing (x86 effect in store buffer). I make sure that load/stores are not a multiple of 4k bytes apart, so we can avoid the noise of that effect.
> > ---------------------------------------------------------------------- > > **Analysis based on this Benchmark** > > Analysis done in this PR: > - Arrays: auto vectorization vs scalar loops performance > - Arrays: auto vectorization loops vs intrinsics > - MemorySegments: auto vectorization loops vs scalar loops vs `MemorySegment.fill/copy` > > Future work: > - Investigate deeper, inspect assembly, etc. > - Impact of `-XX:SuperWordAutomaticAlignment=0` on small iteration count loops. > - Investigate effect of `-XX:-OptimizeFill`. It seems that the loops in this benchmark are not detected automatically, and so the array intrinsics are not used. Why? > - Investigate impact of `CompactObjectHeaders`. Does enabling/disabling change any performance? > - Investigate if adjusting the super-unrolling factor could improve performance for auto-vectorization: [JDK-8368061](https://bugs.openjdk.org/browse/JDK-8368061) > - Performance comparison with Graal. > > ---------------------------------------------------------------------- > > **Array Benchmark: auto vectorization vs scalar** > > We can see that for arrays, auto vectorization leads to minor regressions for sizes 1-32, and then generally auto vectorization is faster for larger sizes. And this is true for both `fill` and `copy`. > > Strange: `macosx_aarch64` with `copy_int`. The auto vectorized performance has a sudden drop around 150 iterations. Also for `fill_long` we have a "phase-transition" around 64, that goes steeper rather...
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: for merykitty ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27315/files - new: https://git.openjdk.org/jdk/pull/27315/files/80378aea..716aab07 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27315&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27315&range=01-02 Stats: 94 lines in 2 files changed: 93 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27315/head:pull/27315 PR: https://git.openjdk.org/jdk/pull/27315 From epeter at openjdk.org Thu Dec 11 07:59:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 11 Dec 2025 07:59:56 GMT Subject: RFR: 8367158: C2: create better fill and copy benchmarks, taking alignment into account [v2] In-Reply-To: References: <3PEmRtpnMH0sRwWGK0uWkItDuytAS-ErVfqYK5X7rDQ=.2d484c9a-c25a-4a60-a856-fcbd4e614914@github.com> Message-ID: On Wed, 3 Dec 2025 13:03:02 GMT, Emanuel Peter wrote: >> **Summary** >> >> I created some `fill` and `copy` style benchmarks, covering both `arrays` and `MemorySegment`s. >> Reasons for this benchmark: >> - I want to compare auto-vectorization with intrinsics (array assembly style intrinsics, and MemorySegment java level special implementations). This allows us to see if some are slower than others, and if we can manage to improve the slower versions somehow in the future. >> - There are some known issues we can demonstrate well with this benchmark: >> - Super-Unrolling: unrolling the vectorized loop gets us extra performance, but the exact factor may not be optimal yet for auto-vectorization. >> - Small iteration count loops: auto-vectorization can lead to slowdowns. >> - Many benchmarks do not control for alignment. But that creates noise. I just go over all possible alignments, that should smooth out the noise.
>> - Most benchmarks do not control for 4k aliasing (x86 effect in store buffer). I make sure that load/stores are not a multiple of 4k bytes apart, so we can avoid the noise of that effect. >> >> ---------------------------------------------------------------------- >> >> **Analysis based on this Benchmark** >> >> Analysis done in this PR: >> - Arrays: auto vectorization vs scalar loops performance >> - Arrays: auto vectorization loops vs intrinsics >> - MemorySegments: auto vectorization loops vs scalar loops vs `MemorySegment.fill/copy` >> >> Future work: >> - Investigate deeper, inspect assembly, etc. >> - Impact of `-XX:SuperWordAutomaticAlignment=0` on small iteration count loops. >> - Investigate effect of `-XX:-OptimizeFill`. It seems that the loops in this benchmark are not detected automatically, and so the array intrinsics are not used. Why? >> - Investigate impact of `CompactObjectHeaders`. Does enabling/disabling change any performance? >> - Investigate if adjusting the super-unrolling factor could improve performance for auto-vectorization: [JDK-8368061](https://bugs.openjdk.org/browse/JDK-8368061) >> - Performance comparison with Graal. >> >> ---------------------------------------------------------------------- >> >> **Array Benchmark: auto vectorization vs scalar** >> >> We can see that for arrays, auto vectorization leads to minor regressions for sizes 1-32, and then generally auto vectorization is faster for larger sizes. And this is true for both `fill` and `copy`. >> >> Strange: `macosx_aarch64` with `copy_int`. The auto vectoirized performance has a sudden drop around 150 iterations. Also for `fill_... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 16 additional commits since the last revision: > > - small modulo fix from review suggestion > - Merge branch 'master' into JDK-8367158-fill-and-copy-benchmarks > - more MS types > - fix MS fill > - more backing types > - object array benchmarks > - fix bm > - ms bm update > - clean up benchmark > - more types > - ... and 6 more: https://git.openjdk.org/jdk/compare/3cba5c1e...80378aea @merykitty Thanks for having a look! And thanks for the questions/suggestions! I have made some adjustments. Does it look better now? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27315#issuecomment-3640690829 From qamai at openjdk.org Thu Dec 11 08:19:35 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 11 Dec 2025 08:19:35 GMT Subject: RFR: 8367158: C2: create better fill and copy benchmarks, taking alignment into account [v3] In-Reply-To: <5mMYhBudpEt7JDkC-EkGba0GGZR-kZ9LH-jh5m-W7OY=.f3967aa3-b847-42db-99b1-4492b0d78c7c@github.com> References: <3PEmRtpnMH0sRwWGK0uWkItDuytAS-ErVfqYK5X7rDQ=.2d484c9a-c25a-4a60-a856-fcbd4e614914@github.com> <5mMYhBudpEt7JDkC-EkGba0GGZR-kZ9LH-jh5m-W7OY=.f3967aa3-b847-42db-99b1-4492b0d78c7c@github.com> Message-ID: <2oHQUsUXi7uHsxqgjogPpdB6LSHrvWrbZLyI6kfLdHE=.916375a0-ba94-42a7-bc1d-c18c79f82157@github.com> On Thu, 11 Dec 2025 07:59:50 GMT, Emanuel Peter wrote: >> **Summary** >> >> I created some `fill` and `copy` style benchmarks, covering both `arrays` and `MemorySegment`s. >> Reasons for this benchmark: >> - I want to compare auto-vectorization with intrinsics (array assembly style intrinsics, and MemorySegment java level special implementations). This allows us to see if some are slower than others, and if we can manage to improve the slower versions somehow in the future. >> - There are some known issues we can demonstrate well with this benchmark: >> - Super-Unrolling: unrolling the vectorized loop gets us extra performance, but the exact factor may not be optimal yet for auto-vectorization.
>> - Small iteration count loops: auto-vectorization can lead to slowdowns. >> - Many benchmarks do not control for alignment. But that creates noise. I just go over all possible alignments, that should smooth out the noise. >> - Most benchmarks do not control for 4k aliasing (x86 effect in store buffer). I make sure that load/stores are not a multiple of 4k bytes apart, so we can avoid the noise of that effect. >> >> ---------------------------------------------------------------------- >> >> **Analysis based on this Benchmark** >> >> Analysis done in this PR: >> - Arrays: auto vectorization vs scalar loops performance >> - Arrays: auto vectorization loops vs intrinsics >> - MemorySegments: auto vectorization loops vs scalar loops vs `MemorySegment.fill/copy` >> >> Future work: >> - Investigate deeper, inspect assembly, etc. >> - Impact of `-XX:SuperWordAutomaticAlignment=0` on small iteration count loops. >> - Investigate effect of `-XX:-OptimizeFill`. It seems that the loops in this benchmark are not detected automatically, and so the array intrinsics are not used. Why? >> - Investigate impact of `CompactObjectHeaders`. Does enabling/disabling change any performance? >> - Investigate if adjusting the super-unrolling factor could improve performance for auto-vectorization: [JDK-8368061](https://bugs.openjdk.org/browse/JDK-8368061) >> - Performance comparison with Graal. >> >> ---------------------------------------------------------------------- >> >> **Array Benchmark: auto vectorization vs scalar** >> >> We can see that for arrays, auto vectorization leads to minor regressions for sizes 1-32, and then generally auto vectorization is faster for larger sizes. And this is true for both `fill` and `copy`. >> >> Strange: `macosx_aarch64` with `copy_int`. The auto vectoirized performance has a sudden drop around 150 iterations. Also for `fill_... 
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > for merykitty Thanks, LGTM ------------- Marked as reviewed by qamai (Committer). PR Review: https://git.openjdk.org/jdk/pull/27315#pullrequestreview-3566247883 From epeter at openjdk.org Thu Dec 11 08:39:26 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 11 Dec 2025 08:39:26 GMT Subject: RFR: 8367158: C2: create better fill and copy benchmarks, taking alignment into account [v3] In-Reply-To: <2oHQUsUXi7uHsxqgjogPpdB6LSHrvWrbZLyI6kfLdHE=.916375a0-ba94-42a7-bc1d-c18c79f82157@github.com> References: <3PEmRtpnMH0sRwWGK0uWkItDuytAS-ErVfqYK5X7rDQ=.2d484c9a-c25a-4a60-a856-fcbd4e614914@github.com> <5mMYhBudpEt7JDkC-EkGba0GGZR-kZ9LH-jh5m-W7OY=.f3967aa3-b847-42db-99b1-4492b0d78c7c@github.com> <2oHQUsUXi7uHsxqgjogPpdB6LSHrvWrbZLyI6kfLdHE=.916375a0-ba94-42a7-bc1d-c18c79f82157@github.com> Message-ID: On Thu, 11 Dec 2025 08:16:33 GMT, Quan Anh Mai wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> for merykitty > > Thanks, LGTM @merykitty Thanks for the review/approval :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27315#issuecomment-3640826608 From snatarajan at openjdk.org Thu Dec 11 08:43:26 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Thu, 11 Dec 2025 08:43:26 GMT Subject: RFR: 8370489: Some compiler tests miss the @key randomness [v2] In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 11:24:19 GMT, Saranya Natarajan wrote: >> **Issue:** Some compiler tests use randomization but do not have `@key randomness` in the jtreg header. >> >> **Fix:** The list of test cases that did not have `@key randomness` was produced using `grep -l "getRandomInstance" -r test/hotspot/jtreg/compiler/ | xargs grep -L "randomness"`. This PR adds `@key randomness` to these tests.
>> >> **Note:** The following tests that are still listed with `grep -l "getRandomInstance" -r test/hotspot/jtreg/compiler/ | xargs grep -L "randomness"` after this PR are confirmed to be helper or support file for actual test. >> _test/hotspot/jtreg/compiler/codegen/aes/TestAESBase.java >> test/hotspot/jtreg/compiler/compilercontrol/jcmd/StressAddJcmdBase.java >> test/hotspot/jtreg/compiler/compilercontrol/parser/HugeDirectiveUtil.java >> test/hotspot/jtreg/compiler/compilercontrol/share/scenario/CommandGenerator.java >> test/hotspot/jtreg/compiler/lib/ir_framework/test/TestVM.java >> test/hotspot/jtreg/compiler/lib/ir_framework/test/ArgumentValue.java >> test/hotspot/jtreg/compiler/lib/ir_framework/AbstractInfo.java >> test/hotspot/jtreg/compiler/lib/ir_framework/CompLevel.java >> test/hotspot/jtreg/compiler/lib/generators/Generators.java >> test/hotspot/jtreg/compiler/lib/template_framework/library/PrimitiveType.java >> test/hotspot/jtreg/compiler/lib/template_framework/library/Expression.java >> test/hotspot/jtreg/compiler/lib/template_framework/NameSet.java >> test/hotspot/jtreg/compiler/intrinsics/mathexact/Verify.java >> test/hotspot/jtreg/compiler/intrinsics/bmi/BMITestRunner.java >> test/hotspot/jtreg/compiler/intrinsics/unsafe/ByteBufferTest.java >> test/hotspot/jtreg/compiler/arraycopy/stress/StressBooleanArrayCopy.java >> test/hotspot/jtreg/compiler/arraycopy/stress/StressIntArrayCopy.java >> test/hotspot/jtreg/compiler/arraycopy/stress/StressLongArrayCopy.java >> test/hotspot/jtreg/compiler/arraycopy/stress/StressCharArrayCopy.java >> test/hotspot/jtreg/compiler/arraycopy/stress/StressObjectArrayCopy.java >> test/hotspot/jtreg/compiler/arraycopy/stress/StressByteArrayCopy.java >> test/hotspot/jtreg/compiler/arraycopy/stress/StressFloatArrayCopy.java >> test/hotspot/jtreg/compiler/arraycopy/stress/StressShortArrayCopy.java >> test/hotspot/jtreg/compiler/arraycopy/stress/StressDoubleArrayCopy.java >> 
test/hotspot/jtreg/compiler/codecache/cli/codeheapsize/JVMStartupRunner.java >> test/hotspot/jtreg/compiler/vectorapi/reshape/utils/VectorReshapeHelper.java >> test/hotspot/jtreg/compiler/jvmci/compilerToVM/DummyClass.java_ > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > addressing review comments - removing space and javadoc style Thank you for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/28463#issuecomment-3640841004 From snatarajan at openjdk.org Thu Dec 11 08:46:41 2025 From: snatarajan at openjdk.org (Saranya Natarajan) Date: Thu, 11 Dec 2025 08:46:41 GMT Subject: Integrated: 8370489: Some compiler tests miss the @key randomness In-Reply-To: References: Message-ID: On Fri, 21 Nov 2025 23:31:16 GMT, Saranya Natarajan wrote: > **Issue:** Some compiler tests use randomization but do not have `@key randomness` in the jtreg header. > > **Fix:** The list of test cases that did not have `@key randomness` was produced using `grep -l "getRandomInstance" -r test/hotspot/jtreg/compiler/ | xargs grep -L "randomness"`. This PR adds `@key randomness` to these tests. > > **Note:** The following tests that are still listed with `grep -l "getRandomInstance" -r test/hotspot/jtreg/compiler/ | xargs grep -L "randomness"` after this PR are confirmed to be helper or support files for actual tests.
> _test/hotspot/jtreg/compiler/codegen/aes/TestAESBase.java > test/hotspot/jtreg/compiler/compilercontrol/jcmd/StressAddJcmdBase.java > test/hotspot/jtreg/compiler/compilercontrol/parser/HugeDirectiveUtil.java > test/hotspot/jtreg/compiler/compilercontrol/share/scenario/CommandGenerator.java > test/hotspot/jtreg/compiler/lib/ir_framework/test/TestVM.java > test/hotspot/jtreg/compiler/lib/ir_framework/test/ArgumentValue.java > test/hotspot/jtreg/compiler/lib/ir_framework/AbstractInfo.java > test/hotspot/jtreg/compiler/lib/ir_framework/CompLevel.java > test/hotspot/jtreg/compiler/lib/generators/Generators.java > test/hotspot/jtreg/compiler/lib/template_framework/library/PrimitiveType.java > test/hotspot/jtreg/compiler/lib/template_framework/library/Expression.java > test/hotspot/jtreg/compiler/lib/template_framework/NameSet.java > test/hotspot/jtreg/compiler/intrinsics/mathexact/Verify.java > test/hotspot/jtreg/compiler/intrinsics/bmi/BMITestRunner.java > test/hotspot/jtreg/compiler/intrinsics/unsafe/ByteBufferTest.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressBooleanArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressIntArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressLongArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressCharArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressObjectArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressByteArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressFloatArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressShortArrayCopy.java > test/hotspot/jtreg/compiler/arraycopy/stress/StressDoubleArrayCopy.java > test/hotspot/jtreg/compiler/codecache/cli/codeheapsize/JVMStartupRunner.java > test/hotspot/jtreg/compiler/vectorapi/reshape/utils/VectorReshapeHelper.java > test/hotspot/jtreg/compiler/jvmci/compilerToVM/DummyClass.java_ This pull request has now been integrated. 
Changeset: 4b774cb4 Author: Saranya Natarajan URL: https://git.openjdk.org/jdk/commit/4b774cb46d9355015a6bfcf53b47233d6f235239 Stats: 105 lines in 40 files changed: 88 ins; 6 del; 11 mod 8370489: Some compiler tests miss the @key randomness Reviewed-by: dfenacci, epeter, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/28463 From qamai at openjdk.org Thu Dec 11 09:17:53 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 11 Dec 2025 09:17:53 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped Message-ID: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> Hi, The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. For the runtime cost, this phase runs very fast, around 5 - 7% the runtime of EA, and about 0.5% the total runtime of C2. Please take a look and leave your thoughts, thanks a lot. 
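The code shape described above, an object that escapes only at return with a call between the store and the load, can be sketched in plain Java as follows. This only illustrates the pattern; whether C2 actually folds the load with this PR is not claimed here, and the class, method, and field names are hypothetical.

```java
final class EscapeAtReturnSketch {
    static final class Point {
        int x;
        int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    static int sink;

    // Stands in for a call the compiler does not inline. It never sees p,
    // so it cannot change p.x.
    static void opaqueCall() {
        sink++;
    }

    static Point make(int v) {
        Point p = new Point(v, v + 1); // store to p.x
        opaqueCall();                  // p has not escaped at this point
        int loaded = p.x;              // load that could, in principle, be folded to v
        p.y = loaded * 2;
        return p;                      // p escapes only here
    }
}
```

Under an all-or-nothing escape analysis, `p` counts as escaping because it is returned, so the load of `p.x` after `opaqueCall()` is treated conservatively even though nothing could have modified the field in between.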
------------- Commit messages: - Aggressively fold loads from objects that have not escaped Changes: https://git.openjdk.org/jdk/pull/28764/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28764&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373495 Stats: 680 lines in 6 files changed: 680 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28764.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28764/head:pull/28764 PR: https://git.openjdk.org/jdk/pull/28764 From roland at openjdk.org Thu Dec 11 09:20:22 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 11 Dec 2025 09:20:22 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis [v2] In-Reply-To: References: Message-ID: > The crash occurs because verification code expects the inner and outer > loop of a loop strip mining nest to have the same number of phis but, > in this case, the inner loop has one more memory phis than the outer > loop. > > 1) After `OuterStripMinedLoopNode::adjust_strip_mined_loop`, inner and > outer loops have the same number of phis, as expected. 
> > > 309 MergeMem === _ 1 306 1 1 284 [[ 429 ]] { - - N284:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=205 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 ]] > 429 Phi === 248 309 205 [[ 93 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 249 CountedLoop === 249 248 197 [[ 249 119 96 93 94 ]] inner stride: 1 strip mined !orig=[223],[91] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 93 Phi === 249 429 205 [[ 117 97 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 94 Phi === 249 430 121 [[ 97 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > > 2) Then `PhiNode::Ideal` runs for 429 and pushed the `MergeMem` 309 > through the outer loop phi: > > > 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 444 446 ]] > 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 444 Phi === 248 306 121 [[ 445 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 446 Phi === 248 284 170 [[ 445 ]] #memory Memory: @instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow], name=detailMessage, idx=5; !orig=444,429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 445 MergeMem === _ 1 444 1 1 446 [[ 93 ]] { - - N446:instptr:java/lang/Throwable 
(java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=[429],93 !jvms: TestMismatchedMemoryPhis::mainTe... Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: - Update test/hotspot/jtreg/compiler/loopstripmining/TestMismatchedMemoryPhis.java Co-authored-by: Roberto Castañeda Lozano - Update test/hotspot/jtreg/compiler/loopstripmining/TestMismatchedMemoryPhis.java Co-authored-by: Roberto Castañeda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28677/files - new: https://git.openjdk.org/jdk/pull/28677/files/5fdd4914..2a305164 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28677&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28677&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28677/head:pull/28677 PR: https://git.openjdk.org/jdk/pull/28677 From mli at openjdk.org Thu Dec 11 09:44:39 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 11 Dec 2025 09:44:39 GMT Subject: RFR: 8371920: [TEST] Enable CMove tests on other platforms [v2] In-Reply-To: References: Message-ID: On Tue, 9 Dec 2025 14:12:57 GMT, Emanuel Peter wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> riscv + aarch64 > > @Hamlin-Li Thanks for publishing this so quickly, and for considering other platforms, much appreciated > > The patch looks good to me. > I'll run some internal testing now. @eme64 Is this PR good to go? BTW, for a pure test change, is only one approval sufficient or not?
------------- PR Comment: https://git.openjdk.org/jdk/pull/28702#issuecomment-3641087539 From roland at openjdk.org Thu Dec 11 09:48:22 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 11 Dec 2025 09:48:22 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis [v3] In-Reply-To: References: Message-ID: > The crash occurs because verification code expects the inner and outer > loop of a loop strip mining nest to have the same number of phis but, > in this case, the inner loop has one more memory phis than the outer > loop. > > 1) After `OuterStripMinedLoopNode::adjust_strip_mined_loop`, inner and > outer loops have the same number of phis, as expected. > > > 309 MergeMem === _ 1 306 1 1 284 [[ 429 ]] { - - N284:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=205 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 ]] > 429 Phi === 248 309 205 [[ 93 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 249 CountedLoop === 249 248 197 [[ 249 119 96 93 94 ]] inner stride: 1 strip mined !orig=[223],[91] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 93 Phi === 249 429 205 [[ 117 97 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 94 Phi === 249 430 121 [[ 97 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > > 2) Then `PhiNode::Ideal` runs for 429 and pushed the `MergeMem` 309 > through the outer loop 
phi: > > > 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 444 446 ]] > 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 444 Phi === 248 306 121 [[ 445 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 446 Phi === 248 284 170 [[ 445 ]] #memory Memory: @instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow], name=detailMessage, idx=5; !orig=444,429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 445 MergeMem === _ 1 444 1 1 446 [[ 93 ]] { - - N446:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=[429],93 !jvms: TestMismatchedMemoryPhis::mainTe... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: - review - Merge branch 'master' into JDK-8370200 - Update test/hotspot/jtreg/compiler/loopstripmining/TestMismatchedMemoryPhis.java Co-authored-by: Roberto Castañeda Lozano - Update test/hotspot/jtreg/compiler/loopstripmining/TestMismatchedMemoryPhis.java Co-authored-by: Roberto Castañeda Lozano - more - test - more - fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28677/files - new: https://git.openjdk.org/jdk/pull/28677/files/2a305164..6d3109c7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28677&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28677&range=01-02 Stats: 45016 lines in 627 files changed: 27798 ins; 14173 del; 3045 mod Patch: https://git.openjdk.org/jdk/pull/28677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28677/head:pull/28677 PR: https://git.openjdk.org/jdk/pull/28677 From roland at openjdk.org Thu Dec 11 09:48:23 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 11 Dec 2025 09:48:23 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis In-Reply-To: References: <_oUy5ZPqiqz05sYchjgUEtf_L4I077g3XKK0o8DoF8Q=.565b5e4c-eeea-475b-8d53-69d564b92a15@github.com> Message-ID: <6QOKJXXbqelN7lo5i2zwNdeEAYKpnc5f3GLo-5IlQc0=.b0b4e0b0-b593-4e3c-b8d7-25f40dbdffee@github.com> On Wed, 10 Dec 2025 13:52:40 GMT, Roberto Castañeda Lozano wrote: > Like Daniel and Damon, I also have a slight preference towards enqueuing the node and letting `PhiNode::Identity` perform the change. Thanks @dlunde @dafedafe @robcasloz for the comments. I made that change.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28677#issuecomment-3641103637 From roland at openjdk.org Thu Dec 11 09:50:50 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 11 Dec 2025 09:50:50 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis [v3] In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 10:11:56 GMT, Damon Fenacci wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: >> >> - review >> - Merge branch 'master' into JDK-8370200 >> - Update test/hotspot/jtreg/compiler/loopstripmining/TestMismatchedMemoryPhis.java >> >> Co-authored-by: Roberto Castañeda Lozano >> - Update test/hotspot/jtreg/compiler/loopstripmining/TestMismatchedMemoryPhis.java >> >> Co-authored-by: Roberto Castañeda Lozano >> - more >> - test >> - more >> - fix > > src/hotspot/share/opto/cfgnode.cpp line 2753: > >> 2751: >> 2752: bool PhiNode::can_be_replaced_by(const PhiNode* other) const { >> 2753: return type() == Type::MEMORY && other->type() == Type::MEMORY && adr_type() != TypePtr::BOTTOM && > > I might be missing something, but I was wondering whether we strictly need to check for `adr_type() != TypePtr::BOTTOM` Are you suggesting we could do: bool PhiNode::can_be_replaced_by(const PhiNode* other) const { return type() == Type::MEMORY && other->type() == Type::MEMORY && other->adr_type() == TypePtr::BOTTOM && has_same_inputs_as(other); } ? If there are 2 memory `Phi`s with the same inputs and the same `adr_type`, then global value numbering should common them, so that would make no difference.
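The point of the exchange above can be sketched with a toy model in Java. This is not HotSpot code: `MemPhi`, `AdrType`, and both predicate names are hypothetical. The "strict" predicate is an assumption that combines the visible prefix of the quoted line 2753 (which is cut off in the review excerpt) with the conditions in the reply, and the "relaxed" predicate mirrors the variant written out in the reply. The input lists reuse the node indices of Phi 430 and Phi 444 from the dump earlier in the thread.

```java
import java.util.List;

// Toy model only: these types are hypothetical and are not HotSpot classes.
class PhiReplacementSketch {
    // BOTTOM stands in for TypePtr::BOTTOM; the others stand in for narrow memory slices.
    enum AdrType { BOTTOM, FIELD_L, FIELD_DETAIL_MESSAGE }

    // Only memory phis are modelled, so the type() == Type::MEMORY checks are implicit.
    record MemPhi(AdrType adrType, List<Integer> inputs) {
        boolean hasSameInputsAs(MemPhi other) {
            return inputs.equals(other.inputs);
        }

        // Assumed shape of the check with the explicit non-bottom test on 'this'.
        boolean canBeReplacedByStrict(MemPhi other) {
            return adrType != AdrType.BOTTOM
                && other.adrType == AdrType.BOTTOM
                && hasSameInputsAs(other);
        }

        // Variant from the reply: no test on this phi's own address type.
        boolean canBeReplacedByRelaxed(MemPhi other) {
            return other.adrType == AdrType.BOTTOM && hasSameInputsAs(other);
        }
    }

    public static void main(String[] args) {
        List<Integer> in = List.of(248, 306, 121);
        MemPhi narrow = new MemPhi(AdrType.FIELD_L, in);    // shaped like Phi 430
        MemPhi bottom = new MemPhi(AdrType.BOTTOM, in);     // shaped like Phi 444
        MemPhi bottomTwin = new MemPhi(AdrType.BOTTOM, in); // second bottom phi, same inputs

        System.out.println(narrow.canBeReplacedByStrict(bottom));      // true
        System.out.println(narrow.canBeReplacedByRelaxed(bottom));     // true
        System.out.println(bottom.canBeReplacedByStrict(bottomTwin));  // false
        System.out.println(bottom.canBeReplacedByRelaxed(bottomTwin)); // true
    }
}
```

In this model the two predicates disagree only when both phis are bottom memory phis with identical inputs, which is the case the reply says global value numbering would already common.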
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28677#discussion_r2609907749 From rcastanedalo at openjdk.org Thu Dec 11 09:55:48 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 11 Dec 2025 09:55:48 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis In-Reply-To: <6QOKJXXbqelN7lo5i2zwNdeEAYKpnc5f3GLo-5IlQc0=.b0b4e0b0-b593-4e3c-b8d7-25f40dbdffee@github.com> References: <_oUy5ZPqiqz05sYchjgUEtf_L4I077g3XKK0o8DoF8Q=.565b5e4c-eeea-475b-8d53-69d564b92a15@github.com> <6QOKJXXbqelN7lo5i2zwNdeEAYKpnc5f3GLo-5IlQc0=.b0b4e0b0-b593-4e3c-b8d7-25f40dbdffee@github.com> Message-ID: <25w4gq5xRor4xfegyoA3wRFjLz0Gyoi59RBGlq0Y2UQ=.d8b4046f-9d75-44ce-a21d-2ecc84f81407@github.com> On Thu, 11 Dec 2025 09:45:36 GMT, Roland Westrelin wrote: > > Like Daniel and Damon, I also have a slight preference towards enqueuing the node and letting `PhiNode::Identity` perform the change. > > Thanks @dlunde @dafedafe @robcasloz for the comments. I made that change. Thanks Roland, will come back with internal test results.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28677#issuecomment-3641131062 From roland at openjdk.org Thu Dec 11 10:05:56 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 11 Dec 2025 10:05:56 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v4] In-Reply-To: References: Message-ID: > For this failure memory stats are: > > > Total Usage: 1095525816 > --- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 --- > Phase Total ra node comp type states reglive regsplit regmask superword cienv ha other > none 5976032 331560 5402064 197512 33712 10200 0 0 984 0 0 0 0 > parse 2716464 65456 1145480 196408 1112752 0 0 0 0 0 196368 0 0 > optimizer 98184 0 32728 0 65456 0 0 0 0 0 0 0 0 > connectionGraph 32728 0 0 32728 0 0 0 0 0 0 0 0 0 > iterGVN 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > idealLoop 918189632 0 38687056 872824784 392776 0 0 0 0 0 6285016 0 0 > idealLoopVerify 2228144 0 0 2228144 0 0 0 0 0 0 0 0 0 > macroExpand 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > graphReshape 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > matcher 20135944 3369848 9033208 7536400 65456 131032 0 0 0 0 0 0 0 > postselect_cleanup 294872 294872 0 0 0 0 0 0 0 0 0 0 0 > scheduler 752944 196488 556456 0 0 0 0 0 0 0 0 0 0 > regalloc 388736 388736 0 0 0 0 0 0 0 0 0 0 0 > ctorChaitin 160032 ... 
Roland Westrelin has updated the pull request incrementally with three additional commits since the last revision: - Update test/hotspot/jtreg/compiler/c2/TestVerifyLoopOptimizationsHighMemUsage.java Co-authored-by: Emanuel Peter - Update test/hotspot/jtreg/compiler/c2/TestVerifyLoopOptimizationsHighMemUsage.java Co-authored-by: Benoît Maillard - Update src/hotspot/share/opto/loopnode.hpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28581/files - new: https://git.openjdk.org/jdk/pull/28581/files/36fb3a6f..133fcddc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28581&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28581&range=02-03 Stats: 4 lines in 2 files changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28581.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28581/head:pull/28581 PR: https://git.openjdk.org/jdk/pull/28581 From epeter at openjdk.org Thu Dec 11 10:09:33 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 11 Dec 2025 10:09:33 GMT Subject: RFR: 8371920: [TEST] Enable CMove tests on other platforms [v3] In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 14:59:20 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> >> [JDK-8357551](https://bugs.openjdk.org/browse/JDK-8357551) added support for CMoveF/D vectorization and, at the same time, added some tests for scalar CMove on riscv. >> It would be good to enable these tests on other platforms, such as x86/aarch64 or maybe others. >> >> This PR also moves these tests under `compiler/c2/cmove`, as suggested here https://github.com/openjdk/jdk/pull/28309#discussion_r2598664764. >> >> Thanks! >> >> ## Test >> In progress... (I'm using GitHub CI to run the tests.) > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > applyIf @Hamlin-Li Thanks for the updates! Still looks good to me.
------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28702#pullrequestreview-3566673162 From mli at openjdk.org Thu Dec 11 10:12:34 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 11 Dec 2025 10:12:34 GMT Subject: RFR: 8371920: [TEST] Enable CMove tests on other platforms [v2] In-Reply-To: References: Message-ID: On Thu, 11 Dec 2025 09:41:29 GMT, Hamlin Li wrote: >> @Hamlin-Li Thanks for publishing this so quickly, and for considering other platforms, much appreciated >> >> The patch looks good to me. >> I'll run some internal testing now. > > @eme64 Is this PR good to go? > BTW, for a pure test change, is only one approval sufficient or not? > @Hamlin-Li Thanks for the updates! Still looks good to me. @eme64 Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28702#issuecomment-3641197114 From rcastanedalo at openjdk.org Thu Dec 11 10:17:52 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 11 Dec 2025 10:17:52 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped In-Reply-To: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> References: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> Message-ID: <7jlrWzhI6nW9zBzpvAs3XICMq2rvwmZaI53_Dbk7mxM=.fe96e427-6399-451a-a497-3918b8df4f00@github.com> On Thu, 11 Dec 2025 09:10:30 GMT, Quan Anh Mai wrote: > Hi, > > The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return.
> > This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. > > For the runtime cost, this phase runs very fast, around 5 - 7% the runtime of EA, and about 0.5% the total runtime of C2. > > Please take a look and leave your thoughts, thanks a lot. Interesting improvement, thanks for working in this area, Quan Anh! Please allow us some time to think thoroughly about it and how it relates to other plans to improve escape analysis and scalar replacement in C2. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28764#issuecomment-3641214141 From roland at openjdk.org Thu Dec 11 10:35:27 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 11 Dec 2025 10:35:27 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v3] In-Reply-To: References: Message-ID: <_4CdutrS1tyxf_rqM_xdgccYtxlQ0slKbNC9tYxK89Q=.d1289c09-69da-4962-be5a-42252cf33fdb@github.com> On Wed, 10 Dec 2025 13:13:48 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: >> >> - review >> - review > > src/hotspot/share/opto/loopnode.hpp line 1217: > >> 1215: PhaseTransform(Ideal_Loop), >> 1216: _arena(mtCompiler, Arena::Tag::tag_idealloop), >> 1217: _loop_or_ctrl(&_arena), > > How about some of the other data structures? For example `_idom`? They are allocated in the thread's resource area, so there's no leak. And while for `_loop_or_ctrl` and `_body` there were issues that were solved by moving them to the compile arena, there haven't been any so far with other data structures such as `_idom`. So, sure, we could proactively move them to the new arena, but do we gain anything from doing that?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2610055708 From dlunden at openjdk.org Thu Dec 11 10:58:28 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 11 Dec 2025 10:58:28 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis [v3] In-Reply-To: References: Message-ID: On Thu, 11 Dec 2025 09:48:22 GMT, Roland Westrelin wrote: >> The crash occurs because verification code expects the inner and outer >> loop of a loop strip mining nest to have the same number of phis but, >> in this case, the inner loop has one more memory phis than the outer >> loop. >> >> 1) After `OuterStripMinedLoopNode::adjust_strip_mined_loop`, inner and >> outer loops have the same number of phis, as expected. >> >> >> 309 MergeMem === _ 1 306 1 1 284 [[ 429 ]] { - - N284:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=205 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> >> 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 ]] >> 429 Phi === 248 309 205 [[ 93 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> >> 249 CountedLoop === 249 248 197 [[ 249 119 96 93 94 ]] inner stride: 1 strip mined !orig=[223],[91] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> 93 Phi === 249 429 205 [[ 117 97 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> 94 Phi === 249 430 121 [[ 97 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> 
>> >> 2) Then `PhiNode::Ideal` runs for 429 and pushed the `MergeMem` 309 >> through the outer loop phi: >> >> >> 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 444 446 ]] >> 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> 444 Phi === 248 306 121 [[ 445 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> 446 Phi === 248 284 170 [[ 445 ]] #memory Memory: @instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow], name=detailMessage, idx=5; !orig=444,429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> >> 445 MergeMem === _ 1 444 1 1 446 [[ 93 ]] { - - N446:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8370200 > - Update test/hotspot/jtreg/compiler/loopstripmining/TestMismatchedMemoryPhis.java > > Co-authored-by: Roberto Casta?eda Lozano > - Update test/hotspot/jtreg/compiler/loopstripmining/TestMismatchedMemoryPhis.java > > Co-authored-by: Roberto Casta?eda Lozano > - more > - test > - more > - fix Looks good @rwestrel! A few very minor suggestions. src/hotspot/share/opto/cfgnode.cpp line 2689: > 2687: // doesn't happen. > 2688: // Look for non bottom Phis that should be transformed and enqueue them for igvn so PhiNode::Identity executes for > 2689: // them. Suggestion: // PhiNode::Identity replaces a non-bottom memory phi with a bottom memory phi with the same inputs, if it exists. 
// If the bottom memory phi's inputs are changed (so it can now replace the non-bottom memory phi) or if it's created // only after the non-bottom memory phi is processed by igvn, PhiNode::Identity doesn't run and the transformation // doesn't happen. // Look for non-bottom Phis that should be transformed and enqueue them for igvn so that PhiNode::Identity executes for // them. src/hotspot/share/opto/node.cpp line 2898: > 2896: } > 2897: > 2898: Suggestion: test/hotspot/jtreg/compiler/loopstripmining/TestMismatchedMemoryPhis.java line 44: > 42: int i, i1, i15 = 4, i16 = 4; > 43: for (i = 1; i < 7; ++i) { > 44: l = i; Suggestion: l = i; ------------- Marked as reviewed by dlunden (Committer). PR Review: https://git.openjdk.org/jdk/pull/28677#pullrequestreview-3566848963 PR Review Comment: https://git.openjdk.org/jdk/pull/28677#discussion_r2610119032 PR Review Comment: https://git.openjdk.org/jdk/pull/28677#discussion_r2610122033 PR Review Comment: https://git.openjdk.org/jdk/pull/28677#discussion_r2610121611 From qamai at openjdk.org Thu Dec 11 11:03:52 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 11 Dec 2025 11:03:52 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v2] In-Reply-To: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> References: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> Message-ID: > Hi, > > The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. > > This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. 
> > For the runtime cost, this phase runs very fast, around 5 - 7% the runtime of EA, and about 0.5% the total runtime of C2. > > Please take a look and leave your thoughts, thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Some runtime calls may receive a derived pointer but not the base ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28764/files - new: https://git.openjdk.org/jdk/pull/28764/files/9658bde5..f558c90b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28764&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28764&range=00-01 Stats: 59 lines in 2 files changed: 56 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28764.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28764/head:pull/28764 PR: https://git.openjdk.org/jdk/pull/28764 From erfang at openjdk.org Thu Dec 11 11:11:31 2025 From: erfang at openjdk.org (Eric Fang) Date: Thu, 11 Dec 2025 11:11:31 GMT Subject: RFR: 8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max reduction operations In-Reply-To: References: Message-ID: On Mon, 8 Dec 2025 03:29:03 GMT, Eric Fang wrote: > This patch adds intrinsic support for UMIN and UMAX reduction operations in the Vector API on AArch64, enabling direct hardware instruction mapping for better performance. > > Changes: > -------- > > 1. C2 mid-end: > - Added UMinReductionVNode and UMaxReductionVNode > > 2. AArch64 Backend: > - Added uminp/umaxp/sve_uminv/sve_umaxv instructions > - Updated match rules for all vector sizes and element types > - Both NEON and SVE implementation are supported > > 3. Test: > - Added UMIN_REDUCTION_V and UMAX_REDUCTION_V to IRNode.java > - Added assembly tests in aarch64-asmtest.py for new instructions > - Added a JTReg test file VectorUMinMaxReductionTest.java > > Different configurations were tested on aarch64 and x86 machines, and all tests passed. 
> > Test results of JMH benchmarks from the panama-vector project: > -------- > > On a Nvidia Grace machine with 128-bit SVE: > > Benchmark Unit Before Error After Error Uplift > Byte128Vector.UMAXLanes ops/ms 411.60 42.18 25226.51 33.92 61.29 > Byte128Vector.UMAXMaskedLanes ops/ms 558.56 85.12 25182.90 28.74 45.09 > Byte128Vector.UMINLanes ops/ms 645.58 780.76 28396.29 103.11 43.99 > Byte128Vector.UMINMaskedLanes ops/ms 621.09 718.27 26122.62 42.68 42.06 > Byte64Vector.UMAXLanes ops/ms 296.33 34.44 14357.74 15.95 48.45 > Byte64Vector.UMAXMaskedLanes ops/ms 376.54 44.01 14269.24 21.41 37.90 > Byte64Vector.UMINLanes ops/ms 373.45 426.51 15425.36 66.20 41.31 > Byte64Vector.UMINMaskedLanes ops/ms 353.32 346.87 14201.37 13.79 40.19 > Int128Vector.UMAXLanes ops/ms 174.79 192.51 9906.07 286.93 56.67 > Int128Vector.UMAXMaskedLanes ops/ms 157.23 206.68 10246.77 11.44 65.17 > Int64Vector.UMAXLanes ops/ms 95.30 126.49 4719.30 98.57 49.52 > Int64Vector.UMAXMaskedLanes ops/ms 88.19 87.44 4693.18 19.76 53.22 > Long128Vector.UMAXLanes ops/ms 80.62 97.82 5064.01 35.52 62.82 > Long128Vector.UMAXMaskedLanes ops/ms 78.15 102.91 5028.24 8.74 64.34 > Long64Vector.UMAXLanes ops/ms 47.56 62.01 46.76 52.28 0.98 > Long64Vector.UMAXMaskedLanes ops/ms 45.44 46.76 45.79 42.91 1.01 > Short128Vector.UMAXLanes ops/ms 316.65 410.30 14814.82 23.65 46.79 > Short128Vector.UMAXMaskedLanes ops/ms 308.90 351.78 15155.26 31.03 49.06 > Sh... > Nice work. Thanks for your support! > > I noticed that this PR contains the same commit of #28692. Could you please split the change from this PR? If this PR depends on #28692, I wonder whether we can change the target merge branch to `pr/28692` instead of `master` please? Yeah, I think I made a mistake when pushing the PR. I'll just convert this PR as draft since #28692 is under active review. Then rebase the PR after #28692 is merged. 
Thanks for the reminder~ ------------- PR Comment: https://git.openjdk.org/jdk/pull/28693#issuecomment-3641410633 From erfang at openjdk.org Thu Dec 11 11:14:25 2025 From: erfang at openjdk.org (Eric Fang) Date: Thu, 11 Dec 2025 11:14:25 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v2] In-Reply-To: References: <4vSKAtr0tUG0V193gIvnEFdHm18ZhqflVAwk-09IVQ0=.081806f5-6303-4b4f-975d-7c85427ccae5@github.com> Message-ID: On Fri, 28 Nov 2025 09:21:18 GMT, Galder Zamarreño wrote: >> Eric Fang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Don't read and write the same memory in the JMH benchmarks >> - Merge branch 'master' into JDK-8370863-mask-cast-opt >> - 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns >> >> `VectorMaskCastNode` is used to cast a vector mask from one type to >> another type. The cast may be generated by calling the vector API `cast` >> or generated by the compiler. For example, some vector mask operations >> like `trueCount` require the input mask to be integer types, so for >> floating point type masks, the compiler will cast the mask to the >> corresponding integer type mask automatically before doing the mask >> operation. This kind of cast is very common. >> >> If the vector element size is not changed, the `VectorMaskCastNode` >> doesn't generate code; otherwise code will be generated to extend or narrow >> the mask. This IR node is not free whether or not it generates code, >> because it may block some optimizations. For example: >> 1. `(VectorStoremask (VectorMaskCast (VectorLoadMask x)))` >> The middle `VectorMaskCast` prevented the following optimization: >> `(VectorStoremask (VectorLoadMask x)) => (x)` >> 2.
`(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which >> blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`. >> >> In these IR patterns, the value of the input `x` is not changed, so we >> can safely do the optimization. But if the input value is changed, we >> can't eliminate the cast. >> >> The general idea of this PR is introducing an `uncast_mask` helper >> function, which can be used to uncast a chain of `VectorMaskCastNode`, >> like the existing `Node::uncast(bool)` function. The function returns >> the first non `VectorMaskCastNode`. >> >> The intended use case is when the IR pattern to be optimized may >> contain one or more consecutive `VectorMaskCastNode` and this does not >> affect the correctness of the optimization. Then this function can be >> called to eliminate the `VectorMaskCastNode` chain. >> >> Current optimizations related to `VectorMaskCastNode` include: >> 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760. >> 2. `(XorV (VectorMa... > > Nice improvement @erifan, just some small comments from me Hi @galderz would you mind taking another look at this PR, thanks~ ------------- PR Comment: https://git.openjdk.org/jdk/pull/28313#issuecomment-3641419459 From qamai at openjdk.org Thu Dec 11 11:18:24 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 11 Dec 2025 11:18:24 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped In-Reply-To: <7jlrWzhI6nW9zBzpvAs3XICMq2rvwmZaI53_Dbk7mxM=.fe96e427-6399-451a-a497-3918b8df4f00@github.com> References: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> <7jlrWzhI6nW9zBzpvAs3XICMq2rvwmZaI53_Dbk7mxM=.fe96e427-6399-451a-a497-3918b8df4f00@github.com> Message-ID: On Thu, 11 Dec 2025 10:14:41 GMT, Roberto Castañeda Lozano wrote: >> Hi, >> >> The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does.
If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. >> >> This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. >> >> For the runtime cost, this phase runs very fast, around 5 - 7% the runtime of EA, and about 0.5% the total runtime of C2. >> >> Please take a look and leave your thoughts, thanks a lot. > > Interesting improvement, thanks for working in this area, Quan Anh! Please allow us some time to think thoroughly about it and how it relates to other plans to improve escape analysis and scalar replacement in C2. @robcasloz Thanks for taking a look. I also wonder how this relates to other potential improvements to EA. I think that this can work as an independent step or as a first step toward those goals. I am also pretty excited to realize that we don't need to schedule the graph to know if a load can be folded in such a manner, hope this can also be useful. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28764#issuecomment-3641433608 From qamai at openjdk.org Thu Dec 11 12:02:13 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 11 Dec 2025 12:02:13 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v3] In-Reply-To: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> References: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> Message-ID: > Hi, > > The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. 
> > This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. > > For the runtime cost, this phase runs very fast, around 5 - 7% the runtime of EA, and about 0.5% the total runtime of C2. > > Please take a look and leave your thoughts, thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Just use candidate_set directly ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28764/files - new: https://git.openjdk.org/jdk/pull/28764/files/f558c90b..1e260a82 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28764&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28764&range=01-02 Stats: 19 lines in 1 file changed: 1 ins; 1 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/28764.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28764/head:pull/28764 PR: https://git.openjdk.org/jdk/pull/28764 From bkilambi at openjdk.org Thu Dec 11 12:09:30 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 11 Dec 2025 12:09:30 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 In-Reply-To: <98jWF_NhAAB1WHHsotReB6SYIVSRIWNO0rmhxnNMJM8=.f21f3406-f3b3-4ce5-b009-6e50e2ebe1f1@github.com> References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> <98jWF_NhAAB1WHHsotReB6SYIVSRIWNO0rmhxnNMJM8=.f21f3406-f3b3-4ce5-b009-6e50e2ebe1f1@github.com> Message-ID: On Thu, 2 Oct 2025 13:21:32 GMT, Marc Chevalier wrote: >> This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. 
>> >> Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. The following is how each of these reductions is implemented for different aarch64 targets - >> >> **For AddReduction :** >> On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. >> >> On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order. >> >> **For MulReduction :** >> Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported. >> >> Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` - >> >> Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. >> Ratio > 1 indicates the performance with this patch is better than the master branch. 
>> >> **N1 (UseSVE = 0, max vector length = 16B):** >> >> Benchmark vectorDim Mode Cnt 8B 16B >> ReductionAddFP16 256 thrpt 9 1.41 1.40 >> ReductionAddFP16 512 thrpt 9 1.41 1.41 >> ReductionAddFP16 1024 thrpt 9 1.43 1.40 >> ReductionAddFP16 2048 thrpt 9 1.43 1.40 >> ReductionMulFP16 256 thrpt 9 1.22 1.22 >> ReductionMulFP16 512 thrpt 9 1.21 1.23 >> ReductionMulFP16 1024 thrpt 9 1.21 1.22 >> ReductionMulFP16 2048 thrpt 9 1.20 1.22 >> >> >> On N1, the scalarized sequence of `fadd/fmul` are gener... > > I see now the flags are not trivial: > > -XX:+UnlockDiagnosticVMOptions -XX:-TieredCompilation -XX:+StressArrayCopyMacroNode -XX:+StressLCM -XX:+StressGCM -XX:+StressIGVN -XX:+StressCCP -XX:+StressMacroExpansion -XX:+StressMethodHandleLinkerInlining -XX:+StressCompiledExceptionHandlers -XX:VerifyConstraintCasts=1 -XX:+StressLoopPeeling > > That is a lot of stress flags. It's likely that many runs might be needed to reproduce. > > The machine is a VM.Standard.A1.Flex shape, as described in > https://docs.oracle.com/en-us/iaas/Content/Compute/References/computeshapes.htm > > Backtrace at the failure: > > Current CompileTask: > C2:1523 346 % b compiler.vectorization.TestFloat16VectorOperations::vectorAddReductionFloat16 @ 4 (39 bytes) > > Stack: [0x0000ffff84799000,0x0000ffff84997000], sp=0x0000ffff849920d0, free space=2020k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x7da724] C2_MacroAssembler::neon_reduce_add_fp16(FloatRegister, FloatRegister, FloatRegister, unsigned int, FloatRegister)+0x2b4 (c2_MacroAssembler_aarch64.cpp:1930) > V [libjvm.so+0x154492c] PhaseOutput::scratch_emit_size(Node const*)+0x2ec (output.cpp:3171) > V [libjvm.so+0x153d4a4] PhaseOutput::shorten_branches(unsigned int*)+0x2e4 (output.cpp:528) > V [libjvm.so+0x154dcdc] PhaseOutput::Output()+0x95c (output.cpp:328) > V [libjvm.so+0x9be070] Compile::Code_Gen()+0x7f0 (compile.cpp:3127) > V [libjvm.so+0x9c21c0] Compile::Compile(ciEnv*, ciMethod*, int, Options, 
DirectiveSet*)+0x1774 (compile.cpp:894) > V [libjvm.so+0x7eec64] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x2e0 (c2compiler.cpp:147) > V [libjvm.so+0x9d0f8c] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xb08 (compileBroker.cpp:2345) > V [libjvm.so+0x9d1eb8] CompileBroker::compiler_thread_loop()+0x638 (compileBroker.cpp:1989) > V [libjvm.so+0xed25a8] JavaThread::thread_main_inner()+0x108 (javaThread.cpp:775) > V [libjvm.so+0x18466dc] Thread::call_run()+0xac (thread.cpp:243) > V [libjvm.so+0x152349c] thread_native_entry(Thread*)+0x12c (os_linux.cpp:895) > C [libc.so.6+0x80b50] start_thread+0x300 > > > I've attached the replay file in the JBS issue, if it can help. Hi @marc-chevalier Apologies for the delay in responding to your review comments. I have been looking at the JTREG test failures you have reported for my patch. It looks like it's not something that's caused by my patch itself. I can reproduce this error on the master branch for the other tests in `compiler/vectorization/TestFloat16VectorOperations.java` as well and it's reproducible on both AArch64 and x86_64 machines. With a quick look, it looks like for some of the failing tests, autovectorization does happen but IR rule still fails because it is expecting vector nodes of a specific shape. Adding `IRNode.VECTOR_SIZE_ANY` helped resolve those failures but still some other tests fail due to autovectorization not happening in them. I feel this needs to be looked at separately as these failures exist on master branch as well and not really caused by this patch. Would you suggest I create a separate ticket for this task? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27526#issuecomment-3641187476 From mchevalier at openjdk.org Thu Dec 11 12:09:33 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 11 Dec 2025 12:09:33 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 In-Reply-To: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: <_QEYCQm138PWv2vGjMFvEJ6kfMjGEn_vsuEZ_EPaRxQ=.b42967e5-cc22-4c98-a454-6698ce0a70cf@github.com> On Fri, 26 Sep 2025 12:00:31 GMT, Bhavana Kilambi wrote: > This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. > > Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. The following is how each of these reductions is implemented for different aarch64 targets - > > **For AddReduction :** > On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. > > On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order. > > **For MulReduction :** > Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported. 
> > Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` - > > Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. > Ratio > 1 indicates the performance with this patch is better than the master branch. > > **N1 (UseSVE = 0, max vector length = 16B):** > > Benchmark vectorDim Mode Cnt 8B 16B > ReductionAddFP16 256 thrpt 9 1.41 1.40 > ReductionAddFP16 512 thrpt 9 1.41 1.41 > ReductionAddFP16 1024 thrpt 9 1.43 1.40 > ReductionAddFP16 2048 thrpt 9 1.43 1.40 > ReductionMulFP16 256 thrpt 9 1.22 1.22 > ReductionMulFP16 512 thrpt 9 1.21 1.23 > ReductionMulFP16 1024 thrpt 9 1.21 1.22 > ReductionMulFP16 2048 thrpt 9 1.20 1.22 > > > On N1, the scalarized sequence of `fadd/fmul` are generated for both `MaxVectorSize` of 8B and 16B for add reduction ... I'm a bit confused. The failure I observed is an assert failing during code generation. This is a compiler crash, not an IR rule failure, and I don't think it can be solved by changing the content of `IRNode.java`. Am I misunderstanding something? Of course, we can have multiple problems, but it seems not to be what I reported. It was a while ago, but I think I checked that this failure didn't appear in our history back then. So, either it was new, or it is intermittent and I was just unlucky (always possible). As for the IR verification failure, I've looked a bit and couldn't find such an issue already. Since it reproduces on master, I suggest you file a ticket, indeed. Thanks! 
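A note on the "strictly ordered" requirement in the quoted description: floating-point addition is not associative, so a tree-shaped reduction can produce a different result than the left-to-right loop the Java source specifies. The following is only a minimal sketch of that effect using `float` rather than Float16; the class and method names are hypothetical and are not part of the patch.

```java
public class OrderedReductionSketch {
    // Strictly ordered reduction: (((0 + a0) + a1) + a2) + ...
    // This is the order a scalar Java loop specifies.
    static float orderedSum(float[] a) {
        float sum = 0.0f;
        for (float v : a) {
            sum += v;
        }
        return sum;
    }

    // Pairwise (tree) reduction over a[lo, hi): the shape an unordered
    // vector reduction would typically use. Assumes hi > lo.
    static float pairwiseSum(float[] a, int lo, int hi) {
        if (hi - lo == 1) {
            return a[lo];
        }
        int mid = (lo + hi) >>> 1;
        return pairwiseSum(a, lo, mid) + pairwiseSum(a, mid, hi);
    }

    public static void main(String[] args) {
        // Near 1e8 the spacing between adjacent floats is 8, so adding 1 is lost.
        float[] a = {1e8f, 1f, -1e8f, 1f};
        System.out.println(orderedSum(a));                // 1.0
        System.out.println(pairwiseSum(a, 0, a.length));  // 0.0
    }
}
```

With these inputs the two orders disagree, which is why an autovectorized reduction has to keep the sequential order (for example via scalar `fadd` or SVE `fadda`) unless reordering is explicitly allowed.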
------------- PR Comment: https://git.openjdk.org/jdk/pull/27526#issuecomment-3641228424 PR Comment: https://git.openjdk.org/jdk/pull/27526#issuecomment-3641614453 From epeter at openjdk.org Thu Dec 11 12:15:23 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 11 Dec 2025 12:15:23 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v3] In-Reply-To: <_4CdutrS1tyxf_rqM_xdgccYtxlQ0slKbNC9tYxK89Q=.d1289c09-69da-4962-be5a-42252cf33fdb@github.com> References: <_4CdutrS1tyxf_rqM_xdgccYtxlQ0slKbNC9tYxK89Q=.d1289c09-69da-4962-be5a-42252cf33fdb@github.com> Message-ID: On Thu, 11 Dec 2025 10:32:34 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/loopnode.hpp line 1217: >> >>> 1215: PhaseTransform(Ideal_Loop), >>> 1216: _arena(mtCompiler, Arena::Tag::tag_idealloop), >>> 1217: _loop_or_ctrl(&_arena), >> >> How about some of the other data structures? For example `_idom`? > > They are allocated in the thread's resource area. So there's no leak and while for `_loop_or_ctrl` and `_body` there were issues that were solved by moving them to the compile arena, there hasn't been any so far with other data structures such as `_idom`. So, sure, we could pro actively move them to the new arena but do we gain anything from doing that? Yes, I think we would! Keeping `_idom` on the thread resource area means we often cannot use `ResourceMark` for other data structures, if possibly `_idom` is modified in the same scope. @iwanowww could not place `ResourceMark`s in some of this current PR: https://github.com/openjdk/jdk/pull/25315 For example: +bool PhaseIdealLoop::optimize_reachability_fences() { + Compile::TracePhase tp(_t_reachability_optimize); + + assert(OptimizeReachabilityFences, "required"); + + // ResourceMark rm; // NB! 
not safe because insert_rf may trigger _idom reallocation + Unique_Node_List redundant_rfs; + GrowableArray> worklist; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2610352515 From epeter at openjdk.org Thu Dec 11 12:17:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 11 Dec 2025 12:17:47 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v27] In-Reply-To: References: Message-ID: On Sat, 15 Nov 2025 02:28:55 GMT, Vladimir Ivanov wrote: >> This PR introduces C2 support for `Reference.reachabilityFence()`. >> >> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. >> >> `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. >> >> Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. 
It would meet performance criteria, but I prefer to implement it as a followup fix. >> >> Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 >> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." >> >> Testing: >> - [x] hs-tier1 - hs-tier8 >> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations >> - [x] java/lang/foreign microbenchmarks > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > IR test cases Roland is working on a way to put some loop-opts data structure on a separate arena. That would allow you to put in more `ResourceMark`s, where you currently are not able to. See: https://github.com/openjdk/jdk/pull/28581#discussion_r2606618710 I think this is worth fixing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25315#issuecomment-3641646855 From wenanjian at openjdk.org Thu Dec 11 12:22:13 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Thu, 11 Dec 2025 12:22:13 GMT Subject: RFR: 8373069: RISC-V: implement GHASH intrinsic [v4] In-Reply-To: References: Message-ID: <8MCvHCHkscmoOkB_cKGP5mkhHWBw6B3PfalaBL4aVg0=.0a6e3bc9-7b6b-498d-81fb-1a276adc2a31@github.com> > support GHASH intrinsic for crypt GCM, which need zvkg extension. 
> > passed the tests in > test/hotspot/jtreg/compiler/codegen/aes/ > test/jdk/com/sun/crypto Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: modify format ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28548/files - new: https://git.openjdk.org/jdk/pull/28548/files/fc63cb77..3bf38390 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28548&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28548&range=02-03 Stats: 4 lines in 2 files changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28548.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28548/head:pull/28548 PR: https://git.openjdk.org/jdk/pull/28548 From bkilambi at openjdk.org Thu Dec 11 12:24:00 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 11 Dec 2025 12:24:00 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 In-Reply-To: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: On Fri, 26 Sep 2025 12:00:31 GMT, Bhavana Kilambi wrote: > This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. > > Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. The following is how each of these reductions is implemented for different aarch64 targets - > > **For AddReduction :** > On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. 
This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. > > On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order. > > **For MulReduction :** > Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported. > > Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` - > > Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. > Ratio > 1 indicates the performance with this patch is better than the master branch. > > **N1 (UseSVE = 0, max vector length = 16B):** > > Benchmark vectorDim Mode Cnt 8B 16B > ReductionAddFP16 256 thrpt 9 1.41 1.40 > ReductionAddFP16 512 thrpt 9 1.41 1.41 > ReductionAddFP16 1024 thrpt 9 1.43 1.40 > ReductionAddFP16 2048 thrpt 9 1.43 1.40 > ReductionMulFP16 256 thrpt 9 1.22 1.22 > ReductionMulFP16 512 thrpt 9 1.21 1.23 > ReductionMulFP16 1024 thrpt 9 1.21 1.22 > ReductionMulFP16 2048 thrpt 9 1.20 1.22 > > > On N1, the scalarized sequence of `fadd/fmul` are generated for both `MaxVectorSize` of 8B and 16B for add reduction ... Apologies, I missed to address the assertion failure you pointed out in my previous comment. It seems to exist because gdb showed that the combined stress flags somehow set the vector length to 4B which is not allowed. 
The assertion failure itself can be fixed by adding `length < 8` to this condition in aarch64_vector.ad file - ` if (length < 8 || length_in_bytes > 16 || !is_feat_fp16_supported()) { return false; } ` which would avoid vectorization for 4B vector length. But after this change, the IR rules for reduction fail because now the vector reduction nodes are not generated but the IR rule is expecting them. I'll look into this but I actually noticed that this test fail even on master branch with the following IR failures - One or more @IR rules failed: Failed IR Rules (11) of Methods (11) ------------------------------------ 1) Method "public void compiler.vectorization.TestFloat16VectorOperations.vectorAddConstInputFloat16()" - [Failed IR rules: 1]: * @IR rule 2: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#ADD_VHF#_", " >0 "}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={"fphp", "true", "asimdhp", "true"}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\d+(\s){2}(AddVHF.*)+(\s){2}===.*vector[A-Za-z])" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! 2) Method "public void compiler.vectorization.TestFloat16VectorOperations.vectorAddFloat16()" - [Failed IR rules: 1]: * @IR rule 2: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#ADD_VHF#_", " >0 "}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={"fphp", "true", "asimdhp", "true"}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\d+(\s){2}(AddVHF.*)+(\s){2}===.*vector[A-Za-z])" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! 
3) Method "public void compiler.vectorization.TestFloat16VectorOperations.vectorDivFloat16()" - [Failed IR rules: 1]: * @IR rule 2: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#DIV_VHF#_", " >0 "}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={"fphp", "true", "asimdhp", "true"}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\d+(\s){2}(DivVHF.*)+(\s){2}===.*vector[A-Za-z])" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! 4) Method "public void compiler.vectorization.TestFloat16VectorOperations.vectorFmaFloat16()" - [Failed IR rules: 1]: * @IR rule 2: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#FMA_VHF#_", " >0 "}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={"fphp", "true", "asimdhp", "true"}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\d+(\s){2}(FmaVHF.*)+(\s){2}===.*vector[A-Za-z])" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! 5) Method "public void compiler.vectorization.TestFloat16VectorOperations.vectorFmaFloat16MixedConstants()" - [Failed IR rules: 1]: * @IR rule 2: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#FMA_VHF#_", " >0 "}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={"fphp", "true", "asimdhp", "true"}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\d+(\s){2}(FmaVHF.*)+(\s){2}===.*vector[A-Za-z])" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! 
6) Method "public void compiler.vectorization.TestFloat16VectorOperations.vectorFmaFloat16ScalarMixedConstants()" - [Failed IR rules: 1]: * @IR rule 2: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#FMA_VHF#_", " >0 "}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={"fphp", "true", "asimdhp", "true"}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\d+(\s){2}(FmaVHF.*)+(\s){2}===.*vector[A-Za-z])" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! 7) Method "public void compiler.vectorization.TestFloat16VectorOperations.vectorMaxFloat16()" - [Failed IR rules: 1]: * @IR rule 2: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#MAX_VHF#_", " >0 "}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={"fphp", "true", "asimdhp", "true"}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\d+(\s){2}(MaxVHF.*)+(\s){2}===.*vector[A-Za-z])" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! 8) Method "public void compiler.vectorization.TestFloat16VectorOperations.vectorMinFloat16()" - [Failed IR rules: 1]: * @IR rule 2: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#MIN_VHF#_", " >0 "}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={"fphp", "true", "asimdhp", "true"}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\d+(\s){2}(MinVHF.*)+(\s){2}===.*vector[A-Za-z])" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! 
9) Method "public void compiler.vectorization.TestFloat16VectorOperations.vectorMulFloat16()" - [Failed IR rules: 1]: * @IR rule 2: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#MUL_VHF#_", " >0 "}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={"fphp", "true", "asimdhp", "true"}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\d+(\s){2}(MulVHF.*)+(\s){2}===.*vector[A-Za-z])" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! 10) Method "public void compiler.vectorization.TestFloat16VectorOperations.vectorSqrtFloat16()" - [Failed IR rules: 1]: * @IR rule 2: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#SQRT_VHF#_", " >0 "}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={"fphp", "true", "asimdhp", "true"}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\d+(\s){2}(SqrtVHF.*)+(\s){2}===.*vector[A-Za-z])" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! 11) Method "public void compiler.vectorization.TestFloat16VectorOperations.vectorSubFloat16()" - [Failed IR rules: 1]: * @IR rule 2: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#SUB_VHF#_", " >0 "}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={"fphp", "true", "asimdhp", "true"}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\d+(\s){2}(SubVHF.*)+(\s){2}===.*vector[A-Za-z])" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! 
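To make the `found 0 > 0` failures above concrete, here is a rough sketch of how a `counts` constraint of this form behaves. It reuses the `AddVHF` regex from the report, but the class name, the helper, and the sample `PrintIdeal` lines are hypothetical and only approximate what the IR framework does internally.

```java
import java.util.regex.Pattern;

public class IrConstraintSketch {
    // Regex copied from "Constraint 1" in the failure report for ADD_VHF.
    static final Pattern ADD_VHF = Pattern.compile(
            "(\\d+(\\s){2}(AddVHF.*)+(\\s){2}===.*vector[A-Za-z])");

    // Counts the lines of a (hypothetical) PrintIdeal dump that satisfy the constraint.
    static long countMatches(String idealDump) {
        return idealDump.lines()
                .filter(line -> ADD_VHF.matcher(line).find())
                .count();
    }

    public static void main(String[] args) {
        // Hypothetical node lines; real PrintIdeal output differs in detail.
        String vectorized = "123  AddVHF  === _ 45 67  [[ 89 ]]  #vectord<S,4>";
        String scalarOnly = "124  AddHF  === _ 45 67  [[ 90 ]]  #half";
        System.out.println(countMatches(vectorized)); // 1
        System.out.println(countMatches(scalarOnly)); // 0, so a "> 0" rule would fail
    }
}
```

If no node line with the expected vector type is present, the count stays at 0 and the `> 0` comparison fails, which matches the shape of the reports listed above.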
Mostly looks like the expected shape (the default is the `VECTOR_SIZE_MAX`) is not found in the IR graph (as the stress flags might have resulted in a change in vector length) and these failures seem to exist on both aarch64 and x86_64. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27526#issuecomment-3641660105 From epeter at openjdk.org Thu Dec 11 12:34:33 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 11 Dec 2025 12:34:33 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 In-Reply-To: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: On Fri, 26 Sep 2025 12:00:31 GMT, Bhavana Kilambi wrote: > This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. > > Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. The following is how each of these reductions is implemented for different aarch64 targets - > > **For AddReduction :** > On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. > > On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order. > > **For MulReduction :** > Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. 
For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported. > > Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` - > > Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. > Ratio > 1 indicates the performance with this patch is better than the master branch. > > **N1 (UseSVE = 0, max vector length = 16B):** > > Benchmark vectorDim Mode Cnt 8B 16B > ReductionAddFP16 256 thrpt 9 1.41 1.40 > ReductionAddFP16 512 thrpt 9 1.41 1.41 > ReductionAddFP16 1024 thrpt 9 1.43 1.40 > ReductionAddFP16 2048 thrpt 9 1.43 1.40 > ReductionMulFP16 256 thrpt 9 1.22 1.22 > ReductionMulFP16 512 thrpt 9 1.21 1.23 > ReductionMulFP16 1024 thrpt 9 1.21 1.22 > ReductionMulFP16 2048 thrpt 9 1.20 1.22 > > > On N1, the scalarized sequence of `fadd/fmul` are generated for both `MaxVectorSize` of 8B and 16B for add reduction ... 
And it may be worth fixing this test before adding more changes to Float16, just to be sure we have a correct base to build on and are not making things worse ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27526#issuecomment-3641706953 From roland at openjdk.org Thu Dec 11 13:02:43 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 11 Dec 2025 13:02:43 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v5] In-Reply-To: References: Message-ID: <1wI5cirlHfpzcuxIYfBYp1AZ-qXC8chszZKC9vHtkK4=.d7f4a152-4bed-4df5-b4ab-4062474cc220@github.com> > For this failure memory stats are: > > > Total Usage: 1095525816 > --- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 --- > Phase Total ra node comp type states reglive regsplit regmask superword cienv ha other > none 5976032 331560 5402064 197512 33712 10200 0 0 984 0 0 0 0 > parse 2716464 65456 1145480 196408 1112752 0 0 0 0 0 196368 0 0 > optimizer 98184 0 32728 0 65456 0 0 0 0 0 0 0 0 > connectionGraph 32728 0 0 32728 0 0 0 0 0 0 0 0 0 > iterGVN 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > idealLoop 918189632 0 38687056 872824784 392776 0 0 0 0 0 6285016 0 0 > idealLoopVerify 2228144 0 0 2228144 0 0 0 0 0 0 0 0 0 > macroExpand 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > graphReshape 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > matcher 20135944 3369848 9033208 7536400 65456 131032 0 0 0 0 0 0 0 > postselect_cleanup 294872 294872 0 0 0 0 0 0 0 0 0 0 0 > scheduler 752944 196488 556456 0 0 0 0 0 0 0 0 0 0 > regalloc 388736 388736 0 0 0 0 0 0 0 0 0 0 0 > ctorChaitin 160032 ... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 14 additional commits since the last revision: - review - Merge branch 'master' into JDK-8370519 - Update test/hotspot/jtreg/compiler/c2/TestVerifyLoopOptimizationsHighMemUsage.java Co-authored-by: Emanuel Peter - Update test/hotspot/jtreg/compiler/c2/TestVerifyLoopOptimizationsHighMemUsage.java Co-authored-by: Benoît Maillard - Update src/hotspot/share/opto/loopnode.hpp Co-authored-by: Emanuel Peter - review - review - Update src/hotspot/share/opto/compile.hpp Co-authored-by: Manuel Hässig - whitespaces - more - ... and 4 more: https://git.openjdk.org/jdk/compare/7f29f798...8411ff3d ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28581/files - new: https://git.openjdk.org/jdk/pull/28581/files/133fcddc..8411ff3d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28581&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28581&range=03-04 Stats: 45013 lines in 627 files changed: 27798 ins; 14172 del; 3043 mod Patch: https://git.openjdk.org/jdk/pull/28581.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28581/head:pull/28581 PR: https://git.openjdk.org/jdk/pull/28581 From roland at openjdk.org Thu Dec 11 13:04:42 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 11 Dec 2025 13:04:42 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v3] In-Reply-To: References: <_4CdutrS1tyxf_rqM_xdgccYtxlQ0slKbNC9tYxK89Q=.d1289c09-69da-4962-be5a-42252cf33fdb@github.com> Message-ID: <0rToTq8A20oksbCQasA5Uuy4YNgDRIFt2GgwYbfOjKE=.7d94d1f5-8370-4a6b-98ca-68230b22c874@github.com> On Thu, 11 Dec 2025 12:12:02 GMT, Emanuel Peter wrote: >> They are allocated in the thread's resource area. So there's no leak and while for `_loop_or_ctrl` and `_body` there were issues that were solved by moving them to the compile arena, there hasn't been any so far with other data structures such as `_idom`.
So, sure, we could proactively move them to the new arena but do we gain anything from doing that? > > Yes, I think we would! Keeping `_idom` on the thread resource area means we often cannot use `ResourceMark` for other data structures, if possibly `_idom` is modified in the same scope. > > @iwanowww could not place `ResourceMark`s in some of this current PR: > https://github.com/openjdk/jdk/pull/25315 > > For example: > > +bool PhaseIdealLoop::optimize_reachability_fences() { > + Compile::TracePhase tp(_t_reachability_optimize); > + > + assert(OptimizeReachabilityFences, "required"); > + > + // ResourceMark rm; // NB! not safe because insert_rf may trigger _idom reallocation > + Unique_Node_List redundant_rfs; > + GrowableArray> worklist; Ok. Done in new commit for `_idom` and `_dom_depth` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2610504368 From bkilambi at openjdk.org Thu Dec 11 13:11:27 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 11 Dec 2025 13:11:27 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 In-Reply-To: References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: On Thu, 11 Dec 2025 12:31:54 GMT, Emanuel Peter wrote: >> This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. >> >> Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. The following is how each of these reductions is implemented for different aarch64 targets - >> >> **For AddReduction :** >> On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths.
This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. >> >> On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order. >> >> **For MulReduction :** >> Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported. >> >> Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` - >> >> Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. >> Ratio > 1 indicates the performance with this patch is better than the master branch. >> >> **N1 (UseSVE = 0, max vector length = 16B):** >> >> Benchmark vectorDim Mode Cnt 8B 16B >> ReductionAddFP16 256 thrpt 9 1.41 1.40 >> ReductionAddFP16 512 thrpt 9 1.41 1.41 >> ReductionAddFP16 1024 thrpt 9 1.43 1.40 >> ReductionAddFP16 2048 thrpt 9 1.43 1.40 >> ReductionMulFP16 256 thrpt 9 1.22 1.22 >> ReductionMulFP16 512 thrpt 9 1.21 1.23 >> ReductionMulFP16 1024 thrpt 9 1.21 1.22 >> ReductionMulFP16 2048 thrpt 9 1.20 1.22 >> >> >> On N1, the scalarized sequence of `fadd/fmul` are gener... > > And it may be worth fixing this test before adding more changes to Float16, just to be sure we have a correct base to build on and are not making things worse ;) @eme64 I agree. Would you suggest I open a separate ticket to fix this test with the stress flags?
------------- PR Comment: https://git.openjdk.org/jdk/pull/27526#issuecomment-3641848670 From rsunderbabu at openjdk.org Thu Dec 11 13:21:08 2025 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Thu, 11 Dec 2025 13:21:08 GMT Subject: RFR: 8372941: Rework compiler/intrinsics/sha tests to use intrinsic availability [v4] In-Reply-To: References: Message-ID: > Predicate probes of the following algos are changed to rely on intrinsics availability in the platform as opposed to hardware support availability. > MD5 > SHA1 > SHA256 > SHA3 > > Testing: > All flag combinations from CI > hotspot tiers 1 to 5 > PS: only for tier testings, mac-aarch was skipped due to resource constraints Ramkumar Sunderbabu has updated the pull request incrementally with one additional commit since the last revision: reverting SHA3 changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28634/files - new: https://git.openjdk.org/jdk/pull/28634/files/8982a058..af547aaa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28634&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28634&range=02-03 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28634.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28634/head:pull/28634 PR: https://git.openjdk.org/jdk/pull/28634 From rsunderbabu at openjdk.org Thu Dec 11 13:21:08 2025 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Thu, 11 Dec 2025 13:21:08 GMT Subject: RFR: 8372941: Rework compiler/intrinsics/sha tests to use intrinsic availability [v3] In-Reply-To: References: Message-ID: <9ADykpxVRUnyvfsx-PHVd9Jfnssl7b-EX433Nt8RPd8=.da6fa1dd-5b67-46af-b106-10caa8976b98@github.com> On Fri, 5 Dec 2025 17:35:15 GMT, Ramkumar Sunderbabu wrote: >> Predicate probes of the following algos are changed to rely on intrinsics availability in the platform as opposed to hardware support availability. 
>> MD5 >> SHA1 >> SHA256 >> SHA3 >> >> Testing: >> All flag combinations from CI >> hotspot tiers 1 to 5 >> PS: only for tier testings, mac-aarch was skipped due to resource constraints > > Ramkumar Sunderbabu has updated the pull request incrementally with one additional commit since the last revision: > > Fix TestUseSHA3IntrinsicsOptionOnSupportedCPU There seems to be a bug in the WhiteBox call that reports intrinsic support for SHA3 on x64 hosts. I need more time to investigate the issue. I have two options: 1. Wait for clarity on SHA3. 2. Complete this bug in its current form and work on SHA3 in a separate bug. @shqking could you please suggest which one you prefer? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28634#issuecomment-3641878412 From qamai at openjdk.org Thu Dec 11 13:38:24 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 11 Dec 2025 13:38:24 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v4] In-Reply-To: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> References: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> Message-ID: > Hi, > > The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. > > This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. > > For the runtime cost, this phase runs very fast, around 5 - 7% the runtime of EA, and about 0.5% the total runtime of C2. > > Please take a look and leave your thoughts, thanks a lot.
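As an illustration of the pattern described in the quoted text above, here is a minimal sketch of an object that escapes only at return. The class, field, and method names (`EscapeAtReturn`, `Box`, `unrelatedCall`, `make`) are made up for this example and are not taken from the PR or its tests; whether C2 actually folds this particular load depends on the implementation under review.

```java
public class EscapeAtReturn {
    // Hypothetical holder type, used only for this illustration.
    public static final class Box {
        public int value;
    }

    public static int observed;

    // A call that never receives a reference to the Box.
    static void unrelatedCall() {
        observed++;
    }

    public static Box make(int v) {
        Box b = new Box();
        b.value = v;            // store
        unrelatedCall();        // b has not escaped yet, so this call cannot change b.value
        int loaded = b.value;   // load that could, in principle, be folded to v
        observed += loaded;
        return b;               // b escapes only here
    }

    public static void main(String[] args) {
        Box b = make(42);
        System.out.println(b.value + " " + observed);
    }
}
```

With an all-or-nothing analysis, `b` counts as escaping because it is returned, so the call between the store and the load has to be treated as a possible writer of `b.value`. An analysis that asks whether `b` has escaped at the load itself can, in principle, skip over that call.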
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: store values need normalizing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28764/files - new: https://git.openjdk.org/jdk/pull/28764/files/1e260a82..045cd4ab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28764&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28764&range=02-03 Stats: 33 lines in 1 file changed: 29 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28764.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28764/head:pull/28764 PR: https://git.openjdk.org/jdk/pull/28764 From epeter at openjdk.org Thu Dec 11 14:04:16 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 11 Dec 2025 14:04:16 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 In-Reply-To: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: On Fri, 26 Sep 2025 12:00:31 GMT, Bhavana Kilambi wrote: > This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. > > Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. The following is how each of these reductions is implemented for different aarch64 targets - > > **For AddReduction :** > On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. 
> > On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order. > > **For MulReduction :** > Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported. > > Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` - > > Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. > Ratio > 1 indicates the performance with this patch is better than the master branch. > > **N1 (UseSVE = 0, max vector length = 16B):** > > Benchmark vectorDim Mode Cnt 8B 16B > ReductionAddFP16 256 thrpt 9 1.41 1.40 > ReductionAddFP16 512 thrpt 9 1.41 1.41 > ReductionAddFP16 1024 thrpt 9 1.43 1.40 > ReductionAddFP16 2048 thrpt 9 1.43 1.40 > ReductionMulFP16 256 thrpt 9 1.22 1.22 > ReductionMulFP16 512 thrpt 9 1.21 1.23 > ReductionMulFP16 1024 thrpt 9 1.21 1.22 > ReductionMulFP16 2048 thrpt 9 1.20 1.22 > > > On N1, the scalarized sequence of `fadd/fmul` are generated for both `MaxVectorSize` of 8B and 16B for add reduction ... Yes, if it is indeed a bug that can be reproduced we should fix it separately. We just had the fork from JDK26 to JDK27, and so the bug fix probably needs to be backported. 
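On the strict-ordering requirement in the quoted description: floating-point addition is not associative, so a reduction that combines lanes in a different order can give a different result than the scalar loop. A small sketch of the effect follows. It uses `float` rather than Float16 to stay self-contained, and the class and method names are invented for this example.

```java
public class StrictOrderReduction {
    // Strictly ordered reduction: ((a[0] + a[1]) + a[2]) + ...
    public static float strictSum(float[] a) {
        float sum = 0.0f;
        for (float v : a) {
            sum += v;
        }
        return sum;
    }

    // Pairwise (tree-shaped) reduction over the half-open range [lo, hi).
    public static float pairwiseSum(float[] a, int lo, int hi) {
        if (hi - lo == 1) {
            return a[lo];
        }
        int mid = (lo + hi) >>> 1;
        return pairwiseSum(a, lo, mid) + pairwiseSum(a, mid, hi);
    }

    public static void main(String[] args) {
        // 1e8f is exactly representable, but 1e8f + 1.0f rounds back to 1e8f.
        float[] a = {1e8f, 1.0f, -1e8f, 1.0f};
        System.out.println("strict=" + strictSum(a)
                + " pairwise=" + pairwiseSum(a, 0, a.length));
        // prints: strict=1.0 pairwise=0.0
    }
}
```

Because the two orders disagree on this input, a vectorized reduction can only replace the scalar loop if it accumulates lane by lane in the original order, as the scalar `fadd` sequence on Neon and `fadda` on SVE are described as doing above.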
------------- PR Comment: https://git.openjdk.org/jdk/pull/27526#issuecomment-3642076002 From rcastanedalo at openjdk.org Thu Dec 11 14:38:45 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 11 Dec 2025 14:38:45 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v5] In-Reply-To: References: Message-ID: <3u22nbDF6M3weBExSGGYgGvisgMIeqDC4cLN1kIH-m8=.17ee76d4-1652-470f-8b52-af72091b9a95@github.com> On Fri, 5 Dec 2025 13:48:50 GMT, Roland Westrelin wrote: >> The test case has an out of loop `Store` with an `AddP` address >> expression that has other uses and is in the loop body. Schematically, >> only showing the address subgraph and the bases for the `AddP`s: >> >> >> Store#195 -> AddP#133 -> AddP#134 -> CastPP#110 >> -> CastPP#110 >> >> >> Both `AddP`s have the same base, a `CastPP` that's also in the loop >> body. >> >> That loop is a counted loop and only has 3 iterations so is fully >> unrolled. First, one iteration is peeled: >> >> >> /-> CastPP#110 >> Store#195 -> Phi#360 -> AddP#133 -> AddP#134 -> CastPP#110 >> -> AddP#277 -> AddP#278 -> CastPP#283 >> -> CastPP#283 >> >> >> >> The `AddP`s and `CastPP` are cloned (because in the loop body). As >> part of peeling, `PhaseIdealLoop::peeled_dom_test_elim()` is >> called. It finds the test that guards `CastPP#283` in the peeled >> iteration dominates and replaces the test that guards `CastPP#110` >> (the test in the peeled iteration is the clone of the test in the >> loop). That causes `CastPP#110`'s control to be updated to that of the >> test in the peeled iteration and to be yanked from the loop. So now >> `CastPP#283` and `CastPP#110` have the same inputs.
>> >> Next unrolling happens: >> >> >> /-> CastPP#110 >> /-> AddP#400 -> AddP#401 -> CastPP#110 >> Store#195 -> Phi#360 -> Phi#477 -> AddP#133 -> AddP#134 -> CastPP#110 >> \ -> CastPP#110 >> -> AddP#277 -> AddP#278 -> CastPP#283 >> -> CastPP#283 >> >> >> >> `AddP`s are cloned once more but not the `CastPP`s because they are >> both in the peeled iteration now. A new `Phi` is added. >> >> Next igvn runs. It's going to push the `AddP`s through the `Phi`s. >> >> Through `Phi#477`: >> >> >> >> /-> CastPP#110 >> Store#195 -> Phi#360 -> AddP#510 -> Phi#509 -> AddP#401 -> CastPP#110 >> \ -> AddP#134 -> CastPP#110 >> -> AddP#277 -> AddP#278 -> CastPP#283 >> -> CastPP#283 >> >> >> >> Through `Phi#360`: >> >> >> /-> AddP#134 -> CastPP#110 >> /-> Phi#509 -> AddP#401 -> CastPP#110 >> Store#195 -> AddP#516 -> Phi#515 -> AddP#278 -> CastPP#283 >> -> Phi#514 -> CastPP#283 >> ... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Thanks for the thorough analysis and fix, Roland! I agree this is the best way to go at the moment. The invariant that AddP chains share the exact same base address node seems useful, as you mention above, to reduce the number of cases to think about. However, if we find in the future that this invariant becomes too difficult to maintain, we might want to consider relaxing it to "AddP chains have the same *uncast* base address". I guess this is the relaxation that @dean-long suggested above, and I believe it would preserve the spirit of the original assertion added to `Compile::final_graph_reshaping_main_switch()` by @rose00 many years ago when implementing the CastX2P/CastP2X nodes (perhaps to verify chains of AddP nodes induced by `CastX2PNode::Ideal` transformations?). I'm running some internal testing, will come back with results. 
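To make the difference between the current invariant and the relaxed one concrete, here is a toy sketch in Java. The `Node` class below is a hypothetical stand-in and not HotSpot's C++ `Node`; it models only a name and an optional cast input, and `uncast()` simply walks through cast nodes.

```java
public class AddPBaseCheck {
    // Hypothetical toy node: castInput is non-null only for cast nodes.
    public static final class Node {
        final String name;
        final Node castInput;

        public Node(String name, Node castInput) {
            this.name = name;
            this.castInput = castInput;
        }

        // Strip any chain of casts and return the underlying node.
        public Node uncast() {
            Node n = this;
            while (n.castInput != null) {
                n = n.castInput;
            }
            return n;
        }
    }

    // Current invariant: both AddPs use the very same base node.
    public static boolean sameBaseStrict(Node a, Node b) {
        return a == b;
    }

    // Relaxed invariant: the bases agree once casts are stripped.
    public static boolean sameBaseUncast(Node a, Node b) {
        return a.uncast() == b.uncast();
    }

    public static void main(String[] args) {
        Node obj = new Node("obj", null);
        Node cast110 = new Node("CastPP#110", obj);
        Node cast283 = new Node("CastPP#283", obj);
        System.out.println(sameBaseStrict(cast110, cast283)); // false
        System.out.println(sameBaseUncast(cast110, cast283)); // true
    }
}
```

Two distinct `CastPP` nodes over the same object fail the strict check but pass the uncast one, which is the situation the peeling and unrolling sequence quoted above ends up in.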
src/hotspot/share/opto/phaseX.cpp line 2064: > 2062: } > 2063: > 2064: // Some other verifications that are no specific to a particular transformation Suggestion: // Some other verifications that are not specific to a particular transformation. src/hotspot/share/opto/phaseX.cpp line 2065: > 2063: > 2064: // Some other verifications that are no specific to a particular transformation > 2065: bool PhaseIterGVN::verify_node_invariants_for(Node* n) { Suggestion: bool PhaseIterGVN::verify_node_invariants_for(const Node* n) { src/hotspot/share/opto/phaseX.cpp line 2070: > 2068: if (addp->is_AddP() && > 2069: !addp->in(AddPNode::Base)->is_top() && > 2070: addp->in(AddPNode::Base) != n->in(AddPNode::Base)) { Any way we could avoid this code duplication with the same check in `Compile::final_graph_reshaping_main_switch`? src/hotspot/share/opto/phaseX.hpp line 496: > 494: bool verify_Ideal_for(Node* n, bool can_reshape); > 495: bool verify_Identity_for(Node* n); > 496: bool verify_node_invariants_for(Node* n); Suggestion: bool verify_node_invariants_for(const Node* n); ------------- PR Review: https://git.openjdk.org/jdk/pull/25386#pullrequestreview-3567683298 PR Review Comment: https://git.openjdk.org/jdk/pull/25386#discussion_r2610782593 PR Review Comment: https://git.openjdk.org/jdk/pull/25386#discussion_r2610784474 PR Review Comment: https://git.openjdk.org/jdk/pull/25386#discussion_r2610807655 PR Review Comment: https://git.openjdk.org/jdk/pull/25386#discussion_r2610790721 From aph at openjdk.org Thu Dec 11 15:11:35 2025 From: aph at openjdk.org (Andrew Haley) Date: Thu, 11 Dec 2025 15:11:35 GMT Subject: RFR: 8326205: Grouping frequently called C2 nmethods in CodeCache In-Reply-To: References: Message-ID: On Thu, 16 Oct 2025 23:06:35 GMT, Chad Rakoczy wrote: > ### Summary > This PR implements [JDK-8326205](https://bugs.openjdk.org/browse/JDK-8326205), introducing experimental support for grouping hot code within the CodeCache. 
> > ### Description > The feature works by periodically sampling the execution of C2-compiled methods to identify hot code, then relocating those methods into a dedicated `HotCodeHeap` section of the CodeCache. > > Sampling is performed by the `HotCodeSampler`, which runs on a new dedicated `HotCodeGrouper` thread. The thread wakes up every `HotCodeIntervalSeconds` (default 300s) and collects samples for a duration of `HotCodeSampleSeconds` (default 120s). During each sampling period, it iterates over all Java threads, inspects their last Java frame, obtains the current program counter (PC), and maps it to the corresponding nmethod. This allows the sampler to maintain a profile of the most frequently executed methods. > > The `HotCodeGrouper` uses the sampling data to select methods for grouping. Methods are ranked by sample count to form the candidate set. The grouper then relocates these methods (along with their callees, which has been shown to improve performance on AArch64 due to better branch prediction) into the `HotCodeHeap` in descending order of hotness, continuing until the fraction of samples attributable to hot methods exceeds `HotCodeSampleRatio` (default 0.8). The process continues to ensure that the hot-method ratio remains above the threshold. > > The `HotCodeHeap` is a new code heap segment with a default size of 20% of the non-profiled heap, though this can be overridden. This size was chosen based on the principle that roughly 20% of methods contribute to 80% of the work. Only C2-compiled nmethods are eligible for relocation, and the relocation process leverages existing infrastructure from [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). > > Relocation occurs entirely on the grouper thread and runs concurrently with the application. To maintain correctness, the thread acquires the `CodeCache_lock` and `Compile_lock` during relocation but releases these locks between individual relocations to avoid blocking GC safepoints. 
Removal of nmethods from the `HotCodeHeap` is handled by the GC. > > ### Performance > Testing has shown up to a 20% latency reduction in an internal service with a large CodeCache (512 MB). Public benchmark results are forthcoming. > > ### Testing > * CodeCache tests have been updated to cover the new `HotCodeHeap`. > * Dedicated tests for the `HotCodeGrouper` will be ... Thanks. I need to stress test this code, especially by moving nmethods as much as possible while many threads are executing. Is one of the stress tests here suitable for that? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27858#issuecomment-3642374309 From chagedorn at openjdk.org Thu Dec 11 15:36:47 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 11 Dec 2025 15:36:47 GMT Subject: RFR: 8370315: [IR-Framework] Allow scenarios to be run in parallel [v4] In-Reply-To: References: Message-ID: On Wed, 5 Nov 2025 16:36:26 GMT, Damon Fenacci wrote: >> ## Issue >> Today, the only practical ways to run IR Framework scenarios in parallel seems to be: >> * spawning threads manually in a single test, or >> * letting jtreg treat each scenario as a separate test (the only way to potentially distribute across hosts). >> >> This makes it a bit cumbersome to use host CPU cores efficiently when running multiple scenarios within the same test. >> >> ## Change >> This change introduces a method `TestFramework::startParallel` to execute multiple scenarios concurrently. The implementation: >> * launches one task per scenario and runs them in parallel (by default, the maximum concurrency should match the host's available cores) >> * captures each task's `System.out` into a dedicated buffer and flushes it when the task completes to avoid interleaved output between scenarios (Note: only call paths within the `compile.lib.ir_framework` package are modified to per-task output streams. `ProcessTools` methods still write directly to `stdout`, so their output may interleave).
>> * adds an option `-DForceSequentialScenarios=true` to force all scenarios to be run sequentially. >> >> ## Testing >> * Tier 1-3+ >> * explicit `ir_framework.tests` runs >> * added IR-Framework test `TestDForceSequentialScenarios.java` to test forcing sequential testing (checking the output order) and added a parallel run to `TestScenarios.java` (as well as adding `ForceSequentialScenarios` flag to `TestDFlags.java`) >> >> As reference: a comparison of the execution time between sequential and parallel of all IR-Framework tests using scenarios on our machines (linux x64/aarch64, macosx x64/aarch64, windows x64 with different number of cores, so the results for a single test might not be relevant) gave me an average speedup of 1.9. > > Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8370315: fix typo Sorry, I dropped the ball on this - thanks for the updates! Some more comments but then I think it looks good! test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 433: > 431: * Start the testing of the implicitly (by {@link #TestFramework()}) or explicitly (by {@link #TestFramework(Class)}) > 432: * set test class. Scenarios are run in parallel. Note: scenarios could still be run sequentially if flag > 433: * {@code -DForceSequentialScenarios=true} is given. Suggestion: * {@code -DForceSequentialScenarios=true} is used. test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 749: > 747: * scenarios without prematurely throwing an exception. Format violations, however, are wrong for all scenarios > 748: * and thus is reported immediately on the first scenario execution.
> 749: * @param parallel Run tests concurrently Suggestion: * * @param parallel Run tests concurrently test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 753: > 751: private void startWithScenarios(boolean parallel) { > 752: Map exceptionMap = new ConcurrentSkipListMap<>(Comparator.comparingInt(Scenario::getIndex)); > 753: record Outcome(Scenario scenario, Exception other) {} Can now be removed Suggestion: test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 775: > 773: if (!output.isEmpty()) { > 774: System.out.println(output); > 775: } We probably also need to do a similar trick as for the exceptions in order to have ordered stdouts for the scenarios? test/hotspot/jtreg/compiler/lib/ir_framework/shared/TestFormat.java line 33: > 31: */ > 32: public class TestFormat { > 33: private static final ThreadLocal> threadLocalFailures = ThreadLocal.withInitial(ArrayList::new); Suggestion: private static final ThreadLocal> threadLocalFailures = ThreadLocal.withInitial(ArrayList::new); test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestBadFormat.java line 128: > 126: private static void expectTestFormatException(Class clazz, Class... 
helpers) { > 127: // Single test > 128: boolean exceptionCatched = false; Nit: catched -> caught test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestScenarios.java line 52: > 50: public class TestScenarios { > 51: public static void main(String[] args) { > 52: TestFramework testFramework; Seems unused Suggestion: ------------- PR Review: https://git.openjdk.org/jdk/pull/28065#pullrequestreview-3567845856 PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2610902845 PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2610908883 PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2610906921 PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2611008476 PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2611036168 PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2611048022 PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2610967451 From roland at openjdk.org Thu Dec 11 15:41:55 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 11 Dec 2025 15:41:55 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses Message-ID: The base input of `AddP` is expected to only be set for heap accesses but I noticed some inconsistencies so I added an assert in the `AddP` constructor and fixed issues that it caught. AFAICT, the inconsistencies shouldn't create issues.
------------- Commit messages: - more - more - more - undo - exps Changes: https://git.openjdk.org/jdk/pull/28769/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28769&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373343 Stats: 80 lines in 14 files changed: 21 ins; 7 del; 52 mod Patch: https://git.openjdk.org/jdk/pull/28769.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28769/head:pull/28769 PR: https://git.openjdk.org/jdk/pull/28769 From roland at openjdk.org Thu Dec 11 15:42:42 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 11 Dec 2025 15:42:42 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v6] In-Reply-To: References: Message-ID: > For this failure memory stats are: > > > Total Usage: 1095525816 > --- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 --- > Phase Total ra node comp type states reglive regsplit regmask superword cienv ha other > none 5976032 331560 5402064 197512 33712 10200 0 0 984 0 0 0 0 > parse 2716464 65456 1145480 196408 1112752 0 0 0 0 0 196368 0 0 > optimizer 98184 0 32728 0 65456 0 0 0 0 0 0 0 0 > connectionGraph 32728 0 0 32728 0 0 0 0 0 0 0 0 0 > iterGVN 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > idealLoop 918189632 0 38687056 872824784 392776 0 0 0 0 0 6285016 0 0 > idealLoopVerify 2228144 0 0 2228144 0 0 0 0 0 0 0 0 0 > macroExpand 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > graphReshape 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > matcher 20135944 3369848 9033208 7536400 65456 131032 0 0 0 0 0 0 0 > postselect_cleanup 294872 294872 0 0 0 0 0 0 0 0 0 0 0 > scheduler 752944 196488 556456 0 0 0 0 0 0 0 0 0 0 > regalloc 388736 388736 0 0 0 0 0 0 0 0 0 0 0 > ctorChaitin 160032 ... 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: package declaration ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28581/files - new: https://git.openjdk.org/jdk/pull/28581/files/8411ff3d..1c040156 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28581&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28581&range=04-05 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28581.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28581/head:pull/28581 PR: https://git.openjdk.org/jdk/pull/28581 From roland at openjdk.org Thu Dec 11 15:46:40 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 11 Dec 2025 15:46:40 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis [v4] In-Reply-To: References: Message-ID: > The crash occurs because verification code expects the inner and outer > loop of a loop strip mining nest to have the same number of phis but, > in this case, the inner loop has one more memory phis than the outer > loop. > > 1) After `OuterStripMinedLoopNode::adjust_strip_mined_loop`, inner and > outer loops have the same number of phis, as expected. 
> > > 309 MergeMem === _ 1 306 1 1 284 [[ 429 ]] { - - N284:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=205 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 ]] > 429 Phi === 248 309 205 [[ 93 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 249 CountedLoop === 249 248 197 [[ 249 119 96 93 94 ]] inner stride: 1 strip mined !orig=[223],[91] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 93 Phi === 249 429 205 [[ 117 97 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 94 Phi === 249 430 121 [[ 97 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > > 2) Then `PhiNode::Ideal` runs for 429 and pushed the `MergeMem` 309 > through the outer loop phi: > > > 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 444 446 ]] > 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 444 Phi === 248 306 121 [[ 445 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 446 Phi === 248 284 170 [[ 445 ]] #memory Memory: @instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow], name=detailMessage, idx=5; !orig=444,429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 445 MergeMem === _ 1 444 1 1 446 [[ 93 ]] { - - N446:instptr:java/lang/Throwable 
(java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=[429],93 !jvms: TestMismatchedMemoryPhis::mainTe... Roland Westrelin has updated the pull request incrementally with three additional commits since the last revision: - Update src/hotspot/share/opto/node.cpp Co-authored-by: Daniel Lundén - Update test/hotspot/jtreg/compiler/loopstripmining/TestMismatchedMemoryPhis.java Co-authored-by: Daniel Lundén - Update src/hotspot/share/opto/cfgnode.cpp Co-authored-by: Daniel Lundén ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28677/files - new: https://git.openjdk.org/jdk/pull/28677/files/6d3109c7..24a30b44 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28677&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28677&range=02-03 Stats: 6 lines in 3 files changed: 0 ins; 1 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/28677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28677/head:pull/28677 PR: https://git.openjdk.org/jdk/pull/28677 From roland at openjdk.org Thu Dec 11 15:47:50 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 11 Dec 2025 15:47:50 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v6] In-Reply-To: References: Message-ID: > The test case has an out of loop `Store` with an `AddP` address > expression that has other uses and is in the loop body. Schematically, > only showing the address subgraph and the bases for the `AddP`s: > > > Store#195 -> AddP#133 -> AddP#134 -> CastPP#110 > -> CastPP#110 > > > Both `AddP`s have the same base, a `CastPP` that's also in the loop > body. > > That loop is a counted loop and only has 3 iterations so is fully > unrolled. First, one iteration is peeled: > > > /-> CastPP#110 > Store#195 -> Phi#360 -> AddP#133 -> AddP#134 -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > The `AddP`s and `CastPP` are cloned (because in the loop body).
As > part of peeling, `PhaseIdealLoop::peeled_dom_test_elim()` is > called. It finds the test that guards `CastPP#283` in the peeled > iteration dominates and replaces the test that guards `CastPP#110` > (the test in the peeled iteration is the clone of the test in the > loop). That causes `CastPP#110`'s control to be updated to that of the > test in the peeled iteration and to be yanked from the loop. So now > `CastPP#283` and `CastPP#110` have the same inputs. > > Next unrolling happens: > > > /-> CastPP#110 > /-> AddP#400 -> AddP#401 -> CastPP#110 > Store#195 -> Phi#360 -> Phi#477 -> AddP#133 -> AddP#134 -> CastPP#110 > \ -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > `AddP`s are cloned once more but not the `CastPP`s because they are > both in the peeled iteration now. A new `Phi` is added. > > Next igvn runs. It's going to push the `AddP`s through the `Phi`s. > > Through `Phi#477`: > > > > /-> CastPP#110 > Store#195 -> Phi#360 -> AddP#510 -> Phi#509 -> AddP#401 -> CastPP#110 > \ -> AddP#134 -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > Through `Phi#360`: > > > /-> AddP#134 -> CastPP#110 > /-> Phi#509 -> AddP#401 -> CastPP#110 > Store#195 -> AddP#516 -> Phi#515 -> AddP#278 -> CastPP#283 > -> Phi#514 -> CastPP#283 > -> CastP#110 > > > Then `Phi#514` which has 2 `CastPP`s as input with identical inputs is > transformed into anot... 
Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: - Update src/hotspot/share/opto/phaseX.hpp Co-authored-by: Roberto Castañeda Lozano - Update src/hotspot/share/opto/phaseX.cpp Co-authored-by: Roberto Castañeda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25386/files - new: https://git.openjdk.org/jdk/pull/25386/files/20154a12..d2174c88 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25386&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25386&range=04-05 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25386.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25386/head:pull/25386 PR: https://git.openjdk.org/jdk/pull/25386 From roland at openjdk.org Thu Dec 11 16:10:58 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 11 Dec 2025 16:10:58 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v7] In-Reply-To: References: Message-ID: > The test case has an out of loop `Store` with an `AddP` address > expression that has other uses and is in the loop body. Schematically, > only showing the address subgraph and the bases for the `AddP`s: > > > Store#195 -> AddP#133 -> AddP#134 -> CastPP#110 > -> CastPP#110 > > > Both `AddP`s have the same base, a `CastPP` that's also in the loop > body. > > That loop is a counted loop and only has 3 iterations so is fully > unrolled. First, one iteration is peeled: > > > /-> CastPP#110 > Store#195 -> Phi#360 -> AddP#133 -> AddP#134 -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > The `AddP`s and `CastPP` are cloned (because in the loop body). As > part of peeling, `PhaseIdealLoop::peeled_dom_test_elim()` is > called.
It finds the test that guards `CastPP#283` in the peeled > iteration dominates and replaces the test that guards `CastPP#110` > (the test in the peeled iteration is the clone of the test in the > loop). That causes `CastPP#110`'s control to be updated to that of the > test in the peeled iteration and to be yanked from the loop. So now > `CastPP#283` and `CastPP#110` have the same inputs. > > Next unrolling happens: > > > /-> CastPP#110 > /-> AddP#400 -> AddP#401 -> CastPP#110 > Store#195 -> Phi#360 -> Phi#477 -> AddP#133 -> AddP#134 -> CastPP#110 > \ -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > `AddP`s are cloned once more but not the `CastPP`s because they are > both in the peeled iteration now. A new `Phi` is added. > > Next igvn runs. It's going to push the `AddP`s through the `Phi`s. > > Through `Phi#477`: > > > > /-> CastPP#110 > Store#195 -> Phi#360 -> AddP#510 -> Phi#509 -> AddP#401 -> CastPP#110 > \ -> AddP#134 -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > Through `Phi#360`: > > > /-> AddP#134 -> CastPP#110 > /-> Phi#509 -> AddP#401 -> CastPP#110 > Store#195 -> AddP#516 -> Phi#515 -> AddP#278 -> CastPP#283 > -> Phi#514 -> CastPP#283 > -> CastP#110 > > > Then `Phi#514` which has 2 `CastPP`s as input with identical inputs is > transformed into anot... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision: - review - Merge branch 'master' into JDK-8351889 - Update src/hotspot/share/opto/phaseX.hpp Co-authored-by: Roberto Castañeda Lozano - Update src/hotspot/share/opto/phaseX.cpp Co-authored-by: Roberto Castañeda Lozano - review - more - review - Merge branch 'master' into JDK-8351889 - exp - Merge branch 'master' into JDK-8351889 - ...
and 9 more: https://git.openjdk.org/jdk/compare/830e7075...100fad3d ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25386/files - new: https://git.openjdk.org/jdk/pull/25386/files/d2174c88..100fad3d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25386&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25386&range=05-06 Stats: 45025 lines in 629 files changed: 27805 ins; 14178 del; 3042 mod Patch: https://git.openjdk.org/jdk/pull/25386.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25386/head:pull/25386 PR: https://git.openjdk.org/jdk/pull/25386 From roland at openjdk.org Thu Dec 11 16:11:02 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 11 Dec 2025 16:11:02 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v5] In-Reply-To: <3u22nbDF6M3weBExSGGYgGvisgMIeqDC4cLN1kIH-m8=.17ee76d4-1652-470f-8b52-af72091b9a95@github.com> References: <3u22nbDF6M3weBExSGGYgGvisgMIeqDC4cLN1kIH-m8=.17ee76d4-1652-470f-8b52-af72091b9a95@github.com> Message-ID: On Thu, 11 Dec 2025 14:29:16 GMT, Roberto Castañeda Lozano wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/opto/phaseX.cpp line 2070: > >> 2068: if (addp->is_AddP() && >> 2069: !addp->in(AddPNode::Base)->is_top() && >> 2070: addp->in(AddPNode::Base) != n->in(AddPNode::Base)) { > > Any way we could avoid this code duplication with the same check in `Compile::final_graph_reshaping_main_switch`? I pushed a new commit. Is it what you had in mind?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25386#discussion_r2611166125 From galder at openjdk.org Thu Dec 11 16:16:21 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 11 Dec 2025 16:16:21 GMT Subject: RFR: 8373396: Min and Max Ideal missing AddNode::Ideal optimisations Message-ID: `MaxI` and `MinI` are missing `AddNode::Ideal` optimizations. These optimizations include commutation, flattening, pushing constants, etc. The PR changes `MaxINode::Ideal` and `MinINode::Ideal` to call `AddNode::Ideal`. The long versions already call `AddNode::Ideal`, so there is nothing to change there. The PR also includes a test generated with the template framework (cc @eme64) that verifies that all of the `AddNode::Ideal` optimizations now apply correctly to min/max for longs and ints. Long tests have been added to validate that both ints and longs produce the same results. Fixing this issue indirectly fixes `compiler/codegen/TestBooleanVect.java` when run with `-XX:VerifyIterativeGVN=1110`, which was failing due to `min` not having one of those optimisations. However, this PR does not make changes to `PhaseIterGVN::verify_Identity_for` because there are additional failures observed with min/max for integers in JDK-8373134. Therefore, changes there will be made in the PR for JDK-8373134 instead. `PhaseIterGVN::verify_Ideal_for` contains the exclusion quoted below. It looks like it could be removed in this PR, since these checks were quite likely disabled because of the issue fixed here. However, it's unclear which test was failing (@eme64 ?): // MinINode::Ideal // Did not investigate, but there are some patterns that might // need more notification. case Op_MinI: case Op_MaxI: // preemptively removed it as well. return false; I've run tier1-3 tests on linux/x64 and they passed.
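For readers unfamiliar with the rewrites named above, here is a minimal sketch of the arithmetic identities that commutation, flattening, and constant pushing rely on for integer min/max. This is not HotSpot code and does not touch `AddNode::Ideal`; the function names and the constants 7 and 3 are made up for illustration, and the exact set of rewrites C2 performs may differ.

```cpp
#include <algorithm>
#include <cassert>

// Illustration only: hypothetical helpers, not part of the JDK sources.

// Commutation: operand order does not matter, so a canonical order can be chosen.
bool min_commutes(int a, int b) {
  return std::min(a, b) == std::min(b, a);
}

// Flattening (reassociation): nested min calls can be regrouped freely.
bool min_reassociates(int a, int b, int c) {
  return std::min(std::min(a, b), c) == std::min(a, std::min(b, c));
}

// Pushing constants: min(min(x, 7), 3) only needs one comparison against
// the combined constant min(7, 3) == 3.
int min_nested(int x) { return std::min(std::min(x, 7), 3); }
int min_folded(int x) { return std::min(x, 3); }

// The same shape holds for max, with the combined constant max(7, 3) == 7.
int max_nested(int x) { return std::max(std::max(x, 7), 3); }
int max_folded(int x) { return std::max(x, 7); }

// Checks the identities over a small range of inputs.
void check_minmax_identities() {
  for (int x = -10; x <= 10; x++) {
    assert(min_commutes(x, 5));
    assert(min_reassociates(x, -2, 11));
    assert(min_nested(x) == min_folded(x));
    assert(max_nested(x) == max_folded(x));
  }
}
```

Calling `check_minmax_identities()` exercises each identity. A compiler that applies these rewrites to `MinI`/`MaxI` the way it already does to additions can emit the folded form directly.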
------------- Commit messages: - Test that MinI/MinL optimizations apply as expected - Call AddNode::Ideal in MinINode::Ideal to get optimizations - Call AddNode::Ideal in MaxINode::Ideal to get optimizations Changes: https://git.openjdk.org/jdk/pull/28770/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28770&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373396 Stats: 156 lines in 2 files changed: 156 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28770.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28770/head:pull/28770 PR: https://git.openjdk.org/jdk/pull/28770 From dfenacci at openjdk.org Thu Dec 11 16:38:02 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 11 Dec 2025 16:38:02 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis [v4] In-Reply-To: References: Message-ID: <1DHLaPLG3Gbzlhb2QsG_DsO1VGambMOVgKBs7rkyT24=.3f5b2a2c-9662-4534-bd8e-b64646c2a92f@github.com> On Thu, 11 Dec 2025 09:48:02 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/cfgnode.cpp line 2753: >> >>> 2751: >>> 2752: bool PhiNode::can_be_replaced_by(const PhiNode* other) const { >>> 2753: return type() == Type::MEMORY && other->type() == Type::MEMORY && adr_type() != TypePtr::BOTTOM && >> >> I think I might miss something but I was wondering if we strictly need to check for `adr_type() != TypePtr::BOTTOM` > > Are you suggesting we could do: > > > bool PhiNode::can_be_replaced_by(const PhiNode* other) const { > return type() == Type::MEMORY && other->type() == Type::MEMORY && other->adr_type() == TypePtr::BOTTOM && has_same_inputs_as(other); > } > > > ? > > If there are 2 memory `Phi`s with same inputs and same `adr_type` then global value numbering should common them so that would make no difference. Yes, that's what I was thinking (to have one less operand). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28677#discussion_r2611285260 From dfenacci at openjdk.org Thu Dec 11 16:43:43 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 11 Dec 2025 16:43:43 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis [v4] In-Reply-To: References: Message-ID: <1HRzwNdoAoy_MP40qRTR5sbniOuWUw_4__LzcInDJjM=.70f5ef64-50f7-4536-b529-46b00b73b819@github.com> On Thu, 11 Dec 2025 15:46:40 GMT, Roland Westrelin wrote: >> The crash occurs because verification code expects the inner and outer >> loop of a loop strip mining nest to have the same number of phis but, >> in this case, the inner loop has one more memory phis than the outer >> loop. >> >> 1) After `OuterStripMinedLoopNode::adjust_strip_mined_loop`, inner and >> outer loops have the same number of phis, as expected. >> >> >> 309 MergeMem === _ 1 306 1 1 284 [[ 429 ]] { - - N284:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=205 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> >> 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 ]] >> 429 Phi === 248 309 205 [[ 93 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> >> 249 CountedLoop === 249 248 197 [[ 249 119 96 93 94 ]] inner stride: 1 strip mined !orig=[223],[91] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> 93 Phi === 249 429 205 [[ 117 97 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> 94 Phi === 249 430 121 [[ 97 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, 
name=l, idx=4; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> >> >> 2) Then `PhiNode::Ideal` runs for 429 and pushed the `MergeMem` 309 >> through the outer loop phi: >> >> >> 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 444 446 ]] >> 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> 444 Phi === 248 306 121 [[ 445 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> 446 Phi === 248 284 170 [[ 445 ]] #memory Memory: @instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow], name=detailMessage, idx=5; !orig=444,429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> >> 445 MergeMem === _ 1 444 1 1 446 [[ 93 ]] { - - N446:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx... > > Roland Westrelin has updated the pull request incrementally with three additional commits since the last revision: > > - Update src/hotspot/share/opto/node.cpp > > Co-authored-by: Daniel Lundén > - Update test/hotspot/jtreg/compiler/loopstripmining/TestMismatchedMemoryPhis.java > > Co-authored-by: Daniel Lundén > - Update src/hotspot/share/opto/cfgnode.cpp > > Co-authored-by: Daniel Lundén Looks good to me. Thank you @rwestrel! ------------- Marked as reviewed by dfenacci (Committer).
PR Review: https://git.openjdk.org/jdk/pull/28677#pullrequestreview-3568347698 From eastigeevich at openjdk.org Thu Dec 11 17:30:19 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 11 Dec 2025 17:30:19 GMT Subject: RFR: 8326205: Grouping frequently called C2 nmethods in CodeCache In-Reply-To: References: Message-ID: On Thu, 16 Oct 2025 23:06:35 GMT, Chad Rakoczy wrote: > ### Summary > This PR implements [JDK-8326205](https://bugs.openjdk.org/browse/JDK-8326205), introducing experimental support for grouping hot code within the CodeCache. > > ### Description > The feature works by periodically sampling the execution of C2-compiled methods to identify hot code, then relocating those methods into a dedicated `HotCodeHeap` section of the CodeCache. > > Sampling is performed by the `HotCodeSampler`, which runs on a new dedicated `HotCodeGrouper` thread. The thread wakes up every `HotCodeIntervalSeconds` (default 300s) and collects samples for a duration of `HotCodeSampleSeconds` (default 120s). During each sampling period, it iterates over all Java threads, inspects their last Java frame, obtains the current program counter (PC), and maps it to the corresponding nmethod. This allows the sampler to maintain a profile of the most frequently executed methods. > > The `HotCodeGrouper` uses the sampling data to select methods for grouping. Methods are ranked by sample count to form the candidate set. The grouper then relocates these methods (along with their callees, which has been shown to improve performance on AArch64 due to better branch prediction) into the `HotCodeHeap` in descending order of hotness, continuing until the fraction of samples attributable to hot methods exceeds `HotCodeSampleRatio` (default 0.8). The process continues to ensure that the hot-method ratio remains above the threshold. > > The `HotCodeHeap` is a new code heap segment with a default size of 20% of the non-profiled heap, though this can be overridden. 
This size was chosen based on the principle that roughly 20% of methods contribute to 80% of the work. Only C2-compiled nmethods are eligible for relocation, and the relocation process leverages existing infrastructure from [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). > > Relocation occurs entirely on the grouper thread and runs concurrently with the application. To maintain correctness, the thread acquires the `CodeCache_lock` and `Compile_lock` during relocation but releases these locks between individual relocations to avoid blocking GC safepoints. Removal of nmethods from the `HotCodeHeap` is handled by the GC. > > ### Performance > Testing has shown up to a 20% latency reduction in an internal service with a large CodeCache (512 MB). Public benchmark results are forthcoming. > > ### Testing > * CodeCache tests have been updated to cover the new `HotCodeHeap`. > * Dedicated tests for the `HotCodeGrouper` will be ... src/hotspot/share/runtime/hotCodeGrouper.cpp line 77: > 75: void HotCodeGrouper::run() { > 76: while (true) { > 77: os::naked_sleep(HotCodeIntervalSeconds * 1000); I see you have switched to `sleep` to control HotCodeGrouper. Is it better than wait/notify when HotCodeGrouper thread waits for a number of new nmethods exceeding a threshold? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27858#discussion_r2611452734 From eastigeevich at openjdk.org Thu Dec 11 17:45:58 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 11 Dec 2025 17:45:58 GMT Subject: RFR: 8326205: Grouping frequently called C2 nmethods in CodeCache In-Reply-To: References: Message-ID: On Thu, 16 Oct 2025 23:06:35 GMT, Chad Rakoczy wrote: > ### Summary > This PR implements [JDK-8326205](https://bugs.openjdk.org/browse/JDK-8326205), introducing experimental support for grouping hot code within the CodeCache. 
> > ### Description > The feature works by periodically sampling the execution of C2-compiled methods to identify hot code, then relocating those methods into a dedicated `HotCodeHeap` section of the CodeCache. > > Sampling is performed by the `HotCodeSampler`, which runs on a new dedicated `HotCodeGrouper` thread. The thread wakes up every `HotCodeIntervalSeconds` (default 300s) and collects samples for a duration of `HotCodeSampleSeconds` (default 120s). During each sampling period, it iterates over all Java threads, inspects their last Java frame, obtains the current program counter (PC), and maps it to the corresponding nmethod. This allows the sampler to maintain a profile of the most frequently executed methods. > > The `HotCodeGrouper` uses the sampling data to select methods for grouping. Methods are ranked by sample count to form the candidate set. The grouper then relocates these methods (along with their callees, which has been shown to improve performance on AArch64 due to better branch prediction) into the `HotCodeHeap` in descending order of hotness, continuing until the fraction of samples attributable to hot methods exceeds `HotCodeSampleRatio` (default 0.8). The process continues to ensure that the hot-method ratio remains above the threshold. > > The `HotCodeHeap` is a new code heap segment with a default size of 20% of the non-profiled heap, though this can be overridden. This size was chosen based on the principle that roughly 20% of methods contribute to 80% of the work. Only C2-compiled nmethods are eligible for relocation, and the relocation process leverages existing infrastructure from [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). > > Relocation occurs entirely on the grouper thread and runs concurrently with the application. To maintain correctness, the thread acquires the `CodeCache_lock` and `Compile_lock` during relocation but releases these locks between individual relocations to avoid blocking GC safepoints. 
Removal of nmethods from the `HotCodeHeap` is handled by the GC. > > ### Performance > Testing has shown up to a 20% latency reduction in an internal service with a large CodeCache (512 MB). Public benchmark results are forthcoming. > > ### Testing > * CodeCache tests have been updated to cover the new `HotCodeHeap`. > * Dedicated tests for the `HotCodeGrouper` will be ... src/hotspot/share/runtime/hotCodeSampler.cpp line 41: > 39: uint64_t start_time = os::javaTimeMillis(); > 40: > 41: while (true) { Should we skip sampling if we are at a safepoint? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27858#discussion_r2611501217 From phh at openjdk.org Thu Dec 11 18:22:46 2025 From: phh at openjdk.org (Paul Hohensee) Date: Thu, 11 Dec 2025 18:22:46 GMT Subject: RFR: 8373428: Refine variables with the same name in nested scopes in PhaseChaitin::gather_lrg_masks In-Reply-To: <8WWg7y_W2PGKAkwrVUfN97dBZ56I2MRvbMuxowqmnZE=.4c238198-0b07-47da-8756-1485846f044f@github.com> References: <8WWg7y_W2PGKAkwrVUfN97dBZ56I2MRvbMuxowqmnZE=.4c238198-0b07-47da-8756-1485846f044f@github.com> Message-ID: On Wed, 10 Dec 2025 14:43:04 GMT, Hamlin Li wrote: > Hi, > Can you help to review this trivial patch? > > In PhaseChaitin::gather_lrg_masks, several variables have the same name in nested scopes, it looks like following code snippet. > { > A a; > { > A a; > } > } > > This is not helpful to code readability, in particular in a long method like `gather_lrg_masks`, better to rename them. > > Thanks! Marked as reviewed by phh (Reviewer). 
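As a small illustration of why same-named variables in nested scopes hurt readability, the sketch below contrasts a shadowed local with a renamed one. It is a hypothetical example, not code from `PhaseChaitin::gather_lrg_masks`, and the patch under review is described as a readability cleanup rather than a bug fix; the function and variable names here are invented.

```cpp
#include <cassert>

// Illustration only: hypothetical functions, not taken from the JDK sources.

// The inner 'count' shadows the outer one, so the outer variable is never
// updated and the function always returns 0, which is easy to misread.
int shadowed_sum(int n) {
  int count = 0;                 // outer 'count'
  for (int i = 0; i < n; i++) {
    int count = i * 2;           // inner 'count' hides the outer one
    (void)count;                 // silence the unused-variable warning
  }
  return count;                  // still the outer 'count', i.e. 0
}

// With distinct names, it is obvious which variable each statement refers to.
int renamed_sum(int n) {
  int total = 0;
  for (int i = 0; i < n; i++) {
    int doubled = i * 2;
    total += doubled;
  }
  return total;
}
```

For `n == 4`, `shadowed_sum` returns 0 while `renamed_sum` returns 0 + 2 + 4 + 6 = 12. Compilers such as GCC and Clang can flag the first form with `-Wshadow`.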
------------- PR Review: https://git.openjdk.org/jdk/pull/28748#pullrequestreview-3568736004 From kvn at openjdk.org Thu Dec 11 18:29:43 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 11 Dec 2025 18:29:43 GMT Subject: RFR: 8326205: Grouping frequently called C2 nmethods in CodeCache In-Reply-To: References: Message-ID: On Thu, 16 Oct 2025 23:06:35 GMT, Chad Rakoczy wrote: > ### Summary > This PR implements [JDK-8326205](https://bugs.openjdk.org/browse/JDK-8326205), introducing experimental support for grouping hot code within the CodeCache. > > ### Description > The feature works by periodically sampling the execution of C2-compiled methods to identify hot code, then relocating those methods into a dedicated `HotCodeHeap` section of the CodeCache. > > Sampling is performed by the `HotCodeSampler`, which runs on a new dedicated `HotCodeGrouper` thread. The thread wakes up every `HotCodeIntervalSeconds` (default 300s) and collects samples for a duration of `HotCodeSampleSeconds` (default 120s). During each sampling period, it iterates over all Java threads, inspects their last Java frame, obtains the current program counter (PC), and maps it to the corresponding nmethod. This allows the sampler to maintain a profile of the most frequently executed methods. > > The `HotCodeGrouper` uses the sampling data to select methods for grouping. Methods are ranked by sample count to form the candidate set. The grouper then relocates these methods (along with their callees, which has been shown to improve performance on AArch64 due to better branch prediction) into the `HotCodeHeap` in descending order of hotness, continuing until the fraction of samples attributable to hot methods exceeds `HotCodeSampleRatio` (default 0.8). The process continues to ensure that the hot-method ratio remains above the threshold. > > The `HotCodeHeap` is a new code heap segment with a default size of 20% of the non-profiled heap, though this can be overridden. 
This size was chosen based on the principle that roughly 20% of methods contribute to 80% of the work. Only C2-compiled nmethods are eligible for relocation, and the relocation process leverages existing infrastructure from [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). > > Relocation occurs entirely on the grouper thread and runs concurrently with the application. To maintain correctness, the thread acquires the `CodeCache_lock` and `Compile_lock` during relocation but releases these locks between individual relocations to avoid blocking GC safepoints. Removal of nmethods from the `HotCodeHeap` is handled by the GC. > > ### Performance > Testing has shown up to a 20% latency reduction in an internal service with a large CodeCache (512 MB). Public benchmark results are forthcoming. > > ### Testing > * CodeCache tests have been updated to cover the new `HotCodeHeap`. > * Dedicated tests for the `HotCodeGrouper` will be ... I am planning to review that but I currently don't have time. I will look on it later. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27858#issuecomment-3643234641 From kvn at openjdk.org Thu Dec 11 18:47:24 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 11 Dec 2025 18:47:24 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v4] In-Reply-To: References: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> Message-ID: On Thu, 11 Dec 2025 13:38:24 GMT, Quan Anh Mai wrote: >> Hi, >> >> The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. 
>> >> This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. >> >> For the runtime cost, this phase runs very fast, around 5 - 7% the runtime of EA, and about 0.5% the total runtime of C2. >> >> Please take a look and leave your thoughts, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > store values need normalizing src/hotspot/share/opto/phaseloadfolding.cpp line 133: > 131: // int x = o.value; > 132: // In this case, even if the load x = o.value is declared after the store of o to p that allows o > 133: // to escape, it is valid for the load to actually happen before the store. As a result, we can I don't think it is correct. If `p` is external other thread can modify its fields concurrently. Or are you saying that if `p` is external we will always have memory barrier? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28764#discussion_r2611684108 From kvn at openjdk.org Thu Dec 11 18:53:31 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 11 Dec 2025 18:53:31 GMT Subject: RFR: 8367158: C2: create better fill and copy benchmarks, taking alignment into account [v3] In-Reply-To: <5mMYhBudpEt7JDkC-EkGba0GGZR-kZ9LH-jh5m-W7OY=.f3967aa3-b847-42db-99b1-4492b0d78c7c@github.com> References: <3PEmRtpnMH0sRwWGK0uWkItDuytAS-ErVfqYK5X7rDQ=.2d484c9a-c25a-4a60-a856-fcbd4e614914@github.com> <5mMYhBudpEt7JDkC-EkGba0GGZR-kZ9LH-jh5m-W7OY=.f3967aa3-b847-42db-99b1-4492b0d78c7c@github.com> Message-ID: On Thu, 11 Dec 2025 07:59:50 GMT, Emanuel Peter wrote: >> **Summary** >> >> I created some `fill` and `copy` style benchmarks, covering both `arrays` and `MemorySegment`s. 
>> Reasons for this benchmark: >> - I want to compare auto-vectorization with intrinsics (array assembly style intrinsics, and MemorySegment java level special implementations). This allows us to see if some are slower than others, and if we can manage to improve the slower versions somehow in the future. >> - There are some known issues we can demonstrate well with this benchmark: >> - Super-Unrolling: unrolling the vectoirzed loop gets us extra performance, but the exact factor may not be optimal yet for auto-vectorization. >> - Small iteration count loops: auto-vectorization can lead to slowdowns. >> - Many benchmarks do not control for alignment. But that creates noise. I just go over all possible alignments, that should smooth out the noise. >> - Most benchmarks do not control for 4k aliasing (x86 effect in store buffer). I make sure that load/stores are not a multiple of 4k bytes apart, so we can avoid the noise of that effect. >> >> ---------------------------------------------------------------------- >> >> **Analysis based on this Benchmark** >> >> Analysis done in this PR: >> - Arrays: auto vectorization vs scalar loops performance >> - Arrays: auto vectorization loops vs intrinsics >> - MemorySegments: auto vectorization loops vs scalar loops vs `MemorySegment.fill/copy` >> >> Future work: >> - Investigate deeper, inspect assembly, etc. >> - Impact of `-XX:SuperWordAutomaticAlignment=0` on small iteration count loops. >> - Investigate effect of `-XX:-OptimizeFill`. It seems that the loops in this benchmark are not detected automatically, and so the array intrinsics are not used. Why? >> - Investigate impact of `CompactObjectHeaders`. Does enabling/disabling change any performance? >> - Investigate if adjusting the super-unrolling factor could improve performance for auto-vectorization: [JDK-8368061](https://bugs.openjdk.org/browse/JDK-8368061) >> - Performance comparison with Graal. 
>> >> ---------------------------------------------------------------------- >> >> **Array Benchmark: auto vectorization vs scalar** >> >> We can see that for arrays, auto vectorization leads to minor regressions for sizes 1-32, and then generally auto vectorization is faster for larger sizes. And this is true for both `fill` and `copy`. >> >> Strange: `macosx_aarch64` with `copy_int`. The auto vectoirized performance has a sudden drop around 150 iterations. Also for `fill_... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > for merykitty "Rubber stamp" to give second review. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27315#pullrequestreview-3568850834 From rcastanedalo at openjdk.org Thu Dec 11 19:55:35 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 11 Dec 2025 19:55:35 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v4] In-Reply-To: References: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> Message-ID: On Thu, 11 Dec 2025 18:44:47 GMT, Vladimir Kozlov wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> store values need normalizing > > src/hotspot/share/opto/phaseloadfolding.cpp line 133: > >> 131: // int x = o.value; >> 132: // In this case, even if the load x = o.value is declared after the store of o to p that allows o >> 133: // to escape, it is valid for the load to actually happen before the store. As a result, we can > > I don't think it is correct. If `p` is external other thread can modify its fields concurrently. > Or are you saying that if `p` is external we will always have memory barrier? 
I think the Java memory model allows this reordering and places the responsibility on the programmer to use a synchronization mechanism if the reordering is undesirable, no? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28764#discussion_r2611878319 From duke at openjdk.org Thu Dec 11 20:57:30 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 11 Dec 2025 20:57:30 GMT Subject: RFR: 8326205: Grouping frequently called C2 nmethods in CodeCache In-Reply-To: References: Message-ID: On Thu, 11 Dec 2025 17:43:07 GMT, Evgeny Astigeevich wrote: >> ### Summary >> This PR implements [JDK-8326205](https://bugs.openjdk.org/browse/JDK-8326205), introducing experimental support for grouping hot code within the CodeCache. >> >> ### Description >> The feature works by periodically sampling the execution of C2-compiled methods to identify hot code, then relocating those methods into a dedicated `HotCodeHeap` section of the CodeCache. >> >> Sampling is performed by the `HotCodeSampler`, which runs on a new dedicated `HotCodeGrouper` thread. The thread wakes up every `HotCodeIntervalSeconds` (default 300s) and collects samples for a duration of `HotCodeSampleSeconds` (default 120s). During each sampling period, it iterates over all Java threads, inspects their last Java frame, obtains the current program counter (PC), and maps it to the corresponding nmethod. This allows the sampler to maintain a profile of the most frequently executed methods. >> >> The `HotCodeGrouper` uses the sampling data to select methods for grouping. Methods are ranked by sample count to form the candidate set. The grouper then relocates these methods (along with their callees, which has been shown to improve performance on AArch64 due to better branch prediction) into the `HotCodeHeap` in descending order of hotness, continuing until the fraction of samples attributable to hot methods exceeds `HotCodeSampleRatio` (default 0.8). 
The process continues to ensure that the hot-method ratio remains above the threshold. >> >> The `HotCodeHeap` is a new code heap segment with a default size of 20% of the non-profiled heap, though this can be overridden. This size was chosen based on the principle that roughly 20% of methods contribute to 80% of the work. Only C2-compiled nmethods are eligible for relocation, and the relocation process leverages existing infrastructure from [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). >> >> Relocation occurs entirely on the grouper thread and runs concurrently with the application. To maintain correctness, the thread acquires the `CodeCache_lock` and `Compile_lock` during relocation but releases these locks between individual relocations to avoid blocking GC safepoints. Removal of nmethods from the `HotCodeHeap` is handled by the GC. >> >> ### Performance >> Testing has shown up to a 20% latency reduction in an internal service with a large CodeCache (512 MB). Public benchmark results are forthcoming. >> >> ### Testing >> * CodeCache tests have been updated to cover the new `HotCodeHeap`. >> * Dedicated... > > src/hotspot/share/runtime/hotCodeSampler.cpp line 41: > >> 39: uint64_t start_time = os::javaTimeMillis(); >> 40: >> 41: while (true) { > > Should we skip sampling if we are at a safepoint? I don't think we should stop sampling but we could add a check and `continue` if we are at a safepoint. Ultimately I'm not sure it would make a huge difference because safepoints are usually relatively short but I think it's worth adding. 
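For illustration only, here is a minimal Java sketch of the two ideas discussed here: keep sampling but skip any tick taken at a safepoint, then pick methods in descending sample count until their share of all samples exceeds the ratio (0.8 in the description). `Tick`, `collect`, and `selectHot` are hypothetical names invented for this sketch; the actual sampler is C++ in HotSpot and works on nmethods, not strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HotCodeSketch {
    /** One sampling tick: was the VM at a safepoint, and which method was on top of each thread. */
    static final class Tick {
        final boolean atSafepoint;
        final List<String> topFrames;

        Tick(boolean atSafepoint, List<String> topFrames) {
            this.atSafepoint = atSafepoint;
            this.topFrames = topFrames;
        }
    }

    /** Counts samples per method; ticks taken at a safepoint are skipped, not counted. */
    static Map<String, Integer> collect(List<Tick> ticks) {
        Map<String, Integer> counts = new HashMap<>();
        for (Tick tick : ticks) {
            if (tick.atSafepoint) {
                continue; // sampling goes on, this tick is just ignored
            }
            for (String method : tick.topFrames) {
                counts.merge(method, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Takes methods hottest-first until their share of all samples exceeds the ratio. */
    static List<String> selectHot(Map<String, Integer> counts, double ratio) {
        long total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        // Descending by sample count; ties broken by name so the order is deterministic.
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> hot = new ArrayList<>();
        long covered = 0;
        for (Map.Entry<String, Integer> e : ranked) {
            if ((double) covered / total > ratio) {
                break; // the hot set already accounts for enough samples
            }
            hot.add(e.getKey());
            covered += e.getValue();
        }
        return hot;
    }
}
```

With a handful of ticks where one is flagged as a safepoint, methods seen only in that tick never enter the counts, so they cannot be selected as hot.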
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27858#discussion_r2612036943 From duke at openjdk.org Thu Dec 11 21:08:29 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 11 Dec 2025 21:08:29 GMT Subject: RFR: 8326205: Grouping frequently called C2 nmethods in CodeCache In-Reply-To: References: Message-ID: On Thu, 11 Dec 2025 17:27:24 GMT, Evgeny Astigeevich wrote: >> ### Summary >> This PR implements [JDK-8326205](https://bugs.openjdk.org/browse/JDK-8326205), introducing experimental support for grouping hot code within the CodeCache. >> >> ### Description >> The feature works by periodically sampling the execution of C2-compiled methods to identify hot code, then relocating those methods into a dedicated `HotCodeHeap` section of the CodeCache. >> >> Sampling is performed by the `HotCodeSampler`, which runs on a new dedicated `HotCodeGrouper` thread. The thread wakes up every `HotCodeIntervalSeconds` (default 300s) and collects samples for a duration of `HotCodeSampleSeconds` (default 120s). During each sampling period, it iterates over all Java threads, inspects their last Java frame, obtains the current program counter (PC), and maps it to the corresponding nmethod. This allows the sampler to maintain a profile of the most frequently executed methods. >> >> The `HotCodeGrouper` uses the sampling data to select methods for grouping. Methods are ranked by sample count to form the candidate set. The grouper then relocates these methods (along with their callees, which has been shown to improve performance on AArch64 due to better branch prediction) into the `HotCodeHeap` in descending order of hotness, continuing until the fraction of samples attributable to hot methods exceeds `HotCodeSampleRatio` (default 0.8). The process continues to ensure that the hot-method ratio remains above the threshold. >> >> The `HotCodeHeap` is a new code heap segment with a default size of 20% of the non-profiled heap, though this can be overridden. 
This size was chosen based on the principle that roughly 20% of methods contribute to 80% of the work. Only C2-compiled nmethods are eligible for relocation, and the relocation process leverages existing infrastructure from [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). >> >> Relocation occurs entirely on the grouper thread and runs concurrently with the application. To maintain correctness, the thread acquires the `CodeCache_lock` and `Compile_lock` during relocation but releases these locks between individual relocations to avoid blocking GC safepoints. Removal of nmethods from the `HotCodeHeap` is handled by the GC. >> >> ### Performance >> Testing has shown up to a 20% latency reduction in an internal service with a large CodeCache (512 MB). Public benchmark results are forthcoming. >> >> ### Testing >> * CodeCache tests have been updated to cover the new `HotCodeHeap`. >> * Dedicated... > > src/hotspot/share/runtime/hotCodeGrouper.cpp line 77: > >> 75: void HotCodeGrouper::run() { >> 76: while (true) { >> 77: os::naked_sleep(HotCodeIntervalSeconds * 1000); > > I see you have switched to `sleep` to control HotCodeGrouper. Is it better than wait/notify when HotCodeGrouper thread waits for a number of new nmethods exceeding a threshold? My concern with wait/notify is when the application is in a steady state and no longer compiling many methods. If a hot method is registered shortly after we finish sampling we could wait a long time before we actually profile again and relocate it. Maybe there's a middle ground? 
When a new nmethod is registered, start sampling if (1) we are above the "new C2 threshold" or (2) it has been X minutes since the last sampling. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27858#discussion_r2612062778 From kvn at openjdk.org Thu Dec 11 21:28:06 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 11 Dec 2025 21:28:06 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v4] In-Reply-To: References: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> Message-ID: On Thu, 11 Dec 2025 13:38:24 GMT, Quan Anh Mai wrote: >> Hi, >> >> The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. >> >> This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. >> >> For the runtime cost, this phase runs very fast, around 5 - 7% the runtime of EA, and about 0.5% the total runtime of C2. >> >> Please take a look and leave your thoughts, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > store values need normalizing src/hotspot/share/opto/phaseloadfolding.cpp line 94: > 92: // We can see that the object can be considered non-escape at NarrowMemProj, CallJava(null), and > 93: // Proj2, while it is considered escape at CallJava(o), Proj1, Phi. The loads x and z will be > 94: // from NarrowMemProj and Proj2, respectively, which means they can be considered loads from an So this optimization is based on JDK-8327963 changes which introduced NarrowMemProj. But I don't see you can for it in code.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28764#discussion_r2612121903 From kvn at openjdk.org Thu Dec 11 21:28:09 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 11 Dec 2025 21:28:09 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v4] In-Reply-To: References: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> Message-ID: On Thu, 11 Dec 2025 19:52:37 GMT, Roberto Castañeda Lozano wrote: >> src/hotspot/share/opto/phaseloadfolding.cpp line 133: >> >>> 131: // int x = o.value; >>> 132: // In this case, even if the load x = o.value is declared after the store of o to p that allows o >>> 133: // to escape, it is valid for the load to actually happen before the store. As a result, we can >> >> I don't think it is correct. If `p` is external other thread can modify its fields concurrently. >> Or are you saying that if `p` is external we will always have memory barrier? > > I think the Java memory model allows this reordering and places the responsibility on the programmer to use a synchronization mechanism if the reordering is undesirable, no? Yes, I think. May be we should add comment about that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28764#discussion_r2612117694 From duke at openjdk.org Fri Dec 12 00:13:32 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Fri, 12 Dec 2025 00:13:32 GMT Subject: RFR: 8326205: Grouping frequently called C2 nmethods in CodeCache [v2] In-Reply-To: References: Message-ID: > ### Summary > This PR implements [JDK-8326205](https://bugs.openjdk.org/browse/JDK-8326205), introducing experimental support for grouping hot code within the CodeCache. > > ### Description > The feature works by periodically sampling the execution of C2-compiled methods to identify hot code, then relocating those methods into a dedicated `HotCodeHeap` section of the CodeCache.
> > Sampling is performed by the `HotCodeSampler`, which runs on a new dedicated `HotCodeGrouper` thread. The thread wakes up every `HotCodeIntervalSeconds` (default 300s) and collects samples for a duration of `HotCodeSampleSeconds` (default 120s). During each sampling period, it iterates over all Java threads, inspects their last Java frame, obtains the current program counter (PC), and maps it to the corresponding nmethod. This allows the sampler to maintain a profile of the most frequently executed methods. > > The `HotCodeGrouper` uses the sampling data to select methods for grouping. Methods are ranked by sample count to form the candidate set. The grouper then relocates these methods (along with their callees, which has been shown to improve performance on AArch64 due to better branch prediction) into the `HotCodeHeap` in descending order of hotness, continuing until the fraction of samples attributable to hot methods exceeds `HotCodeSampleRatio` (default 0.8). The process continues to ensure that the hot-method ratio remains above the threshold. > > The `HotCodeHeap` is a new code heap segment with a default size of 20% of the non-profiled heap, though this can be overridden. This size was chosen based on the principle that roughly 20% of methods contribute to 80% of the work. Only C2-compiled nmethods are eligible for relocation, and the relocation process leverages existing infrastructure from [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). > > Relocation occurs entirely on the grouper thread and runs concurrently with the application. To maintain correctness, the thread acquires the `CodeCache_lock` and `Compile_lock` during relocation but releases these locks between individual relocations to avoid blocking GC safepoints. Removal of nmethods from the `HotCodeHeap` is handled by the GC. > > ### Performance > Testing has shown up to a 20% latency reduction in an internal service with a large CodeCache (512 MB). 
Public benchmark results are forthcoming. > > ### Testing > * CodeCache tests have been updated to cover the new `HotCodeHeap`. > * Dedicated tests for the `HotCodeGrouper` will be ... Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Add StessHotCodeGrouper test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27858/files - new: https://git.openjdk.org/jdk/pull/27858/files/c4c779a5..ce1c685f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27858&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27858&range=00-01 Stats: 167 lines in 2 files changed: 166 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/27858.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27858/head:pull/27858 PR: https://git.openjdk.org/jdk/pull/27858 From missa at openjdk.org Fri Dec 12 00:15:50 2025 From: missa at openjdk.org (Mohamed Issa) Date: Fri, 12 Dec 2025 00:15:50 GMT Subject: RFR: 8368977: Provide clear naming for AVX10 identifiers [v2] In-Reply-To: References: <6XYgqaHA0PPZzvnfysKOP5XGP7e_RMkVFt9PV2OT3Gk=.e5f33072-a91a-4e57-99f3-81cc4ae4d844@github.com> Message-ID: On Mon, 8 Dec 2025 21:47:16 GMT, Mohamed Issa wrote: >> This is a simple change that renames all AVX10 identifiers to explicitly show which sub-versions are in use. Right now, AVX10.2 is the only case to worry about. The JTREG tests listed below were used to verify correctness with the recommended JVM options mentioned in corresponding source files. Each test included runs through emulation with AVX10.2 enabled and disabled to exercise all possible paths. All modifications and tests used [OpenJDK v26-b24](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B24) as the baseline build. >> >> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java` >> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java` >> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java` >> 4. 
`jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java` >> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java` >> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java` >> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java` >> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java` >> 9. `jtreg:test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java` >> 10. `jtreg:test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java` >> 11. `jtreg:test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java` >> 12. `jtreg:test/jdk/jdk/incubator/vector/Double64VectorTests.java` >> 13. `jtreg:test/jdk/jdk/incubator/vector/Double128VectorTests.java` >> 14. `jtreg:test/jdk/jdk/incubator/vector/Double256VectorTests.java` >> 15. `jtreg:test/jdk/jdk/incubator/vector/Double512VectorTests.java` >> 16. `jtreg:test/jdk/jdk/incubator/vector/DoubleMaxVectorTests.java` >> 17. `jtreg:test/jdk/jdk/incubator/vector/Float64VectorTests.java` >> 18. `jtreg:test/jdk/jdk/incubator/vector/Float128VectorTests.java` >> 19. `jtreg:test/jdk/jdk/incubator/vector/Float256VectorTests.java` >> 20. `jtreg:test/jdk/jdk/incubator/vector/Float512VectorTests.java` >> 21. `jtreg:test/jdk/jdk/incubator/vector/FloatMaxVectorTests.java` >> 22. `jtreg:test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` >> 23. `jtreg:test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java` > > Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: > > Remove changes that affect functionality @eme64, @mhaessig Ok to test? @iwanowww In case you want to glance through changes. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28344#issuecomment-3644338061 From dlong at openjdk.org Fri Dec 12 01:58:15 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 12 Dec 2025 01:58:15 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v6] In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 21:27:22 GMT, Vladimir Ivanov wrote: >> Even though `instanceof` check (and reflective `Class.isInstance` call) narrows operand's type, sharpened type information is not explicitly materialized in the IR. >> >> There's a `SubTypeCheck` node present, but it is not a substitute for a `CheckCastPP` node with a proper type. >> >> The difference can be illustrated with the following simple cases: >> >> class A { void m() {} } >> class B extends A { void m() {} } >> >> void testInstanceOf(A obj) { >> if (obj instanceof B) { >> obj.m(); >> } >> } >> >> InstanceOf::testInstanceOf (12 bytes) >> @ 8 InstanceOf$A::m (0 bytes) failed to inline: virtual call >> >> vs >> >> void testInstanceOfCast(A obj) { >> if (obj instanceof B) { >> B b = (B)obj; >> b.m(); >> } >> } >> >> InstanceOf::testInstanceOfCast (17 bytes) >> @ 13 InstanceOf$B::m (1 bytes) inline (hot) >> >> >> Proposed fix annotates operands of subtype checks with proper type information which reflects the effects of subtype check. Not-yet-canonicalized IR shape poses some challenges, but I decided to match it early so information is available right away, rather than waiting for IGVN pass and delay inlining to post-parse phase. >> >> FTR it is not a complete fix. It works for trivial cases, but for more complex conditions the IR shape becomes too complex during parsing (as illustrated by some test cases). I experimented with annotating subtype checks after initial parsing pass is over, but the crucial simplification step happens as part of split-if transformation which happens when no more inlining is possible. 
So, the only possible benefit (without forcing split-if optimization earlier) is virtual-to-direct call strength reduction. I plan to explore it separately. >> >> Testing: hs-tier1 - hs-tier5 > > Vladimir Ivanov has updated the pull request incrementally with two additional commits since the last revision: > > - Improve the test > - Improve comments Why the reversal on should_delay_inlining unification? Did it cause problems? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28517#issuecomment-3644562168 From haosun at openjdk.org Fri Dec 12 02:11:46 2025 From: haosun at openjdk.org (Hao Sun) Date: Fri, 12 Dec 2025 02:11:46 GMT Subject: RFR: 8372941: Rework compiler/intrinsics/sha tests to use intrinsic availability [v3] In-Reply-To: <9ADykpxVRUnyvfsx-PHVd9Jfnssl7b-EX433Nt8RPd8=.da6fa1dd-5b67-46af-b106-10caa8976b98@github.com> References: <9ADykpxVRUnyvfsx-PHVd9Jfnssl7b-EX433Nt8RPd8=.da6fa1dd-5b67-46af-b106-10caa8976b98@github.com> Message-ID: On Thu, 11 Dec 2025 13:16:12 GMT, Ramkumar Sunderbabu wrote: >> Ramkumar Sunderbabu has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix TestUseSHA3IntrinsicsOptionOnSupportedCPU > > There seems to be a bug in WhiteBox call to get intrinsics support for SHA3 in x64 hosts. I need more time for investigating the issue. > I have 2 options, > 1. Wait for clarity on SHA3 > 2. Complete this bug at the current form and work on SHA3 in a separate bug. > > @shqking could you please suggest? @rsunderbabu I would suggest option 1. I suppose MD5, SHA-1, SHA-256, and SHA-3 should follow the similar test logic. I personally think it would be safer not to change the MD5/SHA-1/SHA-256 part until we fully understand the cause of the SHA-3 failure.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28634#issuecomment-3644592410 From vlivanov at openjdk.org Fri Dec 12 02:25:54 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 12 Dec 2025 02:25:54 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v6] In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 01:55:19 GMT, Dean Long wrote: > Why the reversal on should_delay_inlining unification? Did it cause problems? Yes, there were some assertion failures observed during testing. I thought that it is an equivalent change, but it turned out it's not. So, I reverted it. I plan to look into it separately. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28517#issuecomment-3644622447 From qamai at openjdk.org Fri Dec 12 02:30:22 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 12 Dec 2025 02:30:22 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v5] In-Reply-To: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> References: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> Message-ID: > Hi, > > The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. > > This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. > > For the runtime cost, this phase runs very fast, around 5 - 7% the runtime of EA, and about 0.5% the total runtime of C2. > > Please take a look and leave your thoughts, thanks a lot. 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: more detailed explanations ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28764/files - new: https://git.openjdk.org/jdk/pull/28764/files/045cd4ab..2ca6bac7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28764&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28764&range=03-04 Stats: 28 lines in 1 file changed: 22 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/28764.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28764/head:pull/28764 PR: https://git.openjdk.org/jdk/pull/28764 From qamai at openjdk.org Fri Dec 12 02:30:23 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 12 Dec 2025 02:30:23 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v4] In-Reply-To: References: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> Message-ID: On Thu, 11 Dec 2025 21:23:22 GMT, Vladimir Kozlov wrote: >> I think the Java memory model allows this reordering and places the responsibility on the programmer to use a synchronization mechanism if the reordering is undesirable, no? > > Yes, I think. May be we should add comment about that. I have added comments to further stress the importance of memory barriers if the developer needs the accesses serialized. 
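To make the source shape under discussion concrete, here is a small Java sketch, assuming a hypothetical `Box` class and a static field `published` standing in for the external location `p`. It only shows the two code patterns (plain accesses versus a `VarHandle.fullFence()` between the publishing store and the load). It does not demonstrate what any particular JIT does with them, and run single-threaded both methods simply return `v`.

```java
import java.lang.invoke.VarHandle;

public class EscapeOrderSketch {
    /** Hypothetical holder, playing the role of the freshly allocated object o. */
    static final class Box {
        int value;

        Box(int value) {
            this.value = value;
        }
    }

    /** Plain (non-volatile) field, standing in for the external location p. */
    static Box published;

    /** Plain accesses only: nothing orders the load of o.value after the store that publishes o. */
    static int plain(int v) {
        Box o = new Box(v);
        published = o;     // o escapes here
        return o.value;    // plain load; no synchronization with other threads
    }

    /** Same shape with a full fence between the publishing store and the load. */
    static int fenced(int v) {
        Box o = new Box(v);
        published = o;           // o escapes here
        VarHandle.fullFence();   // orders the store above before the load below
        return o.value;
    }
}
```

A developer who needs the load to observe a concurrent write through `published` would use the fenced shape or another synchronization mechanism. The plain shape makes no such promise.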
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28764#discussion_r2612660563 From qamai at openjdk.org Fri Dec 12 02:32:55 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 12 Dec 2025 02:32:55 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v4] In-Reply-To: References: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> Message-ID: On Thu, 11 Dec 2025 21:25:11 GMT, Vladimir Kozlov wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> store values need normalizing > > src/hotspot/share/opto/phaseloadfolding.cpp line 94: > >> 92: // We can see that the object can be considered non-escape at NarrowMemProj, CallJava(null), and >> 93: // Proj2, while it is considered escape at CallJava(o), Proj1, Phi. The loads x and z will be >> 94: // from NarrowMemProj and Proj2, respectively, which means they can be considered loads from an > > So this optimization is based on JDK-8327963 changes which introduced NarrowMemProj. But I don't see you can for it in code. This is only for demonstration based on the current shape of the graph. Implementation-wise, we walk the graph until we meet an `InitializeNode`, at that point we call `InitializeNode::find_captured_store`, so you can say it is not important what kind of `Proj` an `InitializeNode` has. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28764#discussion_r2612666993 From vlivanov at openjdk.org Fri Dec 12 02:46:54 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 12 Dec 2025 02:46:54 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v5] In-Reply-To: References: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> Message-ID: On Fri, 12 Dec 2025 02:30:22 GMT, Quan Anh Mai wrote: >> Hi, >> >> The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. >> >> This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. >> >> For the runtime cost, this phase runs very fast, around 5 - 7% the runtime of EA, and about 0.5% the total runtime of C2. >> >> Please take a look and leave your thoughts, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > more detailed explanations Interesting idea, Quan! Why can't the same be done as part of `MemNode::can_see_stored_value()`? Based on your reasoning in the comments, if the base escapes, the walk over memory graph happening there should encounter it as well. (But you need to ensure it climbs up to the Allocation node to be sure.) 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28764#issuecomment-3644664178 From qamai at openjdk.org Fri Dec 12 02:53:51 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 12 Dec 2025 02:53:51 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v5] In-Reply-To: References: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> Message-ID: On Fri, 12 Dec 2025 02:43:46 GMT, Vladimir Ivanov wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> more detailed explanations > > Interesting idea, Quan! > > Why can't the same be done as part of `MemNode::can_see_stored_value()`? Based on your reasoning in the comments, if the base escapes, the walk over memory graph happening there should encounter it as well. (But you need to ensure it climbs up to the Allocation node to be sure.) @iwanowww No, the walk over the memory graph only visits the memory nodes in the alias class of the load, the escape can happen in a different alias class and be made visible by a `MergeMem`. For example: Integer o = new Integer(v); *p = o; VarHandle.fullFence(); int x = o.value; Then the load is in the alias class `Integer.value`, while the escape is in the alias class `p` and is made visible by the full fence. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28764#issuecomment-3644678568 From vlivanov at openjdk.org Fri Dec 12 03:06:59 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 12 Dec 2025 03:06:59 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v5] In-Reply-To: References: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> Message-ID: On Fri, 12 Dec 2025 02:30:22 GMT, Quan Anh Mai wrote: >> Hi, >> >> The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. 
If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. >> >> This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. >> >> For the runtime cost, this phase runs very fast, around 5 - 7% the runtime of EA, and about 0.5% the total runtime of C2. >> >> Please take a look and leave your thoughts, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > more detailed explanations Yes, I got it, but my understanding of the core idea of the optimization is that you can skip over membars when base object is not escaped yet. So, if `MemNode::can_see_stored_value()` encounters a node with a wide memory effect (a membar or a call) while traversing the memory graph upwards, it can step over it if it can prove that the freshly allocated instance hasn't escaped yet. And the traversal of memory graph from there up to corresponding Initialize node should reveal whether the instance escaped or not. Do I get it right? 
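
To make the question concrete, here is a minimal Java sketch of the two situations under discussion. It is only an illustration under stated assumptions: `Box`, `published`, `loadBeforeEscape` and `loadAfterEscape` are hypothetical names, the static field stands in for the `*p = o` store from the earlier example, and the comments describe what a compiler could reason about in principle, not what C2 or this patch actually does with either method.

```java
import java.lang.invoke.VarHandle;

// Hypothetical illustration class; not taken from the PR.
class EscapeAtLoadSketch {
    static final class Box {
        int value;
        Box(int v) { value = v; }
    }

    // Hypothetical stand-in for the `*p = o` store: publishing here makes the object escape.
    static Box published;

    // The object has not escaped when the load happens, so no other thread
    // can have written o.value; the load can only observe the constructor's store.
    static int loadBeforeEscape(int v) {
        Box o = new Box(v);
        VarHandle.fullFence(); // wide memory effect, but o is still unpublished
        int x = o.value;       // candidate for folding to v
        published = o;         // the escape happens only after the load
        return x;
    }

    // The object escapes before the fence, so another thread may have
    // modified o.value by the time of the load; stepping over the fence is not safe.
    static int loadAfterEscape(int v) {
        Box o = new Box(v);
        published = o;         // escape, in a different alias class than Box.value
        VarHandle.fullFence(); // makes the escape visible
        return o.value;        // must be treated as a real load
    }
}
```

Single-threaded, both methods return `v`; the difference is only in which one a compiler may simplify without looking at other threads.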
------------- PR Comment: https://git.openjdk.org/jdk/pull/28764#issuecomment-3644703975 From qamai at openjdk.org Fri Dec 12 03:18:50 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 12 Dec 2025 03:18:50 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v5] In-Reply-To: References: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> Message-ID: On Fri, 12 Dec 2025 03:03:59 GMT, Vladimir Ivanov wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> more detailed explanations > > Yes, I got it, but my understanding of the core idea of the optimization is that you can skip over membars when base object is not escaped yet. So, if `MemNode::can_see_stored_value()` encounters a node with a wide memory effect (a membar or a call) while traversing the memory graph upwards, it can step over it if it can prove that the freshly allocated instance hasn't escaped yet. And the traversal of memory graph from there up to corresponding Initialize node should reveal whether the instance escaped or not. Do I get it right? @iwanowww In principle, I think you are right. However, I don't know how you can prove that a freshly allocated object has not escaped. It seems to me you would need to traverse the whole memory graph to obtain that information. Furthermore, `AllocateNode` and `InitializeNode` do not write to the whole memory, so walking some memory alias classes will make you pass them without knowing until you encounter the start memory. And there is also the issue of other nodes may alias with the base, too (e.g. `Phi`). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28764#issuecomment-3644728073 From vlivanov at openjdk.org Fri Dec 12 03:35:53 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 12 Dec 2025 03:35:53 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v5] In-Reply-To: References: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> Message-ID: On Fri, 12 Dec 2025 02:30:22 GMT, Quan Anh Mai wrote: >> Hi, >> >> The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. >> >> This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. >> >> For the runtime cost, this phase runs very fast, around 5 - 7% the runtime of EA, and about 0.5% the total runtime of C2. >> >> Please take a look and leave your thoughts, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > more detailed explanations Ok, what does it take to determine a freshly allocated object doesn't escape in a region bounded by the allocation and some call/membar node (dominated by it)? I believe it should be part of the problem your patch solves. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28764#issuecomment-3644760709 From qamai at openjdk.org Fri Dec 12 03:59:50 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 12 Dec 2025 03:59:50 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v5] In-Reply-To: References: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> Message-ID: On Fri, 12 Dec 2025 02:30:22 GMT, Quan Anh Mai wrote: >> Hi, >> >> The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. >> >> This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. >> >> For the runtime cost, this phase runs very fast, around 5 - 7% the runtime of EA, and about 0.5% the total runtime of C2. >> >> Please take a look and leave your thoughts, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > more detailed explanations The sufficient condition to decide that a freshly allocated object does not escape in a region bounded by the allocation and a call is that there is no action in that region that makes the object escape. This means that there is no node that escapes the object which has the call as a transitive use. As a result, my solution here is to find all nodes that escape the object, then mark all of its transitive uses as escape. I believe you want to do it in the opposite way, that is, to try to find the nodes that escape the freshly allocated object from a call. 
But that means that we need to traverse all the transitive inputs of the call, which seems unrealistic for something running in `IterGVN`. Am I understanding it correctly? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28764#issuecomment-3644799880 From qamai at openjdk.org Fri Dec 12 05:13:11 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 12 Dec 2025 05:13:11 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v6] In-Reply-To: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> References: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> Message-ID: > Hi, > > The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. > > This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. > > For the runtime cost, this phase runs very fast, around 5 - 7% the runtime of EA, and about 0.5% the total runtime of C2. > > Please take a look and leave your thoughts, thanks a lot. Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision:

 - Merge branch 'master' into foldmem
 - grammar, safe change
 - more detailed explanations
 - store values need normalizing
 - Just use candidate_set directly
 - Some runtime calls may receive a derived pointer but not the base
 - Aggressively fold loads from objects that have not escaped

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28764/files
  - new: https://git.openjdk.org/jdk/pull/28764/files/2ca6bac7..6331d47c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28764&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28764&range=04-05

  Stats: 25130 lines in 238 files changed: 16968 ins; 7138 del; 1024 mod
  Patch: https://git.openjdk.org/jdk/pull/28764.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28764/head:pull/28764

PR: https://git.openjdk.org/jdk/pull/28764

From thartmann at openjdk.org  Fri Dec 12 06:56:53 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Fri, 12 Dec 2025 06:56:53 GMT
Subject: RFR: 8373396: Min and Max Ideal missing AddNode::Ideal optimisations
In-Reply-To:
References:
Message-ID: <9pBouCjCdkv0Ba0emphWr57W8OKxJaUDHh0eaRdT894=.831501cc-7b5c-42ac-ae05-fe221180c3bb@github.com>

On Thu, 11 Dec 2025 16:02:24 GMT, Galder Zamarreño wrote:

> `MaxI` and `MinI` are missing `AddNode::Ideal` optimizations. These optimizations include commutation, flattening, pushing constants...etc. The PR changes `MaxINode::Ideal` and `MinINode::Ideal` to call `AddNode::Ideal`. Long versions already call `AddNode::Ideal` so nothing to change there.
>
> The PR also includes a template framework generated test (cc @eme64) that verifies that all of `AddNode::Ideal` optimizations now apply correctly for min/max for longs and ints. Long tests have been added to validate that both ints and longs produce the same results.
>
> Fixing this issue indirectly fixes `compiler/codegen/TestBooleanVect.java` when run with `-XX:VerifyIterativeGVN=1110`, which was failing due to `min` not having one of those optimisations. However, this PR does not make changes to `PhaseIterGVN::verify_Identity_for` because there are additional failures observed with min/max for integers in JDK-8373134. Therefore, changes there will be made in the PR for JDK-8373134 instead.
>
> `PhaseIterGVN::verify_Ideal_for` contains the exclusion shown below. It looks like it could be removed in this PR, as these cases were quite likely disabled due to the issue here. However, it's unclear what test was failing here (@eme64 ?):
>
>
>     // MinINode::Ideal
>     // Did not investigate, but there are some patterns that might
>     // need more notification.
>     case Op_MinI:
>     case Op_MaxI: // preemptively removed it as well.
>       return false;
>
>
> I've run tier1-3 tests on linux/x64 and they passed.

src/hotspot/share/opto/addnode.cpp line 1459:

> 1457: // Ideal transformations for MaxINode
> 1458: Node* MaxINode::Ideal(PhaseGVN* phase, bool can_reshape) {
> 1459:   Node* n = AddNode::Ideal(phase, can_reshape);

Why not move this into `MaxNode::IdealI`?
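
For readers following along, the kinds of rewrites named in the description (commutation, flattening, pushing constants) rest on simple algebraic identities of min/max. The sketch below only shows source-level pairs that compute the same value; the class and method names are hypothetical, and the snippet does not demonstrate which IR shapes C2 actually produces for them.

```java
// Hypothetical illustration class; not part of the PR or of HotSpot.
class MinMaxIdealSketch {
    // Commutation: operand order does not matter, so a constant
    // can always be moved to the right-hand side.
    static int constantLeft(int a)  { return Math.max(5, a); }
    static int constantRight(int a) { return Math.max(a, 5); }

    // Constant pushing: once two constants meet, they can be combined.
    static int nestedMax(int a) { return Math.max(Math.max(a, 5), 7); }
    static int foldedMax(int a) { return Math.max(a, 7); }

    // Flattening: a tree of min operations can be re-associated so that
    // the constants end up together and fold (min(3, -2) == -2).
    static int treeMin(int a, int b) { return Math.min(Math.min(a, 3), Math.min(b, -2)); }
    static int flatMin(int a, int b) { return Math.min(Math.min(a, b), -2); }
}
```

Each pair returns the same result for every `int` input, which is the property a test comparing the int and long variants would rely on.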
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28770#discussion_r2613135180 From epeter at openjdk.org Fri Dec 12 07:20:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 12 Dec 2025 07:20:07 GMT Subject: RFR: 8367158: C2: create better fill and copy benchmarks, taking alignment into account [v3] In-Reply-To: <2oHQUsUXi7uHsxqgjogPpdB6LSHrvWrbZLyI6kfLdHE=.916375a0-ba94-42a7-bc1d-c18c79f82157@github.com> References: <3PEmRtpnMH0sRwWGK0uWkItDuytAS-ErVfqYK5X7rDQ=.2d484c9a-c25a-4a60-a856-fcbd4e614914@github.com> <5mMYhBudpEt7JDkC-EkGba0GGZR-kZ9LH-jh5m-W7OY=.f3967aa3-b847-42db-99b1-4492b0d78c7c@github.com> <2oHQUsUXi7uHsxqgjogPpdB6LSHrvWrbZLyI6kfLdHE=.916375a0-ba94-42a7-bc1d-c18c79f82157@github.com> Message-ID: On Thu, 11 Dec 2025 08:16:33 GMT, Quan Anh Mai wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> for merykitty > > Thanks, LGTM @merykitty @vnkozlov Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27315#issuecomment-3645226798 From epeter at openjdk.org Fri Dec 12 07:20:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 12 Dec 2025 07:20:09 GMT Subject: Integrated: 8367158: C2: create better fill and copy benchmarks, taking alignment into account In-Reply-To: <3PEmRtpnMH0sRwWGK0uWkItDuytAS-ErVfqYK5X7rDQ=.2d484c9a-c25a-4a60-a856-fcbd4e614914@github.com> References: <3PEmRtpnMH0sRwWGK0uWkItDuytAS-ErVfqYK5X7rDQ=.2d484c9a-c25a-4a60-a856-fcbd4e614914@github.com> Message-ID: On Tue, 16 Sep 2025 14:28:12 GMT, Emanuel Peter wrote: > **Summary** > > I created some `fill` and `copy` style benchmarks, covering both `arrays` and `MemorySegment`s. > Reasons for this benchmark: > - I want to compare auto-vectorization with intrinsics (array assembly style intrinsics, and MemorySegment java level special implementations). 
This allows us to see if some are slower than others, and if we can manage to improve the slower versions somehow in the future.
> - There are some known issues we can demonstrate well with this benchmark:
>   - Super-Unrolling: unrolling the vectorized loop gets us extra performance, but the exact factor may not be optimal yet for auto-vectorization.
>   - Small iteration count loops: auto-vectorization can lead to slowdowns.
> - Many benchmarks do not control for alignment. But that creates noise. I just go over all possible alignments, that should smooth out the noise.
> - Most benchmarks do not control for 4k aliasing (x86 effect in store buffer). I make sure that load/stores are not a multiple of 4k bytes apart, so we can avoid the noise of that effect.
>
> ----------------------------------------------------------------------
>
> **Analysis based on this Benchmark**
>
> Analysis done in this PR:
> - Arrays: auto vectorization vs scalar loops performance
> - Arrays: auto vectorization loops vs intrinsics
> - MemorySegments: auto vectorization loops vs scalar loops vs `MemorySegment.fill/copy`
>
> Future work:
> - Investigate deeper, inspect assembly, etc.
> - Impact of `-XX:SuperWordAutomaticAlignment=0` on small iteration count loops.
> - Investigate effect of `-XX:-OptimizeFill`. It seems that the loops in this benchmark are not detected automatically, and so the array intrinsics are not used. Why?
> - Investigate impact of `CompactObjectHeaders`. Does enabling/disabling change any performance?
> - Investigate if adjusting the super-unrolling factor could improve performance for auto-vectorization: [JDK-8368061](https://bugs.openjdk.org/browse/JDK-8368061)
> - Performance comparison with Graal.
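
The two noise controls mentioned in the summary (sweeping over all alignments, and keeping loads and stores from being a multiple of 4k bytes apart) can be sketched in plain Java as follows. This is a rough illustration, not the JMH benchmark from the PR: `AlignmentSweepSketch`, `fill`, `sweep` and `is4kAliased` are hypothetical names, and the array start offset is only a proxy for alignment.

```java
// Hypothetical illustration class; not the benchmark added by the PR.
class AlignmentSweepSketch {
    // The kind of simple fill loop that auto-vectorization targets.
    static void fill(int[] a, int start, int len, int value) {
        for (int i = start; i < start + len; i++) {
            a[i] = value;
        }
    }

    // Run the same fill at every start offset in [0, offsets), so that no
    // single alignment dominates the measurement. The checksum only keeps
    // the work observable.
    static long sweep(int len, int offsets) {
        int[] a = new int[len + offsets];
        long checksum = 0;
        for (int offset = 0; offset < offsets; offset++) {
            fill(a, offset, len, offset + 1);
            checksum += a[offset] + a[offset + len - 1];
        }
        return checksum;
    }

    // True if a load and a store are a multiple of 4096 bytes apart,
    // which is the situation the summary says the benchmark avoids.
    static boolean is4kAliased(long loadByteOffset, long storeByteOffset) {
        return (storeByteOffset - loadByteOffset) % 4096 == 0;
    }
}
```

A real measurement would of course need a harness such as JMH around `fill`; the loop above only shows the shape of the sweep.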
>
> ----------------------------------------------------------------------
>
> **Array Benchmark: auto vectorization vs scalar**
>
> We can see that for arrays, auto vectorization leads to minor regressions for sizes 1-32, and then generally auto vectorization is faster for larger sizes. And this is true for both `fill` and `copy`.
>
> Strange: `macosx_aarch64` with `copy_int`. The auto vectorized performance has a sudden drop around 150 iterations. Also for `fill_long` we have a "phase-transition" around 64, that goes steeper rather...

This pull request has now been integrated.

Changeset: 650de99f
Author:    Emanuel Peter
URL:       https://git.openjdk.org/jdk/commit/650de99fc662a3e8473391627df9e523b6b80727
Stats:     1148 lines in 2 files changed: 1148 ins; 0 del; 0 mod

8367158: C2: create better fill and copy benchmarks, taking alignment into account

Reviewed-by: qamai, kvn

-------------

PR: https://git.openjdk.org/jdk/pull/27315

From erfang at openjdk.org  Fri Dec 12 07:47:12 2025
From: erfang at openjdk.org (Eric Fang)
Date: Fri, 12 Dec 2025 07:47:12 GMT
Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v5]
In-Reply-To:
References:
Message-ID:

> `VectorMaskCastNode` is used to cast a vector mask from one type to another type. The cast may be generated by calling the vector API `cast` or generated by the compiler. For example, some vector mask operations like `trueCount` require the input mask to be integer types, so for floating point type masks, the compiler will cast the mask to the corresponding integer type mask automatically before doing the mask operation. This kind of cast is very common.
>
> If the vector element size is not changed, the `VectorMaskCastNode` doesn't generate code, otherwise code will be generated to extend or narrow the mask. This IR node is not free, whether or not it generates code, because it may block some optimizations. For example:
> 1.
`(VectorStoreMask (VectorMaskCast (VectorLoadMask x)))`. The middle `VectorMaskCast` prevents the following optimization: `(VectorStoreMask (VectorLoadMask x)) => (x)`
> 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`.
>
> In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. But if the input value is changed, we can't eliminate the cast.
>
> The general idea of this PR is introducing an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`, like the existing `Node::uncast(bool)` function. The function returns the first non `VectorMaskCastNode`.
>
> The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode` and this does not affect the correctness of the optimization. Then this function can be called to eliminate the `VectorMaskCastNode` chain.
>
> Current optimizations related to `VectorMaskCastNode` include:
> 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760.
> 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242.
>
> This PR does the following optimizations:
> 1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` as `(VectorMaskCast (VectorMaskCast? ... (VectorMaskCast x))) => (x)`. Because as long as types of the head and tail `VectorMaskCastNode` are consistent, the optimization is correct.
> 2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`. Since the value before and after the pattern is a boolean vector, it remains unchanged as long as th...

Eric Fang has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains eight commits:

 - Merge branch 'master' into JDK-8370863-mask-cast-opt
 - Merge branch 'master' into JDK-8370863-mask-cast-opt
 - Add MaxVectorSize IR test condition for VectorStoreMaskIdentityTest.java
 - Refine the test code and comments
 - Merge branch 'master' into JDK-8370863-mask-cast-opt
 - Don't read and write the same memory in the JMH benchmarks
 - Merge branch 'master' into JDK-8370863-mask-cast-opt
 - 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns

`VectorMaskCastNode` is used to cast a vector mask from one type to another type. The cast may be generated by calling the vector API `cast` or generated by the compiler. For example, some vector mask operations like `trueCount` require the input mask to be integer types, so for floating point type masks, the compiler will cast the mask to the corresponding integer type mask automatically before doing the mask operation. This kind of cast is very common.

If the vector element size is not changed, the `VectorMaskCastNode` doesn't generate code, otherwise code will be generated to extend or narrow the mask. This IR node is not free, whether or not it generates code, because it may block some optimizations. For example:
1. `(VectorStoreMask (VectorMaskCast (VectorLoadMask x)))`. The middle `VectorMaskCast` prevents the following optimization: `(VectorStoreMask (VectorLoadMask x)) => (x)`
2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`.

In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. But if the input value is changed, we can't eliminate the cast.

The general idea of this PR is introducing an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`, like the existing `Node::uncast(bool)` function. The function returns the first non `VectorMaskCastNode`.
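
As a rough illustration of what such a helper does, here is a toy model in Java. It is a sketch under stated assumptions: `Node`, `VectorMaskCast` and `uncastMask` below are hypothetical stand-ins written for this example, not the HotSpot C++ classes or the actual `uncast_mask` implementation, and the model ignores the type checks a real transformation needs.

```java
// Toy IR model for illustration only; these are not HotSpot classes.
class UncastMaskSketch {
    static class Node {
        final String name;
        final Node input; // a single input is enough for this sketch

        Node(String name, Node input) {
            this.name = name;
            this.input = input;
        }
    }

    static final class VectorMaskCast extends Node {
        VectorMaskCast(Node input) {
            super("VectorMaskCast", input);
        }
    }

    // Walk through consecutive casts and return the first node that is not a cast.
    static Node uncastMask(Node n) {
        while (n instanceof VectorMaskCast) {
            n = n.input;
        }
        return n;
    }
}
```

With this model, a pattern matcher can look through any number of stacked casts with one call, instead of matching each chain length separately.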
The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode` and this does not affect the correctness of the optimization. Then this function can be called to eliminate the `VectorMaskCastNode` chain.

Current optimizations related to `VectorMaskCastNode` include:
1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760.
2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242.

This PR does the following optimizations:
1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` as `(VectorMaskCast (VectorMaskCast? ... (VectorMaskCast x))) => (x)`. Because as long as types of the head and tail `VectorMaskCastNode` are consistent, the optimization is correct.
2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`. Since the value before and after the pattern is a boolean vector, it remains unchanged as long as the vector length remains the same, and this is guaranteed at the API level.

I conducted some simple research on different mask generation methods and mask operations, and obtained the following table, which includes some potential optimization opportunities that may use this `uncast_mask` function.

```
mask_gen\op   toLong  anyTrue  allTrue  trueCount  firstTrue  lastTrue
compare       N/A     N/A      N/A      N/A        N/A        N/A
maskAll       TBI     TBI      TBI      TBI        TBI        TBI
fromLong      TBI     TBI      N/A      TBI        TBI        TBI

mask_gen\op   and     or       xor      andNot     not        laneIsSet
compare       N/A     N/A      N/A      N/A        TBI        N/A
maskAll       TBI     TBI      TBI      TBI        TBI        TBI
fromLong      N/A     N/A      N/A      N/A        TBI        TBI
```

`TBI` indicates that there may be potential optimizations here that require further investigation.

Benchmarks:

On an Nvidia Grace machine with 128-bit SVE2:
```
Benchmark                        Unit    Before  Error  After   Error  Uplift
microMaskLoadCastStoreByte64     ops/us  59.23   0.21   148.12  0.07   2.50
microMaskLoadCastStoreDouble128  ops/us  2.43    0.00   38.31   0.01   15.73
microMaskLoadCastStoreFloat128   ops/us  6.19    0.00   75.67   0.11   12.22
microMaskLoadCastStoreInt128     ops/us  6.19    0.00   75.67   0.03   12.22
microMaskLoadCastStoreLong128    ops/us  2.43    0.00   38.32   0.01   15.74
microMaskLoadCastStoreShort64    ops/us  28.89   0.02   75.60   0.09   2.62
```

On an Nvidia Grace machine with 128-bit NEON:
```
Benchmark                        Unit    Before  Error  After   Error  Uplift
microMaskLoadCastStoreByte64     ops/us  75.75   0.19   149.74  0.08   1.98
microMaskLoadCastStoreDouble128  ops/us  8.71    0.03   38.71   0.05   4.44
microMaskLoadCastStoreFloat128   ops/us  24.05   0.03   76.49   0.05   3.18
microMaskLoadCastStoreInt128     ops/us  24.06   0.02   76.51   0.05   3.18
microMaskLoadCastStoreLong128    ops/us  8.72    0.01   38.71   0.02   4.44
microMaskLoadCastStoreShort64    ops/us  24.64   0.01   76.43   0.06   3.10
```

On an AMD EPYC 9124 16-Core Processor with AVX3:
```
Benchmark                        Unit    Before  Error  After   Error  Uplift
microMaskLoadCastStoreByte64     ops/us  82.13   0.31   115.14  0.08   1.40
microMaskLoadCastStoreDouble128  ops/us  0.32    0.00   0.32    0.00   1.01
microMaskLoadCastStoreFloat128   ops/us  42.18   0.05   57.56   0.07   1.36
microMaskLoadCastStoreInt128     ops/us  42.19   0.01   57.53   0.08   1.36
microMaskLoadCastStoreLong128    ops/us  0.30    0.01   0.32    0.00   1.05
microMaskLoadCastStoreShort64    ops/us  42.18   0.05   57.59   0.01   1.37
```

On an AMD EPYC 9124 16-Core Processor with AVX2:
```
Benchmark                        Unit    Before  Error  After   Error  Uplift
microMaskLoadCastStoreByte64     ops/us  73.53   0.20   114.98  0.03   1.56
microMaskLoadCastStoreDouble128  ops/us  0.29    0.01   0.30    0.00   1.00
microMaskLoadCastStoreFloat128   ops/us  30.78   0.14   57.50   0.01   1.87
microMaskLoadCastStoreInt128     ops/us  30.65   0.26   57.50   0.01   1.88
microMaskLoadCastStoreLong128    ops/us  0.30    0.00   0.30    0.00   0.99
microMaskLoadCastStoreShort64    ops/us  24.92   0.00   57.49   0.01   2.31
```

On an AMD EPYC 9124 16-Core Processor with AVX1:
```
Benchmark                        Unit    Before  Error  After   Error  Uplift
microMaskLoadCastStoreByte64     ops/us  79.68   0.01   248.49  0.91   3.12
microMaskLoadCastStoreDouble128  ops/us  0.28    0.00   0.28    0.00   1.00
microMaskLoadCastStoreFloat128   ops/us  31.11   0.04   95.48   2.27   3.07
microMaskLoadCastStoreInt128     ops/us  31.10   0.03   99.94   1.87   3.21
microMaskLoadCastStoreLong128    ops/us  0.28    0.00   0.28    0.00   0.99
microMaskLoadCastStoreShort64    ops/us  31.11   0.02   94.97   2.30   3.05
```

This PR was tested on 128-bit, 256-bit, and 512-bit (QEMU) aarch64 environments, and two 512-bit x64 machines with various configurations, including sve2, sve1, neon, avx3, avx2, avx1, sse4 and sse3; all tests passed.

-------------

Changes: https://git.openjdk.org/jdk/pull/28313/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28313&range=04
  Stats: 631 lines in 7 files changed: 520 ins; 16 del; 95 mod
  Patch: https://git.openjdk.org/jdk/pull/28313.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28313/head:pull/28313

PR: https://git.openjdk.org/jdk/pull/28313

From mhaessig at openjdk.org  Fri Dec 12 08:16:58 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Fri, 12 Dec 2025 08:16:58 GMT
Subject: RFR: 8368977: Provide clear naming for AVX10 identifiers [v2]
In-Reply-To:
References: <6XYgqaHA0PPZzvnfysKOP5XGP7e_RMkVFt9PV2OT3Gk=.e5f33072-a91a-4e57-99f3-81cc4ae4d844@github.com>
Message-ID:

On Mon, 8 Dec 2025 21:47:16 GMT, Mohamed Issa wrote:

>> This is a simple change that renames all AVX10 identifiers to explicitly show which sub-versions are in use. Right now, AVX10.2 is the only case to worry about. The JTREG tests listed below were used to verify correctness with the recommended JVM options mentioned in corresponding source files. Each test included runs through emulation with AVX10.2 enabled and disabled to exercise all possible paths. All modifications and tests used [OpenJDK v26-b24](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B24) as the baseline build.
>>
>> 1.
`jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java` >> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java` >> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java` >> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java` >> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java` >> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java` >> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java` >> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java` >> 9. `jtreg:test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java` >> 10. `jtreg:test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java` >> 11. `jtreg:test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java` >> 12. `jtreg:test/jdk/jdk/incubator/vector/Double64VectorTests.java` >> 13. `jtreg:test/jdk/jdk/incubator/vector/Double128VectorTests.java` >> 14. `jtreg:test/jdk/jdk/incubator/vector/Double256VectorTests.java` >> 15. `jtreg:test/jdk/jdk/incubator/vector/Double512VectorTests.java` >> 16. `jtreg:test/jdk/jdk/incubator/vector/DoubleMaxVectorTests.java` >> 17. `jtreg:test/jdk/jdk/incubator/vector/Float64VectorTests.java` >> 18. `jtreg:test/jdk/jdk/incubator/vector/Float128VectorTests.java` >> 19. `jtreg:test/jdk/jdk/incubator/vector/Float256VectorTests.java` >> 20. `jtreg:test/jdk/jdk/incubator/vector/Float512VectorTests.java` >> 21. `jtreg:test/jdk/jdk/incubator/vector/FloatMaxVectorTests.java` >> 22. `jtreg:test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` >> 23. `jtreg:test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java` > > Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: > > Remove changes that affect functionality Thank you for this clarification, @missa-prime. This is very helpful. I'll run testing on our side and report results. 
------------- PR Review: https://git.openjdk.org/jdk/pull/28344#pullrequestreview-3570787062 From hgreule at openjdk.org Fri Dec 12 08:23:33 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Fri, 12 Dec 2025 08:23:33 GMT Subject: RFR: 8373555: C2: Optimize redundant input calculations for sign comparisons Message-ID: Instead of sign-comparisons with And,Or,Xor,Max,Min nodes, we can directly compare to one of the inputs of the binary nodes if the other input is irrelevant to the comparison. There are potentially more operations, but these mentioned here are the most obvious ones. Max and Min could theoretically be expanded to arbitrary comparisons to constants, but I didn't want to introduce more complexity for now. Please let me know what you think :) ------------- Commit messages: - more randomized tests - simplify sign invariant comparisons - test Changes: https://git.openjdk.org/jdk/pull/28782/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28782&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373555 Stats: 840 lines in 3 files changed: 840 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28782.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28782/head:pull/28782 PR: https://git.openjdk.org/jdk/pull/28782 From dlong at openjdk.org Fri Dec 12 09:31:57 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 12 Dec 2025 09:31:57 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v6] In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 21:27:22 GMT, Vladimir Ivanov wrote: >> Even though `instanceof` check (and reflective `Class.isInstance` call) narrows operand's type, sharpened type information is not explicitly materialized in the IR. >> >> There's a `SubTypeCheck` node present, but it is not a substitute for a `CheckCastPP` node with a proper type. 
>>
>> The difference can be illustrated with the following simple cases:
>>
>>     class A { void m() {} }
>>     class B extends A { void m() {} }
>>
>>     void testInstanceOf(A obj) {
>>       if (obj instanceof B) {
>>         obj.m();
>>       }
>>     }
>>
>>     InstanceOf::testInstanceOf (12 bytes)
>>       @ 8 InstanceOf$A::m (0 bytes) failed to inline: virtual call
>>
>> vs
>>
>>     void testInstanceOfCast(A obj) {
>>       if (obj instanceof B) {
>>         B b = (B)obj;
>>         b.m();
>>       }
>>     }
>>
>>     InstanceOf::testInstanceOfCast (17 bytes)
>>       @ 13 InstanceOf$B::m (1 bytes) inline (hot)
>>
>>
>> Proposed fix annotates operands of subtype checks with proper type information which reflects the effects of subtype check. Not-yet-canonicalized IR shape poses some challenges, but I decided to match it early so information is available right away, rather than waiting for IGVN pass and delay inlining to post-parse phase.
>>
>> FTR it is not a complete fix. It works for trivial cases, but for more complex conditions the IR shape becomes too complex during parsing (as illustrated by some test cases). I experimented with annotating subtype checks after initial parsing pass is over, but the crucial simplification step happens as part of split-if transformation which happens when no more inlining is possible. So, the only possible benefit (without forcing split-if optimization earlier) is virtual-to-direct call strength reduction. I plan to explore it separately.
>>
>> Testing: hs-tier1 - hs-tier5
>
> Vladimir Ivanov has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Improve the test
>  - Improve comments

Marked as reviewed by dlong (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/28517#pullrequestreview-3571042335 From mli at openjdk.org Fri Dec 12 10:02:21 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 12 Dec 2025 10:02:21 GMT Subject: Integrated: 8371920: [TEST] Enable CMove tests on other platforms In-Reply-To: References: Message-ID: <1y6nbkGafhECmopqK22qeWRCVgT-LrYT1zxOy7ca2JA=.103d9c0a-48ed-4027-b0db-a41c25a5f0e0@github.com> On Mon, 8 Dec 2025 17:55:18 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > > [JDK-8357551](https://bugs.openjdk.org/browse/JDK-8357551) add support of CMoveF/D vectorization, at the same time it also adds some tests for scalar CMove on riscv. > It's good to enable these tests on other platforms, like x86/aarch64 or maybe others. > > At the same time, this pr also move these tests under `compiler/c2/cmove`, as suggested here https://github.com/openjdk/jdk/pull/28309#discussion_r2598664764. > > Thanks! > > ## Test > In progress... (I'm using github CI to run the tests.) This pull request has now been integrated. Changeset: dc625526 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/dc6255261f34c65d0e87814638817c97a880eb7f Stats: 233 lines in 3 files changed: 72 ins; 2 del; 159 mod 8371920: [TEST] Enable CMove tests on other platforms Reviewed-by: fyang, epeter ------------- PR: https://git.openjdk.org/jdk/pull/28702 From chagedorn at openjdk.org Fri Dec 12 10:03:19 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 12 Dec 2025 10:03:19 GMT Subject: RFR: 8373513: C2: Move ProjNode::other_if_proj() to IfProjNode Message-ID: This is a simple clean-up patch which moves `ProjNode::other_if_proj()` to `IfProjNode` and update its uses. It only makes sense to call `other_if_proj()` on actual `IfProjNodes`. It also required to update more types from `ProjNode` to `IfProjNode` which is more type-safe and preciser. While touching the methods, I've also added some `const`/`static` where appropriate. 
Thanks, Christian ------------- Commit messages: - 8373513: C2: Move ProjNode::other_if_proj() to IfProjNode Changes: https://git.openjdk.org/jdk/pull/28785/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28785&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373513 Stats: 56 lines in 8 files changed: 6 ins; 8 del; 42 mod Patch: https://git.openjdk.org/jdk/pull/28785.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28785/head:pull/28785 PR: https://git.openjdk.org/jdk/pull/28785 From chagedorn at openjdk.org Fri Dec 12 10:03:23 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 12 Dec 2025 10:03:23 GMT Subject: RFR: 8373513: C2: Move ProjNode::other_if_proj() to IfProjNode In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 09:48:28 GMT, Christian Hagedorn wrote: > This is a simple clean-up patch which moves `ProjNode::other_if_proj()` to `IfProjNode` and updates its uses. It only makes sense to call `other_if_proj()` on actual `IfProjNodes`. > > It also required updating more types from `ProjNode` to `IfProjNode`, which is more type-safe and more precise. While touching the methods, I've also added some `const`/`static` where appropriate. > > Thanks, > Christian src/hotspot/share/opto/ifnode.cpp line 794: > 792: if (otherproj->outcnt() == 1 && region != nullptr && !region->has_phi()) { > 793: for (int i = 0; i < 2; i++) { > 794: IfProjNode* next_proj = proj_out(i)->as_IfProj(); Renamed to not shadow the `proj` parameter. We could also think about whether we want to have a new method `if_proj_out(i)` in class `IfNode` at some point. src/hotspot/share/opto/ifnode.cpp line 863: > 861: > 862: CallStaticJavaNode* unc = nullptr; > 863: IfProjNode* unc_proj = uncommon_trap_proj(unc)->as_IfProj(); I think we can also make `uncommon_trap_proj` return an `IfProjNode`. But I want to tackle that separately.
src/hotspot/share/opto/multnode.cpp line 263: > 261: return nullptr; > 262: } > 263: return as_IfProj()->other_if_proj()->is_uncommon_trap_proj(reason); `is_uncommon_trap_if_pattern()` could also be moved to `IfProjNode`. I also want to do that separately. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28785#discussion_r2613602266 PR Review Comment: https://git.openjdk.org/jdk/pull/28785#discussion_r2613604562 PR Review Comment: https://git.openjdk.org/jdk/pull/28785#discussion_r2613610175 From mli at openjdk.org Fri Dec 12 10:03:20 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 12 Dec 2025 10:03:20 GMT Subject: RFR: 8373428: Refine variables with the same name in nested scopes in PhaseChaitin::gather_lrg_masks In-Reply-To: References: <8WWg7y_W2PGKAkwrVUfN97dBZ56I2MRvbMuxowqmnZE=.4c238198-0b07-47da-8756-1485846f044f@github.com> Message-ID: On Thu, 11 Dec 2025 18:19:40 GMT, Paul Hohensee wrote: >> Hi, >> Can you help to review this trivial patch? >> >> In PhaseChaitin::gather_lrg_masks, several variables have the same name in nested scopes, it looks like following code snippet. >> { >> A a; >> { >> A a; >> } >> } >> >> This is not helpful to code readability, in particular in a long method like `gather_lrg_masks`, better to rename them. >> >> Thanks! > > Marked as reviewed by phh (Reviewer). @phohensee Thank you! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28748#issuecomment-3645777468 From chagedorn at openjdk.org Fri Dec 12 10:48:06 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 12 Dec 2025 10:48:06 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v5] In-Reply-To: <-_h21X5PWkjy5p_jC8nHr3sxeApZlHPEg3DuMUF89QI=.8c71347e-699a-4f56-a988-a36b68b6fe49@github.com> References: <8uE-UIoLllpjPuICc7sjKwo2eEtbGPYcgFwDUtQ0QpM=.525a688d-2dfc-4dfb-9dd5-c8024d4bb74e@github.com> <-_h21X5PWkjy5p_jC8nHr3sxeApZlHPEg3DuMUF89QI=.8c71347e-699a-4f56-a988-a36b68b6fe49@github.com> Message-ID: On Wed, 10 Dec 2025 22:20:11 GMT, Vladimir Ivanov wrote: >> test/hotspot/jtreg/compiler/inlining/TestSubtypeCheckTypeInfo.java line 363: >> >>> 361: // Sample: >>> 362: // 213 42 b compiler.inlining.TestSubtypeCheckTypeInfo::testIsInstanceCondLatePost (13 bytes) >>> 363: static final Pattern TEST_CASE = Pattern.compile("^\\d+\\s+\\d+\\s+b\\s+" + TEST_CLASS_NAME + "::(\\w+) .*"); >> >> Drive by comment, no need to change things here now: >> @iwanowww @chhagedorn Would it not be nice if we could do this kind of matching with the `TestFramework`? Instead of `IR` matching, just match the output of any compilation tracing / printing. > > Indeed, that would be a much better way. > > Also, `-XX:+LogCompilation` is a nice option since it publishes all information in a structured way, but it would introduce a dependency on LogCompilation tool in the test library. That's an interesting idea to think about more. But it would be a separate concept next to `@IR` even though some code could probably be shared. > but it would introduce a dependency on LogCompilation tool in the test library. The IR framework already uses `LogCompilation` to parse the IR dumps from. But I think that's quite an overhead - a lot of the information is not needed and makes the parsing logic more complicated.
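As a small illustration of the output matching discussed above, the sketch below applies the quoted `TEST_CASE` pattern to the sample line from the test comment. The class name `CompilationLineMatcher`, the helper `testName`, and the value assigned to `TEST_CLASS_NAME` are assumptions made for this example, not code from the PR.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CompilationLineMatcher {
    // Assumed value, taken from the sample line in the quoted test comment.
    static final String TEST_CLASS_NAME = "compiler.inlining.TestSubtypeCheckTypeInfo";

    // Pattern as quoted in the review. The dots in the class name are treated
    // as regex wildcards, which is harmless for this purpose.
    static final Pattern TEST_CASE =
        Pattern.compile("^\\d+\\s+\\d+\\s+b\\s+" + TEST_CLASS_NAME + "::(\\w+) .*");

    // Hypothetical helper: returns the test method name if the line is a
    // matching compilation line, or an empty Optional otherwise.
    static Optional<String> testName(String line) {
        Matcher m = TEST_CASE.matcher(line.strip());
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String sample = "213 42 b compiler.inlining.TestSubtypeCheckTypeInfo"
                      + "::testIsInstanceCondLatePost (13 bytes)";
        System.out.println(testName(sample).orElse("<no match>"));
    }
}
```

A line whose attribute column is not `b`, or which names a different class, yields an empty result, so unrelated compilation output is simply skipped.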
I've been thinking about introducing a separate IR framework file for all relevant dumps, similar to dumping IGV dumps to a separate file. This will simplify things and only dumps relevant information. It will also have a positive impact on performance. We could even go a step further and send the dumps over a port to the IR framework (similar to dumping the graph directly to the opened IGV over the network). Today, we already have sockets in place to send messages from the test VM to the driver VM. I guess we could extend that to also allow HotSpot to send dumps to the driver VM (it sounds feasible to do but would need to experiment with it). Sending dumps from HotSpot to the IR framework has another benefit that HotSpot can provide all the information needed for the IR framework to figure out where this dump belongs to (i.e. no additional parsing of a file needed). This might also allow us to do IR matching in parallel to the test VM execution. This is much more efficient because today, the driver VM just waits until the test VM is finished and only then starts to do IR matching. Anyway, that was just a digression about future ideas for the IR framework. What you have now to parse the output is the best we can do today. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28517#discussion_r2613767737 From qamai at openjdk.org Fri Dec 12 11:16:03 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 12 Dec 2025 11:16:03 GMT Subject: RFR: 8373577: C2: Cleanup adr_type of CallLeafPureNode Message-ID: Hi, This PR is extracted from #28570 , `CallLeafPureNode`s do not read from or write to memory, so their `adr_type` should be `nullptr`. Please take a look and leave your reviews, thanks a lot. 
------------- Commit messages: - Fix memory effect of CallLeafPureNode Changes: https://git.openjdk.org/jdk/pull/28786/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28786&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373577 Stats: 8 lines in 4 files changed: 1 ins; 2 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/28786.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28786/head:pull/28786 PR: https://git.openjdk.org/jdk/pull/28786 From bmaillard at openjdk.org Fri Dec 12 12:15:21 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Fri, 12 Dec 2025 12:15:21 GMT Subject: RFR: 8373579: Problem list compiler/runtime/Test7196199.java Message-ID: Problem list test/hotspot/jtreg/ProblemList.txt until [JDK-8365196](https://bugs.openjdk.org/browse/JDK-8365196) is fixed. Thank you for reviewing! ------------- Commit messages: - Merge branch 'master' into JDK-8373579 - 8373579: problemlist JDK-8365196 on windows-x64 Changes: https://git.openjdk.org/jdk/pull/28787/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28787&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373579 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28787.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28787/head:pull/28787 PR: https://git.openjdk.org/jdk/pull/28787 From chagedorn at openjdk.org Fri Dec 12 12:15:21 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 12 Dec 2025 12:15:21 GMT Subject: RFR: 8373579: Problem list compiler/runtime/Test7196199.java In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 11:41:07 GMT, Benoît Maillard wrote: > Problem list test/hotspot/jtreg/ProblemList.txt until [JDK-8365196](https://bugs.openjdk.org/browse/JDK-8365196) is fixed. > > Thank you for reviewing! Looks good and trivial. ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/28787#pullrequestreview-3571577231 From epeter at openjdk.org Fri Dec 12 12:45:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 12 Dec 2025 12:45:05 GMT Subject: RFR: 8373579: Problem list compiler/runtime/Test7196199.java In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 11:41:07 GMT, Benoît Maillard wrote: > Problem list test/hotspot/jtreg/ProblemList.txt until [JDK-8365196](https://bugs.openjdk.org/browse/JDK-8365196) is fixed. > > Thank you for reviewing! Looks good. Though in the PR description you mentioned ProblemList.txt, when you probably meant to mention the test, right? ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28787#pullrequestreview-3571674002 From mhaessig at openjdk.org Fri Dec 12 12:47:16 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 12 Dec 2025 12:47:16 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v6] In-Reply-To: References: Message-ID: <0jWlBghKGEMjDb4DrpXH_Uhy7EMjdaqenRMoD-Bfu4Y=.78c9a448-8a3d-481f-95d4-aecba4a9e700@github.com> On Thu, 11 Dec 2025 15:42:42 GMT, Roland Westrelin wrote: >> For this failure memory stats are: >> >> >> Total Usage: 1095525816 >> --- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 --- >> Phase Total ra node comp type states reglive regsplit regmask superword cienv ha other >> none 5976032 331560 5402064 197512 33712 10200 0 0 984 0 0 0 0 >> parse 2716464 65456 1145480 196408 1112752 0 0 0 0 0 196368 0 0 >> optimizer 98184 0 32728 0 65456 0 0 0 0 0 0 0 0 >> connectionGraph 32728 0 0 32728 0 0 0 0 0 0 0 0 0 >> iterGVN 32728 0 32728 0 0 0 0 0 0 0 0 0 0 >> idealLoop 918189632 0 38687056 872824784 392776 0 0 0 0 0 6285016 0 0 >> idealLoopVerify 2228144 0 0 2228144 0 0 0 0 0 0 0 0 0 >> macroExpand 32728 0 32728 0 0 0 0 0 0 0 0 0 0 >> graphReshape 32728 0 32728 0 0 0 0 0 0 0 0 0 0 >> matcher
20135944 3369848 9033208 7536400 65456 131032 0 0 0 0 0 0 0 >> postselect_cleanup 294872 294872 0 0 0 0 0 0 0 0 0 0 0 >> scheduler 752944 196488 556456 0 0 0 0 0 0 0 0 0 0 >> regalloc 388736 388736 0 0 0 0 0 0 0 0 0 0 0 >> ... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > package declaration The new changes look good to me. I'll kick off some more testing. ------------- PR Review: https://git.openjdk.org/jdk/pull/28581#pullrequestreview-3529839561 From mhaessig at openjdk.org Fri Dec 12 12:47:20 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 12 Dec 2025 12:47:20 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v3] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 11:21:05 GMT, Roland Westrelin wrote: >> For this failure memory stats are: >> >> >> Total Usage: 1095525816 >> --- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 --- >> Phase Total ra node comp type states reglive regsplit regmask superword cienv ha other >> none 5976032 331560 5402064 197512 33712 10200 0 0 984 0 0 0 0 >> parse 2716464 65456 1145480 196408 1112752 0 0 0 0 0 196368 0 0 >> optimizer 98184 0 32728 0 65456 0 0 0 0 0 0 0 0 >> connectionGraph 32728 0 0 32728 0 0 0 0 0 0 0 0 0 >> iterGVN 32728 0 32728 0 0 0 0 0 0 0 0 0 0 >> idealLoop 918189632 0 38687056 872824784 392776 0 0 0 0 0 6285016 0 0 >> idealLoopVerify 2228144 0 0 2228144 0 0 0 0 0 0 0 0 0 >> macroExpand 32728 0 32728 0 0 0 0 0 0 0 0 0 0 >> graphReshape 32728 0 32728 0 0 0 0 0 0 0 0 0 0 >> matcher 20135944 3369848 9033208 7536400 65456 131032 0 0 0 0 0 0 0 >> postselect_cleanup 294872 294872 0 0 0 0 0 0 0 0 0 0 0 >> scheduler 752944 196488 556456 0 0 0 0 0 0 0 0 0 0 >> regalloc 388736 388736 0 0 0 0 0 0 0 0 0 0 0 >> ... 
> > Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: > > - review > - review src/hotspot/share/opto/loopnode.hpp line 672: > 670: bool _allow_optimizations; // Allow loop optimizations > 671: > 672: IdealLoopTree( PhaseIdealLoop* phase, Node *head, Node *tail ); Suggestion: IdealLoopTree( PhaseIdealLoop* phase, Node* head, Node* tail ); Since you are changing this line. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2580968318 From qamai at openjdk.org Fri Dec 12 12:51:40 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 12 Dec 2025 12:51:40 GMT Subject: RFR: 8373591: C2: Fix the memory around some intrinsics nodes Message-ID: Hi, This is extracted from #28570 , there are 2 issues here: - Some intrinsics nodes advertise incorrect `adr_type`. For example, `AryEqNode` reports `adr_type` being `TypeAryPtr::BYTES` (it inherits this from `StrIntrinsicNode`). This is incorrect, however, as it can accept `char[]` inputs, too. Another case is `VectorizedHashCodeNode`, which reports its `adr_type` being `TypePtr::BOTTOM`, but it actually extracts a memory slice and does not consume the whole memory. - For nodes such as `StrInflatedCopyNode`, as they consume more than they produce, during scheduling, we need to compute anti-dependencies. This is not the case, so we should fix it by making the nodes kill all the memory they consume. This issue is often not present because these intrinsics are not exposed bare to general usage. Please kindly review, thanks a lot. 
------------- Commit messages: - Fix memory around intrinsics nodes Changes: https://git.openjdk.org/jdk/pull/28789/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28789&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373591 Stats: 274 lines in 6 files changed: 177 ins; 5 del; 92 mod Patch: https://git.openjdk.org/jdk/pull/28789.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28789/head:pull/28789 PR: https://git.openjdk.org/jdk/pull/28789 From qamai at openjdk.org Fri Dec 12 12:51:41 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 12 Dec 2025 12:51:41 GMT Subject: RFR: 8373591: C2: Fix the memory around some intrinsics nodes In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 12:43:14 GMT, Quan Anh Mai wrote: > Hi, > > This is extracted from #28570 , there are 2 issues here: > > - Some intrinsics nodes advertise incorrect `adr_type`. For example, `AryEqNode` reports `adr_type` being `TypeAryPtr::BYTES` (it inherits this from `StrIntrinsicNode`). This is incorrect, however, as it can accept `char[]` inputs, too. Another case is `VectorizedHashCodeNode`, which reports its `adr_type` being `TypePtr::BOTTOM`, but it actually extracts a memory slice and does not consume the whole memory. > - For nodes such as `StrInflatedCopyNode`, as they consume more than they produce, during scheduling, we need to compute anti-dependencies. This is not the case, so we should fix it by making the nodes kill all the memory they consume. This issue is often not present because these intrinsics are not exposed bare to general usage. > > Please kindly review, thanks a lot. @eme64 I have extracted the fix of memory around intrinsics nodes in the other PR to this PR and added a unit test for the potential issue. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28789#issuecomment-3646349261 From mhaessig at openjdk.org Fri Dec 12 12:52:03 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 12 Dec 2025 12:52:03 GMT Subject: RFR: 8368977: Provide clear naming for AVX10 identifiers [v2] In-Reply-To: References: <6XYgqaHA0PPZzvnfysKOP5XGP7e_RMkVFt9PV2OT3Gk=.e5f33072-a91a-4e57-99f3-81cc4ae4d844@github.com> Message-ID: On Mon, 8 Dec 2025 21:47:16 GMT, Mohamed Issa wrote: >> This is a simple change that renames all AVX10 identifiers to explicitly show which sub-versions are in use. Right now, AVX10.2 is the only case to worry about. The JTREG tests listed below were used to verify correctness with the recommended JVM options mentioned in corresponding source files. Each test included runs through emulation with AVX10.2 enabled and disabled to exercise all possible paths. All modifications and tests used [OpenJDK v26-b24](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B24) as the baseline build. >> >> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java` >> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java` >> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java` >> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java` >> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java` >> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java` >> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java` >> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java` >> 9. `jtreg:test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java` >> 10. `jtreg:test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java` >> 11. `jtreg:test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java` >> 12. `jtreg:test/jdk/jdk/incubator/vector/Double64VectorTests.java` >> 13. 
`jtreg:test/jdk/jdk/incubator/vector/Double128VectorTests.java` >> 14. `jtreg:test/jdk/jdk/incubator/vector/Double256VectorTests.java` >> 15. `jtreg:test/jdk/jdk/incubator/vector/Double512VectorTests.java` >> 16. `jtreg:test/jdk/jdk/incubator/vector/DoubleMaxVectorTests.java` >> 17. `jtreg:test/jdk/jdk/incubator/vector/Float64VectorTests.java` >> 18. `jtreg:test/jdk/jdk/incubator/vector/Float128VectorTests.java` >> 19. `jtreg:test/jdk/jdk/incubator/vector/Float256VectorTests.java` >> 20. `jtreg:test/jdk/jdk/incubator/vector/Float512VectorTests.java` >> 21. `jtreg:test/jdk/jdk/incubator/vector/FloatMaxVectorTests.java` >> 22. `jtreg:test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` >> 23. `jtreg:test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java` > > Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: > > Remove changes that affect functionality Testing tier1-3 passed ------------- Marked as reviewed by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/28344#pullrequestreview-3571695712 From rcastanedalo at openjdk.org Fri Dec 12 13:06:54 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 12 Dec 2025 13:06:54 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis [v4] In-Reply-To: References: Message-ID: On Thu, 11 Dec 2025 15:46:40 GMT, Roland Westrelin wrote: >> The crash occurs because verification code expects the inner and outer >> loop of a loop strip mining nest to have the same number of phis but, >> in this case, the inner loop has one more memory phis than the outer >> loop. >> >> 1) After `OuterStripMinedLoopNode::adjust_strip_mined_loop`, inner and >> outer loops have the same number of phis, as expected. 
>> >> >> 309 MergeMem === _ 1 306 1 1 284 [[ 429 ]] { - - N284:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=205 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> >> 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 ]] >> 429 Phi === 248 309 205 [[ 93 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> >> 249 CountedLoop === 249 248 197 [[ 249 119 96 93 94 ]] inner stride: 1 strip mined !orig=[223],[91] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> 93 Phi === 249 429 205 [[ 117 97 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> 94 Phi === 249 430 121 [[ 97 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> >> >> 2) Then `PhiNode::Ideal` runs for 429 and pushed the `MergeMem` 309 >> through the outer loop phi: >> >> >> 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 444 446 ]] >> 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> 444 Phi === 248 306 121 [[ 445 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> 446 Phi === 248 284 170 [[ 445 ]] #memory Memory: @instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow], name=detailMessage, idx=5; !orig=444,429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> >> 445 MergeMem === _ 1 444 1 1 446 [[ 93 ]] { - - 
N446:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx... > > Roland Westrelin has updated the pull request incrementally with three additional commits since the last revision: > > - Update src/hotspot/share/opto/node.cpp > > Co-authored-by: Daniel Lundén > - Update test/hotspot/jtreg/compiler/loopstripmining/TestMismatchedMemoryPhis.java > > Co-authored-by: Daniel Lundén > - Update src/hotspot/share/opto/cfgnode.cpp > > Co-authored-by: Daniel Lundén Looks good! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28677#pullrequestreview-3571740420 From epeter at openjdk.org Fri Dec 12 13:15:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 12 Dec 2025 13:15:00 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis [v4] In-Reply-To: References: Message-ID: <8ZNxxtdCUYYVmqq8Lam8C9sI7xh9qqHU_5j16ef6S0Q=.d793a57b-bc4c-4ed5-acba-ebf1bd372d4b@github.com> On Thu, 11 Dec 2025 15:46:40 GMT, Roland Westrelin wrote: >> The crash occurs because verification code expects the inner and outer >> loop of a loop strip mining nest to have the same number of phis but, >> in this case, the inner loop has one more memory phis than the outer >> loop. >> >> 1) After `OuterStripMinedLoopNode::adjust_strip_mined_loop`, inner and >> outer loops have the same number of phis, as expected.
>> >> >> 309 MergeMem === _ 1 306 1 1 284 [[ 429 ]] { - - N284:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=205 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> >> 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 ]] >> 429 Phi === 248 309 205 [[ 93 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> >> 249 CountedLoop === 249 248 197 [[ 249 119 96 93 94 ]] inner stride: 1 strip mined !orig=[223],[91] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> 93 Phi === 249 429 205 [[ 117 97 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> 94 Phi === 249 430 121 [[ 97 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> >> >> 2) Then `PhiNode::Ideal` runs for 429 and pushed the `MergeMem` 309 >> through the outer loop phi: >> >> >> 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 444 446 ]] >> 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> 444 Phi === 248 306 121 [[ 445 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> 446 Phi === 248 284 170 [[ 445 ]] #memory Memory: @instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow], name=detailMessage, idx=5; !orig=444,429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> >> 445 MergeMem === _ 1 444 1 1 446 [[ 93 ]] { - - 
N446:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx... > > Roland Westrelin has updated the pull request incrementally with three additional commits since the last revision: > > - Update src/hotspot/share/opto/node.cpp > > Co-authored-by: Daniel Lundén > - Update test/hotspot/jtreg/compiler/loopstripmining/TestMismatchedMemoryPhis.java > > Co-authored-by: Daniel Lundén > - Update src/hotspot/share/opto/cfgnode.cpp > > Co-authored-by: Daniel Lundén src/hotspot/share/opto/cfgnode.cpp line 2701: > 2699: } > 2700: } > 2701: } Another drive-by question: You are refactoring / fixing existing optimizations: Are there IR tests that cover the original optimization? How do we avoid losing optimizations here? test/hotspot/jtreg/compiler/loopstripmining/TestMismatchedMemoryPhis.java line 34: > 32: * -XX:CompileCommand=compileonly,*TestMismatchedMemoryPhis*::mainTest -XX:-TieredCompilation > 33: * -Xcomp -XX:+StressIGVN -XX:+StressLoopPeeling -XX:PerMethodTrapLimit=0 TestMismatchedMemoryPhis > 34: * @run main TestMismatchedMemoryPhis Suggestion: * @run main ${test.main.class} Optional nits, drive-by style. Possible since a new JTREG version. Also: your test does not have a package. test/hotspot/jtreg/compiler/loopstripmining/TestMismatchedMemoryPhis.java line 62: > 60: } catch (NullPointerException npe) { > 61: // Expected > 62: } Could this exception be avoided, and still reproduce the bug?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28677#discussion_r2614168898 PR Review Comment: https://git.openjdk.org/jdk/pull/28677#discussion_r2614160478 PR Review Comment: https://git.openjdk.org/jdk/pull/28677#discussion_r2614162891 From epeter at openjdk.org Fri Dec 12 13:22:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 12 Dec 2025 13:22:55 GMT Subject: RFR: 8373396: Min and Max Ideal missing AddNode::Ideal optimisations In-Reply-To: <9pBouCjCdkv0Ba0emphWr57W8OKxJaUDHh0eaRdT894=.831501cc-7b5c-42ac-ae05-fe221180c3bb@github.com> References: <9pBouCjCdkv0Ba0emphWr57W8OKxJaUDHh0eaRdT894=.831501cc-7b5c-42ac-ae05-fe221180c3bb@github.com> Message-ID: On Fri, 12 Dec 2025 06:54:18 GMT, Tobias Hartmann wrote: >> `MaxI` and `MinI` are missing `AddNode::Ideal` optimizations. These optimizations include commutation, flattening, pushing constants, etc. The PR changes `MaxINode::Ideal` and `MinINode::Ideal` to call `AddNode::Ideal`. Long versions already call `AddNode::Ideal` so nothing to change there. >> >> The PR also includes a template framework generated test (cc @eme64) that verifies that all of `AddNode::Ideal` optimizations now apply correctly for min/max for longs and ints. Long tests have been added to validate that both ints and longs produce the same results. >> >> Fixing this issue indirectly fixes `compiler/codegen/TestBooleanVect.java` when run with `-XX:VerifyIterativeGVN=1110`, which was failing due to `min` not having one of those optimisations. However, this PR does not make changes to `PhaseIterGVN::verify_Identity_for` because there are additional failures observed with min/max for integers in JDK-8373134. Therefore, changes there will be in the PR for JDK-8373134 instead. >> >> If you look at `PhaseIterGVN::verify_Ideal_for`, it contains the code below. This looks like it could be removed in this PR as it looks like they were quite likely disabled due to the issue here.
However, it's unclear what test was failing here (@eme64 ?): >> >> >> // MinINode::Ideal >> // Did not investigate, but there are some patterns that might >> // need more notification. >> case Op_MinI: >> case Op_MaxI: // preemptively removed it as well. >> return false; >> >> >> I've run tier1-3 tests on linux/x64 and they passed. > > src/hotspot/share/opto/addnode.cpp line 1459: > >> 1457: // Ideal transformations for MaxINode >> 1458: Node* MaxINode::Ideal(PhaseGVN* phase, bool can_reshape) { >> 1459: Node* n = AddNode::Ideal(phase, can_reshape); > > Why not move this into `MaxNode::IdealI`? Yes, the call below `return IdealI(phase, can_reshape);` already looks like it wants to handle all the superclass optimizations. So it should go in there. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28770#discussion_r2613871712 From epeter at openjdk.org Fri Dec 12 13:22:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 12 Dec 2025 13:22:53 GMT Subject: RFR: 8373396: Min and Max Ideal missing AddNode::Ideal optimisations In-Reply-To: References: Message-ID: On Thu, 11 Dec 2025 16:02:24 GMT, Galder Zamarreño wrote: > `MaxI` and `MinI` are missing `AddNode::Ideal` optimizations. These optimizations include commutation, flattening, pushing constants, etc. The PR changes `MaxINode::Ideal` and `MinINode::Ideal` to call `AddNode::Ideal`. Long versions already call `AddNode::Ideal` so nothing to change there. > > The PR also includes a template framework generated test (cc @eme64) that verifies that all of `AddNode::Ideal` optimizations now apply correctly for min/max for longs and ints. Long tests have been added to validate that both ints and longs produce the same results. > > Fixing this issue indirectly fixes `compiler/codegen/TestBooleanVect.java` when run with `-XX:VerifyIterativeGVN=1110`, which was failing due to `min` not having one of those optimisations.
However, this PR does not make changes to `PhaseIterGVN::verify_Identity_for` because there are additional failures observed with min/max for integers in JDK-8373134. Therefore, changes there will be in the PR for JDK-8373134 instead. > > If you look at `PhaseIterGVN::verify_Ideal_for`, it contains the code below. This looks like it could be removed in this PR as it looks like they were quite likely disabled due to the issue here. However, it's unclear what test was failing here (@eme64 ?): > > > // MinINode::Ideal > // Did not investigate, but there are some patterns that might > // need more notification. > case Op_MinI: > case Op_MaxI: // preemptively removed it as well. > return false; > > > I've run tier1-3 tests on linux/x64 and they passed. @galderz Nice, this looks like a good use-case of the template framework, it reduces the test size! It still feels a bit boiler-plate-y ... but it is a step in the right direction for sure. test/hotspot/jtreg/compiler/c2/irTests/TestMinMaxIdeal.java line 30: > 28: * @modules java.base/jdk.internal.misc > 29: * @library /test/lib / > 30: * @run driver compiler.c2.irTests.TestMinMaxIdeal Suggestion: * @run driver ${test.main.class} Also: please don't put any new tests in `irTests`. Rather put it in a directory based on the topic. test/hotspot/jtreg/compiler/c2/irTests/TestMinMaxIdeal.java line 56: > 54: String templatedPackage ="compiler.c2.templated"; > 55: String templatedClassName ="MinMaxIdeal"; > 56: String templatedFQN = "%s.%s".formatted(templatedPackage, templatedClassName); That looks a bit convoluted. Why not just use the final string? test/hotspot/jtreg/compiler/c2/irTests/TestMinMaxIdeal.java line 74: > 72: testTemplateTokens.add(new TestGenerator(Op.MAX_I).generate()); > 73: testTemplateTokens.add(new TestGenerator(Op.MIN_L).generate()); > 74: testTemplateTokens.add(new TestGenerator(Op.MAX_L).generate()); Why not use `Op.values()` -> `List`, then iterate over that? ------------- Changes requested by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/28770#pullrequestreview-3571434582 PR Review Comment: https://git.openjdk.org/jdk/pull/28770#discussion_r2613876244 PR Review Comment: https://git.openjdk.org/jdk/pull/28770#discussion_r2613880453 PR Review Comment: https://git.openjdk.org/jdk/pull/28770#discussion_r2614178984 From bmaillard at openjdk.org Fri Dec 12 13:42:52 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Fri, 12 Dec 2025 13:42:52 GMT Subject: RFR: 8373579: Problem list compiler/runtime/Test7196199.java In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 12:41:56 GMT, Emanuel Peter wrote: > Looks good. Though in the PR description you mentioned ProblemList.txt, when you probably meant to mention the test, right? My bad, yes of course. Thanks @eme64 @chhagedorn! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28787#issuecomment-3646530091 From bmaillard at openjdk.org Fri Dec 12 13:48:05 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Fri, 12 Dec 2025 13:48:05 GMT Subject: Integrated: 8373579: Problem list compiler/runtime/Test7196199.java In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 11:41:07 GMT, Benoît Maillard wrote: > Problem list compiler/runtime/Test7196199.java until [JDK-8365196](https://bugs.openjdk.org/browse/JDK-8365196) is fixed. > > Thank you for reviewing! This pull request has now been integrated.
Changeset: a05d5d25 Author: Benoît Maillard URL: https://git.openjdk.org/jdk/commit/a05d5d2514c835f2bfeaf7a8c7df0ac241f0177f Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8373579: Problem list compiler/runtime/Test7196199.java Reviewed-by: chagedorn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/28787 From roland at openjdk.org Fri Dec 12 13:50:02 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 12 Dec 2025 13:50:02 GMT Subject: RFR: 8373420: C2: Add true/false_proj*() methods for IfNode as a replacement for proj_out*(true/false) In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 13:13:44 GMT, Christian Hagedorn wrote: > There are a lot of places in the code where we call `proj_out*(true/false)` on an `IfNode`. In some cases, we then cast the returned `ProjNode` back to `IfProjNode` or `IfTrueNode/IfFalseNode`. I often visit such code and now decided to clean this up. > > The patch proposes new `IfNode::true/false_proj*()` methods that return `IfTrueNode/IfFalseNode` directly. I walked through all `proj_out*()` calls and replaced those that used a direct `true/false` or `1/0` as argument. > > There are still more things to clean up in this area, for example, when we return `ProjNode` even though it should be an `IfProjNode` which requires more casting. But let's do that step by step in follow-up clean-ups. > > Thanks, > Christian Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28745#pullrequestreview-3571887824 From roland at openjdk.org Fri Dec 12 13:57:24 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 12 Dec 2025 13:57:24 GMT Subject: RFR: 8373513: C2: Move ProjNode::other_if_proj() to IfProjNode In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 09:48:28 GMT, Christian Hagedorn wrote: > This is a simple clean-up patch which moves `ProjNode::other_if_proj()` to `IfProjNode` and updates its uses.
It only makes sense to call `other_if_proj()` on actual `IfProjNodes`. > > It also required updating more types from `ProjNode` to `IfProjNode`, which is more type-safe and more precise. While touching the methods, I've also added some `const`/`static` where appropriate. > > Thanks, > Christian Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28785#pullrequestreview-3571917991 From roland at openjdk.org Fri Dec 12 13:58:23 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 12 Dec 2025 13:58:23 GMT Subject: RFR: 8373577: C2: Cleanup adr_type of CallLeafPureNode In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 11:09:36 GMT, Quan Anh Mai wrote: > Hi, > > This PR is extracted from #28570. `CallLeafPureNode`s do not read from or write to memory, so their `adr_type` should be `nullptr`. > > Please take a look and leave your reviews, thanks a lot. Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28786#pullrequestreview-3571921143 From epeter at openjdk.org Fri Dec 12 14:13:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 12 Dec 2025 14:13:55 GMT Subject: RFR: 8373502: C2 SuperWord: speculative check uses VPointer variable was pinned after speculative check, leading to bad graph Message-ID: Thanks to @chhagedorn and @rwestrel for triaging / doing some first investigation. This is a regression from [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751) / https://github.com/openjdk/jdk/pull/24278. This is almost the same as https://github.com/openjdk/jdk/pull/28449, so have a quick look at it. It was also an issue with some nodes being pinned too low, and not available at the speculative check. There, it was the `pre_init` values of the `iv`. Now it is the variables of the `VPointer`. The fix is pretty similar as well.
------------------------------------------ **Analysis** The reproducer gets a `bad graph` assert because of this cycle: [image] Note: `921 CountedLoop` is the pre-loop, the main-loop is further down from it. And `607 ParsePredicate` is the `#Auto_Vectorization_Check`, and `1403` is the aliasing check inserted for the VPointer named below. This is the relevant VPointer: `VPointer[size: 4, object, base(920 CastPP) + con( 20) + iv_scale( 0) * iv + invar(0)]` The base `920 CastPP` is the problematic variable. In `VPointer::init_are_non_iv_summands_pre_loop_invariant`, we check that: `_vloop.is_pre_loop_invariant(variable)` And that holds for `920 CastPP`. So far so good. This used to be enough when we only adjusted the pre-loop limit for alignment. But now that we need the variables for the aliasing runtime check further up, this is not sufficient any more. Analogous to https://github.com/openjdk/jdk/pull/28449, we would now need: `this->_vloop.is_available_for_speculative_check(variable)` And that is false for `920 CastPP`, since it is pinned after the speculative check. **Solution** We should not insert the aliasing runtime check, and hence we probably cannot vectorize this case. For now, this makes all tests pass. I think just like with https://github.com/openjdk/jdk/pull/28449 these cases are edge cases we don't have to worry too much about. But if they ever do become important, we could try to uncast the variables. But I don't know if that is without issues; we would certainly lose some info that we get from the casts.
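The difference between the two conditions in the analysis above can be illustrated with a toy dominator chain. This is only a sketch under simplifying assumptions, not HotSpot code: `Ctrl`, `dominates` and the node names are hypothetical, and control flow is reduced to a straight line in which the speculative check sits above the place where the cast is pinned, which in turn sits above the pre-loop. A value pinned at some control node is usable at another control node only if its pin dominates that node.

```java
public class SpeculativeCheckSketch {
    // Toy control node that only knows its immediate dominator (null for the root).
    static final class Ctrl {
        final String name;
        final Ctrl idom;

        Ctrl(String name, Ctrl idom) {
            this.name = name;
            this.idom = idom;
        }
    }

    // True if 'a' dominates 'b', i.e. 'a' is reached by walking up the idom chain from 'b'.
    static boolean dominates(Ctrl a, Ctrl b) {
        for (Ctrl c = b; c != null; c = c.idom) {
            if (c == a) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Ctrl start = new Ctrl("start", null);
        Ctrl check = new Ctrl("speculative_check", start);
        Ctrl castCtrl = new Ctrl("ctrl_of_cast", check); // where the cast is pinned
        Ctrl preLoop = new Ctrl("pre_loop", castCtrl);

        // The pin dominates the pre-loop, so the value is pre-loop invariant ...
        System.out.println("pre-loop invariant: " + dominates(castCtrl, preLoop));
        // ... but it does not dominate the check, so the check cannot use it.
        System.out.println("available at check: " + dominates(castCtrl, check));
    }
}
```

In this model the first condition holds and the second does not, which mirrors the situation described for `920 CastPP`: being invariant in the pre-loop says nothing about being available further up at the speculative check.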
------------- Commit messages: - fix up detail - JDK-8373502 Changes: https://git.openjdk.org/jdk/pull/28783/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28783&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373502 Stats: 113 lines in 3 files changed: 113 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28783.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28783/head:pull/28783 PR: https://git.openjdk.org/jdk/pull/28783 From roland at openjdk.org Fri Dec 12 14:15:15 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 12 Dec 2025 14:15:15 GMT Subject: RFR: 8373591: C2: Fix the memory around some intrinsics nodes In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 12:43:14 GMT, Quan Anh Mai wrote: > Hi, > > This is extracted from #28570 , there are 2 issues here: > > - Some intrinsics nodes advertise incorrect `adr_type`. For example, `AryEqNode` reports `adr_type` being `TypeAryPtr::BYTES` (it inherits this from `StrIntrinsicNode`). This is incorrect, however, as it can accept `char[]` inputs, too. Another case is `VectorizedHashCodeNode`, which reports its `adr_type` being `TypePtr::BOTTOM`, but it actually extracts a memory slice and does not consume the whole memory. > - For nodes such as `StrInflatedCopyNode`, as they consume more than they produce, during scheduling, we need to compute anti-dependencies. This is not the case, so we should fix it by making the nodes kill all the memory they consume. This issue is often not present because these intrinsics are not exposed bare to general usage. > > Please kindly review, thanks a lot. src/hotspot/share/opto/graphKit.cpp line 4191: > 4189: Node* res_mem = _gvn.transform(new SCMemProjNode(_gvn.transform(str))); > 4190: if (adr_type == TypePtr::BOTTOM) { > 4191: set_all_memory(res_mem); I'm confused by this. Doesn't `StrCompressedCopyNode` only write to dst? So the only part of the memory state that it updates is the one for `TypeAryPtr::BYTES`? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28789#discussion_r2614347483 From roland at openjdk.org Fri Dec 12 14:30:13 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 12 Dec 2025 14:30:13 GMT Subject: RFR: 8373502: C2 SuperWord: speculative check uses VPointer variable was pinned after speculative check, leading to bad graph In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 08:46:28 GMT, Emanuel Peter wrote: > Thanks for @chhagedorn and @rwestrel for triaging / doing some first investigation. > > This is a regression of [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751) / https://github.com/openjdk/jdk/pull/24278. > > This is almost the same as https://github.com/openjdk/jdk/pull/28449, so have a quick look at it. > It was also an issue with some nodes being pinned too low, and not available at the speculative check. > There, it was the `pre_init` values of the `iv`. Now it is the variables of the `VPointer`. > The fix is pretty similar as well. > > ------------------------------------------ > > **Analysis** > > The reproducer gets a `bad graph` assert because of this cycle: > image > Note: `921 CountedLoop` is the pre-loop, the main-loop is further down from it. > And `607 ParsePredicate` is the `#Auto_Vectorization_Check`, and `1403` is the aliasing check inserted for the VPointer named below. > > This is the relevant VPointer: > `VPointer[size: 4, object, base(920 CastPP) + con( 20) + iv_scale( 0) * iv + invar(0)]` > The base `920 CastPP` is the problematic variable. > > In `VPointer::init_are_non_iv_summands_pre_loop_invariant`, we check that: > `_vloop.is_pre_loop_invariant(variable)` > And that holds for `920 CastPP`. So far so good. > > This used to be enough when we only adjusted the pre-loop limit for alignment. > But now that we need the variables for the aliasing runtime check further up, this is not sufficient any more. 
Analogous to https://github.com/openjdk/jdk/pull/28449, we would now need: > `this->_vloop.is_available_for_speculative_check(variable)` > And that is false for `920 CastPP`, since it is pinned after the speculative check. > > **Solution** > We should not insert the aliasing runtime check, and hence we probably cannot vectorize this case. > > For now, this makes all tests pass. I think just like with https://github.com/openjdk/jdk/pull/28449 these cases are edge cases we don't have to worry too much about. But if they ever do become important, we could try to uncast the variables. But I don't know if that is without issues; we would certainly lose some info that we get from the casts. Looks reasonable to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28783#pullrequestreview-3572037439 From bkilambi at openjdk.org Fri Dec 12 14:31:20 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 12 Dec 2025 14:31:20 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 In-Reply-To: <_QEYCQm138PWv2vGjMFvEJ6kfMjGEn_vsuEZ_EPaRxQ=.b42967e5-cc22-4c98-a454-6698ce0a70cf@github.com> References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> <_QEYCQm138PWv2vGjMFvEJ6kfMjGEn_vsuEZ_EPaRxQ=.b42967e5-cc22-4c98-a454-6698ce0a70cf@github.com> Message-ID: On Thu, 11 Dec 2025 12:06:49 GMT, Marc Chevalier wrote: >> This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. >> >> Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered.
The following is how each of these reductions is implemented for different aarch64 targets - >> >> **For AddReduction :** >> On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. >> >> On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order. >> >> **For MulReduction :** >> Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported. >> >> Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` - >> >> Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. >> Ratio > 1 indicates the performance with this patch is better than the master branch. >> >> **N1 (UseSVE = 0, max vector length = 16B):** >> >> Benchmark vectorDim Mode Cnt 8B 16B >> ReductionAddFP16 256 thrpt 9 1.41 1.40 >> ReductionAddFP16 512 thrpt 9 1.41 1.41 >> ReductionAddFP16 1024 thrpt 9 1.43 1.40 >> ReductionAddFP16 2048 thrpt 9 1.43 1.40 >> ReductionMulFP16 256 thrpt 9 1.22 1.22 >> ReductionMulFP16 512 thrpt 9 1.21 1.23 >> ReductionMulFP16 1024 thrpt 9 1.21 1.22 >> ReductionMulFP16 2048 thrpt 9 1.20 1.22 >> >> >> On N1, the scalarized sequence of `fadd/fmul` are gener... 
> > As for the IR verification failure, I've looked a bit and couldn't find such an issue already. Since it reproduces on master, I suggest you file a ticket, indeed. Thanks! @marc-chevalier @eme64 I have created a ticket for this issue (I could reproduce the IR rules failures on both aarch64 and x86_64 locally) - https://bugs.openjdk.org/browse/JDK-8373605 ------------- PR Comment: https://git.openjdk.org/jdk/pull/27526#issuecomment-3646698569 From roland at openjdk.org Fri Dec 12 14:37:39 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 12 Dec 2025 14:37:39 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis [v5] In-Reply-To: References: Message-ID: > The crash occurs because verification code expects the inner and outer > loop of a loop strip mining nest to have the same number of phis but, > in this case, the inner loop has one more memory phis than the outer > loop. > > 1) After `OuterStripMinedLoopNode::adjust_strip_mined_loop`, inner and > outer loops have the same number of phis, as expected. 
> > > 309 MergeMem === _ 1 306 1 1 284 [[ 429 ]] { - - N284:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=205 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 ]] > 429 Phi === 248 309 205 [[ 93 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 249 CountedLoop === 249 248 197 [[ 249 119 96 93 94 ]] inner stride: 1 strip mined !orig=[223],[91] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 93 Phi === 249 429 205 [[ 117 97 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 94 Phi === 249 430 121 [[ 97 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > > 2) Then `PhiNode::Ideal` runs for 429 and pushed the `MergeMem` 309 > through the outer loop phi: > > > 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 444 446 ]] > 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 444 Phi === 248 306 121 [[ 445 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 446 Phi === 248 284 170 [[ 445 ]] #memory Memory: @instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow], name=detailMessage, idx=5; !orig=444,429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 445 MergeMem === _ 1 444 1 1 446 [[ 93 ]] { - - N446:instptr:java/lang/Throwable 
(java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=[429],93 !jvms: TestMismatchedMemoryPhis::mainTe... Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/loopstripmining/TestMismatchedMemoryPhis.java Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28677/files - new: https://git.openjdk.org/jdk/pull/28677/files/24a30b44..2476b6a5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28677&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28677&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28677/head:pull/28677 PR: https://git.openjdk.org/jdk/pull/28677 From qamai at openjdk.org Fri Dec 12 14:50:14 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 12 Dec 2025 14:50:14 GMT Subject: RFR: 8373591: C2: Fix the memory around some intrinsics nodes In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 14:12:40 GMT, Roland Westrelin wrote: >> Hi, >> >> This is extracted from #28570 , there are 2 issues here: >> >> - Some intrinsics nodes advertise incorrect `adr_type`. For example, `AryEqNode` reports `adr_type` being `TypeAryPtr::BYTES` (it inherits this from `StrIntrinsicNode`). This is incorrect, however, as it can accept `char[]` inputs, too. Another case is `VectorizedHashCodeNode`, which reports its `adr_type` being `TypePtr::BOTTOM`, but it actually extracts a memory slice and does not consume the whole memory. >> - For nodes such as `StrInflatedCopyNode`, as they consume more than they produce, during scheduling, we need to compute anti-dependencies. This is not the case, so we should fix it by making the nodes kill all the memory they consume. This issue is often not present because these intrinsics are not exposed bare to general usage. 
>> >> Please kindly review, thanks a lot. > > src/hotspot/share/opto/graphKit.cpp line 4191: > >> 4189: Node* res_mem = _gvn.transform(new SCMemProjNode(_gvn.transform(str))); >> 4190: if (adr_type == TypePtr::BOTTOM) { >> 4191: set_all_memory(res_mem); > > I'm confused by this. Doesn't `StrCompressedCopyNode` only write to dst? So the only part of the memory state that it updates is the one for `TypeAryPtr::BYTES`? It is because if a node consumes more memory than it produces, we need to compute its anti-dependencies. And since we do not compute anti-dependencies of these nodes, it is safer to make them kill all the memory they consume. What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28789#discussion_r2614460937 From roland at openjdk.org Fri Dec 12 14:50:10 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 12 Dec 2025 14:50:10 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis [v6] In-Reply-To: References: Message-ID: > The crash occurs because verification code expects the inner and outer > loop of a loop strip mining nest to have the same number of phis but, > in this case, the inner loop has one more memory phis than the outer > loop. > > 1) After `OuterStripMinedLoopNode::adjust_strip_mined_loop`, inner and > outer loops have the same number of phis, as expected. 
> > > 309 MergeMem === _ 1 306 1 1 284 [[ 429 ]] { - - N284:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=205 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 ]] > 429 Phi === 248 309 205 [[ 93 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 249 CountedLoop === 249 248 197 [[ 249 119 96 93 94 ]] inner stride: 1 strip mined !orig=[223],[91] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 93 Phi === 249 429 205 [[ 117 97 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 94 Phi === 249 430 121 [[ 97 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > > 2) Then `PhiNode::Ideal` runs for 429 and pushed the `MergeMem` 309 > through the outer loop phi: > > > 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 444 446 ]] > 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 444 Phi === 248 306 121 [[ 445 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 446 Phi === 248 284 170 [[ 445 ]] #memory Memory: @instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow], name=detailMessage, idx=5; !orig=444,429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 445 MergeMem === _ 1 444 1 1 446 [[ 93 ]] { - - N446:instptr:java/lang/Throwable 
(java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=[429],93 !jvms: TestMismatchedMemoryPhis::mainTe... Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28677/files - new: https://git.openjdk.org/jdk/pull/28677/files/2476b6a5..386b63d3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28677&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28677&range=04-05 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28677/head:pull/28677 PR: https://git.openjdk.org/jdk/pull/28677 From roland at openjdk.org Fri Dec 12 14:50:14 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 12 Dec 2025 14:50:14 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis [v4] In-Reply-To: <8ZNxxtdCUYYVmqq8Lam8C9sI7xh9qqHU_5j16ef6S0Q=.d793a57b-bc4c-4ed5-acba-ebf1bd372d4b@github.com> References: <8ZNxxtdCUYYVmqq8Lam8C9sI7xh9qqHU_5j16ef6S0Q=.d793a57b-bc4c-4ed5-acba-ebf1bd372d4b@github.com> Message-ID: On Fri, 12 Dec 2025 13:09:46 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with three additional commits since the last revision: >> >> - Update src/hotspot/share/opto/node.cpp >> >> Co-authored-by: Daniel Lundén >> - Update test/hotspot/jtreg/compiler/loopstripmining/TestMismatchedMemoryPhis.java >> >> Co-authored-by: Daniel Lundén >> - Update src/hotspot/share/opto/cfgnode.cpp >> >> Co-authored-by: Daniel Lundén > > test/hotspot/jtreg/compiler/loopstripmining/TestMismatchedMemoryPhis.java line 62: > >> 60: } catch (NullPointerException npe) { >> 61: // Expected >> 62: } > > Could this exception be avoided, and still reproduce the bug?
Failure doesn't reproduce without it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28677#discussion_r2614452497 From galder at openjdk.org Fri Dec 12 14:53:15 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Fri, 12 Dec 2025 14:53:15 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v5] In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 07:47:12 GMT, Eric Fang wrote: >> `VectorMaskCastNode` is used to cast a vector mask from one type to another type. The cast may be generated by calling the vector API `cast` or generated by the compiler. For example, some vector mask operations like `trueCount` require the input mask to be integer types, so for floating point type masks, the compiler will cast the mask to the corresponding integer type mask automatically before doing the mask operation. This kind of cast is very common. >> >> If the vector element size is not changed, the `VectorMaskCastNode` doesn't generate code; otherwise code will be generated to extend or narrow the mask. This IR node is not free, whether or not it generates code, because it may block some optimizations. For example: >> 1. `(VectorStoreMask (VectorMaskCast (VectorLoadMask x)))` The middle `VectorMaskCast` prevents the following optimization: `(VectorStoreMask (VectorLoadMask x)) => (x)` >> 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`. >> >> In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. But if the input value is changed, we can't eliminate the cast. >> >> The general idea of this PR is introducing an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`, like the existing `Node::uncast(bool)` function. The function returns the first node that is not a `VectorMaskCastNode`.
>> >> The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode` and this does not affect the correctness of the optimization. Then this function can be called to eliminate the `VectorMaskCastNode` chain. >> >> Current optimizations related to `VectorMaskCastNode` include: >> 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760. >> 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242. >> >> This PR does the following optimizations: >> 1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` as `(VectorMaskCast (VectorMaskCast? ... (VectorMaskCast x))) => (x)`. Because as long as types of the head and tail `VectorMaskCastNode` are consistent, the optimization is correct. >> 2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`. Since the value before and after the pattern is a boolean vect... > > Eric Fang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - Merge branch 'master' into JDK-8370863-mask-cast-opt > - Merge branch 'master' into JDK-8370863-mask-cast-opt > - Add MaxVectorSize IR test condition for VectorStoreMaskIdentityTest.java > - Refine the test code and comments > - Merge branch 'master' into JDK-8370863-mask-cast-opt > - Don't read and write the same memory in the JMH benchmarks > - Merge branch 'master' into JDK-8370863-mask-cast-opt > - 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns > > `VectorMaskCastNode` is used to cast a vector mask from one type to > another type. The cast may be generated by calling the vector API `cast` > or generated by the compiler. 
For example, some vector mask operations > like `trueCount` require the input mask to be integer types, so for > floating point type masks, the compiler will cast the mask to the > corresponding integer type mask automatically before doing the mask > operation. This kind of cast is very common. > > If the vector element size is not changed, the `VectorMaskCastNode` > doesn't generate code; otherwise code will be generated to extend or narrow > the mask. This IR node is not free, whether or not it generates code, > because it may block some optimizations. For example: > 1. `(VectorStoreMask (VectorMaskCast (VectorLoadMask x)))` > The middle `VectorMaskCast` prevents the following optimization: > `(VectorStoreMask (VectorLoadMask x)) => (x)` > 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which > blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`. > > In these IR patterns, the value of the input `x` is not changed, so we > can safely do the optimization. But if the input value is changed, we > can't eliminate the cast. > > The general idea of this PR is introducing an `uncast_mask` helper > function, which can be used to uncast a chain of `VectorMaskCastNode`, > like the existing `Node::uncast(bool)` function. The function returns > the first node that is not a `VectorMaskCastNode`. > > The intended use case is when the IR pattern to be optimized may > contain one or more consecutive `VectorMaskCastNode` and this does not > affect the correctness of the optimization. Then this function can be > called to eliminate the `VectorMaskCastNode` chain. > > Current optimizations related to `VectorMaskCastNode` include: > 1. `(V... LGTM thanks @erifan! ------------- Marked as reviewed by galder (Author).
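The `uncast_mask` idea described in the quoted commit message can be sketched on a toy IR as follows. This is an illustration under stated assumptions, not the PR's C++ code: `Node`, `uncastMask` and `identity` are hypothetical names, nodes have at most one input, and the type and vector-length conditions that the real optimization would need are left out entirely. It shows only the shape of the rewrite `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`.

```java
public class UncastMaskSketch {
    // Toy IR node: an opcode name and at most one input.
    static final class Node {
        final String op;
        final Node in;

        Node(String op, Node in) {
            this.op = op;
            this.in = in;
        }
    }

    // Skip over a chain of VectorMaskCast nodes and return the first node that is not one.
    static Node uncastMask(Node n) {
        while (n != null && n.op.equals("VectorMaskCast")) {
            n = n.in;
        }
        return n;
    }

    // (VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => x; otherwise n is returned unchanged.
    static Node identity(Node n) {
        if (n.op.equals("VectorStoreMask")) {
            Node src = uncastMask(n.in);
            if (src != null && src.op.equals("VectorLoadMask")) {
                return src.in;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        Node x = new Node("x", null);
        Node load = new Node("VectorLoadMask", x);
        Node casts = new Node("VectorMaskCast", new Node("VectorMaskCast", load));
        Node store = new Node("VectorStoreMask", casts);
        System.out.println(identity(store) == x);
    }
}
```

The helper keeps the pattern match itself unchanged: the caller looks through the casts first and then applies the same check it would apply to the cast-free pattern.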
PR Review: https://git.openjdk.org/jdk/pull/28313#pullrequestreview-3572127237 From epeter at openjdk.org Fri Dec 12 14:59:36 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 12 Dec 2025 14:59:36 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis [v4] In-Reply-To: References: <8ZNxxtdCUYYVmqq8Lam8C9sI7xh9qqHU_5j16ef6S0Q=.d793a57b-bc4c-4ed5-acba-ebf1bd372d4b@github.com> Message-ID: On Fri, 12 Dec 2025 14:44:48 GMT, Roland Westrelin wrote: >> test/hotspot/jtreg/compiler/loopstripmining/TestMismatchedMemoryPhis.java line 62: >> >>> 60: } catch (NullPointerException npe) { >>> 61: // Expected >>> 62: } >> >> Could this exception be avoided, and still reproduce the bug? > > Failure doesn't reproduce without it. So `iArrFld[]` must be null for this to reproduce? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28677#discussion_r2614498726 From epeter at openjdk.org Fri Dec 12 15:17:16 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 12 Dec 2025 15:17:16 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v5] In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 07:47:12 GMT, Eric Fang wrote: >> `VectorMaskCastNode` is used to cast a vector mask from one type to another type. The cast may be generated by calling the vector API `cast` or generated by the compiler. For example, some vector mask operations like `trueCount` require the input mask to be integer types, so for floating point type masks, the compiler will cast the mask to the corresponding integer type mask automatically before doing the mask operation. This kind of cast is very common. >> >> If the vector element size is not changed, the `VectorMaskCastNode` doesn't generate code; otherwise code will be generated to extend or narrow the mask. This IR node is not free, whether or not it generates code, because it may block some optimizations.
For example: >> 1. `(VectorStoremask (VectorMaskCast (VectorLoadMask x)))` The middle `VectorMaskCast` prevented the following optimization: `(VectorStoremask (VectorLoadMask x)) => (x)` >> 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`. >> >> In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. But if the input value is changed, we can't eliminate the cast. >> >> The general idea of this PR is introducing an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`, like the existing `Node::uncast(bool)` function. The function returns the first non `VectorMaskCastNode`. >> >> The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode` and this does not affect the correctness of the optimization. Then this function can be called to eliminate the `VectorMaskCastNode` chain. >> >> Current optimizations related to `VectorMaskCastNode` include: >> 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760. >> 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242. >> >> This PR does the following optimizations: >> 1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` as `(VectorMaskCast (VectorMaskCast? ... (VectorMaskCast x))) => (x)`. Because as long as types of the head and tail `VectorMaskCastNode` are consistent, the optimization is correct. >> 2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`. Since the value before and after the pattern is a boolean vect... > > Eric Fang has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains eight commits: > > - Merge branch 'master' into JDK-8370863-mask-cast-opt > - Merge branch 'master' into JDK-8370863-mask-cast-opt > - Add MaxVectorSize IR test condition for VectorStoreMaskIdentityTest.java > - Refine the test code and comments > - Merge branch 'master' into JDK-8370863-mask-cast-opt > - Don't read and write the same memory in the JMH benchmarks > - Merge branch 'master' into JDK-8370863-mask-cast-opt > - 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns > > `VectorMaskCastNode` is used to cast a vector mask from one type to > another type. The cast may be generated by calling the vector API `cast` > or generated by the compiler. For example, some vector mask operations > like `trueCount` require the input mask to be integer types, so for > floating point type masks, the compiler will cast the mask to the > corresponding integer type mask automatically before doing the mask > operation. This kind of cast is very common. > > If the vector element size is not changed, the `VectorMaskCastNode` > don't generate code, otherwise code will be generated to extend or narrow > the mask. This IR node is not free no matter it generates code or not > because it may block some optimizations. For example: > 1. `(VectorStoremask (VectorMaskCast (VectorLoadMask x)))` > The middle `VectorMaskCast` prevented the following optimization: > `(VectorStoremask (VectorLoadMask x)) => (x)` > 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which > blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`. > > In these IR patterns, the value of the input `x` is not changed, so we > can safely do the optimization. But if the input value is changed, we > can't eliminate the cast. > > The general idea of this PR is introducing an `uncast_mask` helper > function, which can be used to uncast a chain of `VectorMaskCastNode`, > like the existing `Node::uncast(bool)` function. 
The function returns > the first non `VectorMaskCastNode`. > > The intended use case is when the IR pattern to be optimized may > contain one or more consecutive `VectorMaskCastNode` and this does not > affect the correctness of the optimization. Then this function can be > called to eliminate the `VectorMaskCastNode` chain. > > Current optimizations related to `VectorMaskCastNode` include: > 1. `(V... Nice work! Thanks for working on this! Just had a quick glance. Maybe I can do a full review next week. src/hotspot/share/opto/vectornode.cpp line 1489: > 1487: Node* VectorStoreMaskNode::Identity(PhaseGVN* phase) { > 1488: // Identity transformation on boolean vectors. > 1489: // VectorStoreMask (VectorMaskCast ... VectorLoadMask bv) elem_size ==> bv Suggestion: // VectorStoreMask (VectorMaskCast* VectorLoadMask bv) elem_size ==> bv Would a regex star be more explicit about 0 or more repetitions? src/hotspot/share/opto/vectornode.cpp line 1492: > 1490: // vector[n]{bool} => vector[n]{t} => vector[n]{bool} > 1491: Node* in1 = VectorNode::uncast_mask(in(1)); > 1492: if (in1->Opcode() == Op_VectorLoadMask && length() == in1->as_Vector()->length()) { Can there be a mismatch with the length? Can you give me an example?
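To make the pattern under review concrete, here is a minimal sketch of the `VectorStoreMask (VectorMaskCast* VectorLoadMask bv) ==> bv` idea on a toy IR written in Java. Everything in it is an assumption made for illustration: the class `UncastMaskSketch`, the `Op` enum, the single-input `Node` type and both method names are hypothetical and are not HotSpot code. It only mirrors the shape of the quoted C++ lines (strip the cast chain, then check the opcode and the lane count).

```java
public final class UncastMaskSketch {
    private UncastMaskSketch() {}

    /** Hypothetical opcodes, loosely named after the C2 nodes discussed above. */
    enum Op { LOAD_MASK, MASK_CAST, STORE_MASK, OTHER }

    /** Toy single-input IR node; this is not HotSpot's Node class. */
    static final class Node {
        final Op op;
        final Node input;   // null for leaf nodes
        final int length;   // number of vector lanes

        Node(Op op, Node input, int length) {
            this.op = op;
            this.input = input;
            this.length = length;
        }
    }

    /** Skips zero or more consecutive MASK_CAST nodes and returns the first non-cast node. */
    static Node uncastMask(Node n) {
        while (n.op == Op.MASK_CAST) {
            n = n.input;
        }
        return n;
    }

    /**
     * Returns the original boolean vector when the store sits on top of
     * (MASK_CAST* LOAD_MASK bv) with a matching lane count; otherwise returns
     * the store itself, meaning "no simplification".
     */
    static Node storeMaskIdentity(Node store) {
        Node in1 = uncastMask(store.input);
        if (in1.op == Op.LOAD_MASK && store.length == in1.length) {
            return in1.input;
        }
        return store;
    }

    public static void main(String[] args) {
        Node bv = new Node(Op.OTHER, null, 8);
        Node load = new Node(Op.LOAD_MASK, bv, 8);
        Node cast = new Node(Op.MASK_CAST, new Node(Op.MASK_CAST, load, 8), 8);
        Node store = new Node(Op.STORE_MASK, cast, 8);
        // The two casts are skipped and the store folds back to bv.
        System.out.println(storeMaskIdentity(store) == bv);
    }
}
```

In this toy model the length check is the only guard besides the opcode, so a store whose lane count differs from the load's is left alone. Whether such a mismatch can actually arise in C2 is exactly the open question asked above, and the sketch does not answer it.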
------------- PR Review: https://git.openjdk.org/jdk/pull/28313#pullrequestreview-3572250876 PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2614574171 PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2614577536 From epeter at openjdk.org Fri Dec 12 15:17:19 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 12 Dec 2025 15:17:19 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v2] In-Reply-To: References: <4vSKAtr0tUG0V193gIvnEFdHm18ZhqflVAwk-09IVQ0=.081806f5-6303-4b4f-975d-7c85427ccae5@github.com> Message-ID: On Thu, 4 Dec 2025 02:23:40 GMT, Eric Fang wrote: >> src/hotspot/share/opto/vectornode.cpp line 1056: >> >>> 1054: // x remains to be a bool vector with no changes. >>> 1055: // This function can be used to eliminate the VectorMaskCast in such patterns. >>> 1056: Node* VectorNode::uncast_mask(Node* n) { >> >> Could this be a static method instead? > > Yeah it's already a static method. See https://github.com/openjdk/jdk/pull/28313/files#diff-ba9e2d10a50a01316946660ec9f68321eb864fd9c815616c10abbec39360efe5R141 > > Or you mean a static method limited to this file ? If so, I prefer not, it may be used at other places. Thanks~ Could you return a `VectorNode*`? And should the input not already be a `VectorNode*`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2614565511 From galder at openjdk.org Fri Dec 12 15:23:37 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Fri, 12 Dec 2025 15:23:37 GMT Subject: RFR: 8370922: Template Framework Library: Float16 type and operations In-Reply-To: References: Message-ID: On Fri, 31 Oct 2025 22:23:28 GMT, Emanuel Peter wrote: > We should test `Float16` with Template Framework Tests. For this, I'm now implementing: > > - Template Framework Library: add `Float16Type` that represents `Float16`. Extend `Operations.java` with `Float16` operations. 
> - `Verify.java`: add verification for `Float16`, and corresponding tests in `TestVerifyIncubatorVector.java`. We could have done this separately, but it is not much code and completes the pipeline from code generation through execution and finally result verification in the following two tests. > - Adding `Float16` to `ExpressionFuzzer.java` and `TestExpressions.java`. Looks good, just a small question about the name of the test test/hotspot/jtreg/testlibrary_tests/verify/tests/TestVerifyIncubatorVector.java line 44: > 42: import compiler.lib.verify.*; > 43: > 44: public class TestVerifyIncubatorVector { I have doubts about leaving the "Incubator" name in the test class name as it's temporary. Are you going to refactor the class name when API is not incubator any more? Maybe `TestVerifyVectorAPI` instead? ------------- Changes requested by galder (Author). PR Review: https://git.openjdk.org/jdk/pull/28095#pullrequestreview-3572147178 PR Review Comment: https://git.openjdk.org/jdk/pull/28095#discussion_r2614488442 From galder at ibm.com Fri Dec 12 15:38:11 2025 From: galder at ibm.com (Galder Zamarreno) Date: Fri, 12 Dec 2025 15:38:11 +0000 Subject: [C2] PEXT/PDEP intrinsics cause performance regression on AMD pre-Zen 3 CPUs In-Reply-To: References: Message-ID: Hi Alessandro, I've created https://bugs.openjdk.org/browse/JDK-8373613 to track this. What about you send a PR with the proposed fix? Our team has AMD servers that help validate the suggested changes. Thanks! Alessandro Autiero writes: > Hi, > > today I stumbled upon a performance issue with the Long.compress/expand and > Integer.compress/expand intrinsics on certain AMD processors. I discovered > this while working on an optimized varint decoder where I was hoping to use > Long.compress() to speed up bit extraction. Instead, I found my "optimized" > version was slower than my naive loop-based implementation. After some > digging, I believe I understand what's happening. 
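For readers less familiar with the operations named in this report, the following is a minimal sketch of what `Long.compress` and `Long.expand` compute, written as plain bit-by-bit loops. The class and method names (`CompressSketch`, `compressReference`, `expandReference`) are hypothetical and introduced only for illustration; this is not the JDK's software fallback and it says nothing about its speed. It only shows the semantics that PEXT and PDEP implement in hardware.

```java
public final class CompressSketch {
    private CompressSketch() {}

    /**
     * PEXT-style gather: collects the bits of value at the positions where
     * mask has a 1 and packs them into the low-order bits of the result.
     */
    static long compressReference(long value, long mask) {
        long result = 0L;
        int outBit = 0;
        for (int bit = 0; bit < Long.SIZE; bit++) {
            if (((mask >>> bit) & 1L) != 0) {
                result |= ((value >>> bit) & 1L) << outBit;
                outBit++;
            }
        }
        return result;
    }

    /**
     * PDEP-style scatter: takes the low-order bits of value and deposits them
     * at the positions where mask has a 1; all other result bits are 0.
     */
    static long expandReference(long value, long mask) {
        long result = 0L;
        int inBit = 0;
        for (int bit = 0; bit < Long.SIZE; bit++) {
            if (((mask >>> bit) & 1L) != 0) {
                result |= ((value >>> inBit) & 1L) << bit;
                inBit++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Upper nibble of 0b1011_0110 is 1011, gathered into the low bits.
        System.out.println(Long.toBinaryString(compressReference(0b1011_0110L, 0b1111_0000L))); // prints 1011
        // The same four bits scattered back under the mask.
        System.out.println(Long.toBinaryString(expandReference(0b1011L, 0b1111_0000L)));       // prints 10110000
    }
}
```

On JDK 19 or later, the results of these loops should agree with `Long.compress(value, mask)` and `Long.expand(value, mask)` for the same arguments.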
> > **Background** > > The compress and expand methods (added in JDK 19 via JDK-8283893 [1]) are > intrinsified by C2 to use the BMI2 PEXT and PDEP instructions when the CPU > reports BMI2 support. > This works great on Intel Haswell+ and AMD Zen 3+, where these instructions > execute in dedicated hardware with approximately 3-cycle latency. > However, AMD processors from Excavator before Zen 3 implement PEXT/PDEP via > microcode emulation rather than native hardware. > This is confirmed by AMD's Software Optimization Guide for Family 19h > Processors [2], Section 2.10.2, which states that Zen 3 has native ALU > support for these instructions. > Wikipedia's page on x86 Bit Manipulation Instruction Sets [3] also > documents this behavior: > >> AMD processors before Zen 3 that implement PDEP and PEXT do so in >> microcode, with a latency of 18 cycles rather than (Zen 3) 3 cycles. As a >> result it is often faster to use other instructions on these processors. > > > **Reproducer** > > Here is a JMH benchmark that demonstrates the issue by comparing the > intrinsified path against the software fallback using ControlIntrinsic > flags: > > ``` > import org.openjdk.jmh.annotations.*; > > import java.util.concurrent.ThreadLocalRandom; > import java.util.concurrent.TimeUnit; > > @BenchmarkMode(Mode.AverageTime) > @OutputTimeUnit(TimeUnit.NANOSECONDS) > @Warmup(iterations = 5, time = 1) > @Measurement(iterations = 5, time = 1) > @State(Scope.Benchmark) > public class PextPdepPerformanceBug { > // I'm not using constants to prevent constant folding > private long longValue; > private long longMask; > private int intValue; > private int intMask; > > @Setup(Level.Iteration) > public void setup() { > var rng = ThreadLocalRandom.current(); > longValue = rng.nextLong(); > longMask = rng.nextLong(); > intValue = rng.nextInt(); > intMask = rng.nextInt(); > } > > // Long.compress (PEXT 64-bit) > > @Benchmark > @Fork(value = 2, jvmArgsAppend = { > "-XX:+UnlockDiagnosticVMOptions", > 
"-XX:ControlIntrinsic=-_compress_l", > "-Xcomp" > }) > public long compressLongSoftware() { > return Long.compress(longValue, longMask); > } > > @Benchmark > @Fork(value = 2, jvmArgsAppend = { > "-XX:+UnlockDiagnosticVMOptions", > "-XX:ControlIntrinsic=+_compress_l", > "-Xcomp" > }) > public long compressLongIntrinsic() { > return Long.compress(longValue, longMask); > } > > // Long.expand (PDEP 64-bit) > > @Benchmark > @Fork(value = 2, jvmArgsAppend = { > "-XX:+UnlockDiagnosticVMOptions", > "-XX:ControlIntrinsic=-_expand_l", > "-Xcomp" > }) > public long expandLongSoftware() { > return Long.expand(longValue, longMask); > } > > @Benchmark > @Fork(value = 2, jvmArgsAppend = { > "-XX:+UnlockDiagnosticVMOptions", > "-XX:ControlIntrinsic=+_expand_l", > "-Xcomp" > }) > public long expandLongIntrinsic() { > return Long.expand(longValue, longMask); > } > > // Integer.compress (PEXT 32-bit) > > @Benchmark > @Fork(value = 2, jvmArgsAppend = { > "-XX:+UnlockDiagnosticVMOptions", > "-XX:ControlIntrinsic=-_compress_i", > "-Xcomp" > }) > public int compressIntSoftware() { > return Integer.compress(intValue, intMask); > } > > @Benchmark > @Fork(value = 2, jvmArgsAppend = { > "-XX:+UnlockDiagnosticVMOptions", > "-XX:ControlIntrinsic=+_compress_i", > "-Xcomp" > }) > public int compressIntIntrinsic() { > return Integer.compress(intValue, intMask); > } > > // Integer.expand (PDEP 32-bit) > > @Benchmark > @Fork(value = 2, jvmArgsAppend = { > "-XX:+UnlockDiagnosticVMOptions", > "-XX:ControlIntrinsic=-_expand_i", > "-Xcomp" > }) > public int expandIntSoftware() { > return Integer.expand(intValue, intMask); > } > > @Benchmark > @Fork(value = 2, jvmArgsAppend = { > "-XX:+UnlockDiagnosticVMOptions", > "-XX:ControlIntrinsic=+_expand_i", > "-Xcomp" > }) > public int expandIntIntrinsic() { > return Integer.expand(intValue, intMask); > } > } > ``` > > Here are the results on an i7 9700K, which supports the BMI2 instruction > set and is not affected by this issue: > ``` > Benchmark Mode Cnt 
Score Error Units
> PextPdepPerformanceBug.compressIntIntrinsic avgt 10 0,545 ± 0,002 ns/op
> PextPdepPerformanceBug.compressIntSoftware avgt 10 11,357 ± 0,033 ns/op
> PextPdepPerformanceBug.compressLongIntrinsic avgt 10 0,552 ± 0,012 ns/op
> PextPdepPerformanceBug.compressLongSoftware avgt 10 16,197 ± 0,203 ns/op
> PextPdepPerformanceBug.expandIntIntrinsic avgt 10 0,546 ± 0,006 ns/op
> PextPdepPerformanceBug.expandIntSoftware avgt 10 12,179 ± 0,457 ns/op
> PextPdepPerformanceBug.expandLongIntrinsic avgt 10 0,548 ± 0,018 ns/op
> PextPdepPerformanceBug.expandLongSoftware avgt 10 17,658 ± 0,534 ns/op
> ```
>
> And here are the results on a Ryzen 7 2700, which supports the BMI2
> instruction set but is also affected by this issue:
> ```
> Benchmark Mode Cnt Score Error Units
> PextPdepPerformanceBug.compressIntIntrinsic avgt 10 28.010 ± 9.929 ns/op
> PextPdepPerformanceBug.compressIntSoftware avgt 10 20.008 ± 2.129 ns/op
> PextPdepPerformanceBug.compressLongIntrinsic avgt 10 48.999 ± 8.468 ns/op
> PextPdepPerformanceBug.compressLongSoftware avgt 10 28.638 ± 5.336 ns/op
> PextPdepPerformanceBug.expandIntIntrinsic avgt 10 24.860 ± 6.784 ns/op
> PextPdepPerformanceBug.expandIntSoftware avgt 10 19.277 ± 1.719 ns/op
> PextPdepPerformanceBug.expandLongIntrinsic avgt 10 43.889 ± 10.575 ns/op
> PextPdepPerformanceBug.expandLongSoftware avgt 10 27.350 ± 1.898 ns/op
> ```
>
> **Precedent and Scope**
>
> A similar issue was reported in JDK-8334474 [4], where the compress/expand
> intrinsics were disabled on RISC-V because the vectorized implementation
> caused regressions compared to the pure-Java fallback.
> This led me to investigate whether other JDK intrinsics relying on BMI2
> instructions might be affected.
> The good news is that, as stated before, PEXT and PDEP are the only BMI2
> instructions that AMD implemented via microcode on pre-Zen 3 processors:
> the others execute efficiently on all BMI2-capable hardware.
> I also verified that no other JDK methods use PEXT/PDEP, so the four > methods covered in this report (Long.compress, Long.expand, > Integer.compress, Integer.expand) should be the only ones affected. > It's worth verifying this though as the JDK is very large and I could have > missed such examples. > > **Mitigation** > > The intrinsic selection logic should check both BMI2 support and CPU > vendor/family. > Specifically, disable these intrinsics when the CPU vendor is AMD and the > family is less than 0x19 (Zen 3). > I think this could be implemented in x86.ad [5], alongside the existing > BMI2 check, but I'm not familiar with C2's source code. > Still, I would be happy to work on this issue myself if the issue is > verified and it's acceptable for me to work on it. > > Thanks for reading! > > [1] https://bugs.openjdk.org/browse/JDK-8283893 > [2] https://developer.amd.com/resources/developer-guides-manuals/ > [3] https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set > [4] https://bugs.openjdk.org/browse/JDK-8334474 > [5] > https://github.com/jatin-bhateja/jdk/blob/7d35a283cf2497565d230e3d5426f563f7e5870d/src/hotspot/cpu/x86/x86.ad#L3183 -- Galder Zamarreño Software Developer IBM Software galder at ibm.com IBM From rcastanedalo at openjdk.org Fri Dec 12 15:40:47 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 12 Dec 2025 15:40:47 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v7] In-Reply-To: References: Message-ID: On Thu, 11 Dec 2025 16:10:58 GMT, Roland Westrelin wrote: >> The test case has an out of loop `Store` with an `AddP` address >> expression that has other uses and is in the loop body. Schematically, >> only showing the address subgraph and the bases for the `AddP`s: >> >> >> Store#195 -> AddP#133 -> AddP#134 -> CastPP#110 >> -> CastPP#110 >> >> >> Both `AddP`s have the same base, a `CastPP` that's also in the loop body.
>> >> That loop is a counted loop and only has 3 iterations so is fully >> unrolled. First, one iteration is peeled: >> >> >> /-> CastPP#110 >> Store#195 -> Phi#360 -> AddP#133 -> AddP#134 -> CastPP#110 >> -> AddP#277 -> AddP#278 -> CastPP#283 >> -> CastPP#283 >> >> >> >> The `AddP`s and `CastPP` are cloned (because in the loop body). As >> part of peeling, `PhaseIdealLoop::peeled_dom_test_elim()` is >> called. It finds the test that guards `CastPP#283` in the peeled >> iteration dominates and replaces the test that guards `CastPP#110` >> (the test in the peeled iteration is the clone of the test in the >> loop). That causes `CastPP#110`'s control to be updated to that of the >> test in the peeled iteration and to be yanked from the loop. So now >> `CastPP#283` and `CastPP#110` have the same inputs. >> >> Next unrolling happens: >> >> >> /-> CastPP#110 >> /-> AddP#400 -> AddP#401 -> CastPP#110 >> Store#195 -> Phi#360 -> Phi#477 -> AddP#133 -> AddP#134 -> CastPP#110 >> \ -> CastPP#110 >> -> AddP#277 -> AddP#278 -> CastPP#283 >> -> CastPP#283 >> >> >> >> `AddP`s are cloned once more but not the `CastPP`s because they are >> both in the peeled iteration now. A new `Phi` is added. >> >> Next igvn runs. It's going to push the `AddP`s through the `Phi`s. >> >> Through `Phi#477`: >> >> >> >> /-> CastPP#110 >> Store#195 -> Phi#360 -> AddP#510 -> Phi#509 -> AddP#401 -> CastPP#110 >> \ -> AddP#134 -> CastPP#110 >> -> AddP#277 -> AddP#278 -> CastPP#283 >> -> CastPP#283 >> >> >> >> Through `Phi#360`: >> >> >> /-> AddP#134 -> CastPP#110 >> /-> Phi#509 -> AddP#401 -> CastPP#110 >> Store#195 -> AddP#516 -> Phi#515 -> AddP#278 -> CastPP#283 >> -> Phi#514 -> CastPP#283 >> ... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 19 additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8351889 > - Update src/hotspot/share/opto/phaseX.hpp > > Co-authored-by: Roberto Castañeda Lozano > - Update src/hotspot/share/opto/phaseX.cpp > > Co-authored-by: Roberto Castañeda Lozano > - review > - more > - review > - Merge branch 'master' into JDK-8351889 > - exp > - Merge branch 'master' into JDK-8351889 > - ... and 9 more: https://git.openjdk.org/jdk/compare/fc520c13...100fad3d Looks good! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25386#pullrequestreview-3572383700 From bkilambi at openjdk.org Fri Dec 12 15:45:29 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 12 Dec 2025 15:45:29 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 In-Reply-To: References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> <4VXHOCR1YSoMVbDbB8j-j18Z-_VbO0y5fJfyp3IjQ9c=.19485011-9cb3-4016-a642-61cee81adcd1@github.com> Message-ID: On Tue, 7 Oct 2025 02:47:50 GMT, Xiaohong Gong wrote: >> Are you referring to the N1 numbers? The add reduction operation has gains around ~40% while the mul reduction is around ~20% on N1. On V1 and V2 they look comparable (not considering the cases where we generate `fadda` instructions for add reduction). >> >>> Seems instructions between different ins instructions will have a data-dependence, which is not expected >> >> Why do you think it's not expected? We have the exact same sequence for Neon add reduction as well. There's back to back dependency there as well and yet it shows better performance. The N1 optimization guide shows 2 cyc latency for `fadd` and 3 cyc latency for `fmul`. Could this be the reason? WDYT? > > I mean we do not expect there is data-dependence between two `ins` operations, but it has now.
We do not recommend use the instructions that just write part of a register. This might involve un-expected dependence between. I suggest to use `ext` instead, and I can observe about 20% performance improvement compared with current version on V2. I did not check the correctness, but it looks right to me. Could you please help check on other machines? Thanks! > > The change might look like: > Suggestion: > > fmulh(dst, fsrc, vsrc); > ext(vtmp, T8B, vsrc, vsrc, 2); > fmulh(dst, dst, vtmp); > ext(vtmp, T8B, vsrc, vsrc, 4); > fmulh(dst, dst, vtmp); > ext(vtmp, T8B, vsrc, vsrc, 6); > fmulh(dst, dst, vtmp); > if (isQ) { > ext(vtmp, T16B, vsrc, vsrc, 8); > fmulh(dst, dst, vtmp); > ext(vtmp, T16B, vsrc, vsrc, 10); > fmulh(dst, dst, vtmp); > ext(vtmp, T16B, vsrc, vsrc, 12); > fmulh(dst, dst, vtmp); > ext(vtmp, T16B, vsrc, vsrc, 14); > fmulh(dst, dst, vtmp); Hi @XiaohongGong Thanks for this suggestion. I understand that `ins` has a read-modify-write dependency while `ext` does not have that as we are not reading the `vtmp` register in this case. 
I made changes to both the add and mul reduction implementation and I could see some perf gains on V1 and V2 for mul reduction -

Benchmark | vectorDim | 8B | 16B
-- | -- | -- | --
Float16OperationsBenchmark.ReductionAddFP16 | 256 | 1.0022509 | 0.99938584
Float16OperationsBenchmark.ReductionAddFP16 | 512 | 1.05157946 | 1.00262025
Float16OperationsBenchmark.ReductionAddFP16 | 1024 | 1.02392196 | 1.00187924
Float16OperationsBenchmark.ReductionAddFP16 | 2048 | 1.01219315 | 0.99964493
Float16OperationsBenchmark.ReductionMulFP16 | 256 | 0.99729809 | 1.19006546
Float16OperationsBenchmark.ReductionMulFP16 | 512 | 1.03897347 | 1.0689105
Float16OperationsBenchmark.ReductionMulFP16 | 1024 | 1.01822982 | 1.01509971
Float16OperationsBenchmark.ReductionMulFP16 | 2048 | 1.0086255 | 1.0032434

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2614674991 From rcastanedalo at openjdk.org Fri Dec 12 15:48:24 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 12 Dec 2025 15:48:24 GMT Subject: RFR: 8373577: C2: Cleanup adr_type of CallLeafPureNode In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 11:09:36 GMT, Quan Anh Mai wrote: > Hi, > > This PR is extracted from #28570 , `CallLeafPureNode`s do not read from or write to memory, so their `adr_type` should be `nullptr`. > > Please take a look and leave your reviews, thanks a lot. Thanks for doing this! I reported [JDK-8367667](https://bugs.openjdk.org/browse/JDK-8367667) some time ago for adding verification code that checks address type consistency, so that we avoid introducing inconsistencies like this one. Feel free to grab it if you are interested.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28786#issuecomment-3647093880 From qamai at openjdk.org Fri Dec 12 15:59:16 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 12 Dec 2025 15:59:16 GMT Subject: RFR: 8373577: C2: Cleanup adr_type of CallLeafPureNode In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 15:46:08 GMT, Roberto Casta?eda Lozano wrote: >> Hi, >> >> This PR is extracted from #28570 , `CallLeafPureNode`s do not read from or write to memory, so their `adr_type` should be `nullptr`. >> >> Please take a look and leave your reviews, thanks a lot. > > Thanks for doing this! I reported [JDK-8367667](https://bugs.openjdk.org/browse/JDK-8367667) some time ago for adding verification code that checks address type consistency, so that we avoid introducing inconsistencies like this one. Feel free to grab it if you are interested. @robcasloz Thanks for taking a look at this PR, you may want to check #28570 where I am trying to disambiguate `Node::adr_type` and insert verification for the memory graph. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28786#issuecomment-3647146124 From roland at openjdk.org Fri Dec 12 16:01:17 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 12 Dec 2025 16:01:17 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis [v7] In-Reply-To: References: Message-ID: > The crash occurs because verification code expects the inner and outer > loop of a loop strip mining nest to have the same number of phis but, > in this case, the inner loop has one more memory phis than the outer > loop. > > 1) After `OuterStripMinedLoopNode::adjust_strip_mined_loop`, inner and > outer loops have the same number of phis, as expected. 
> > > 309 MergeMem === _ 1 306 1 1 284 [[ 429 ]] { - - N284:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=205 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 ]] > 429 Phi === 248 309 205 [[ 93 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 249 CountedLoop === 249 248 197 [[ 249 119 96 93 94 ]] inner stride: 1 strip mined !orig=[223],[91] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 93 Phi === 249 429 205 [[ 117 97 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 94 Phi === 249 430 121 [[ 97 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > > 2) Then `PhiNode::Ideal` runs for 429 and pushed the `MergeMem` 309 > through the outer loop phi: > > > 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 444 446 ]] > 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 444 Phi === 248 306 121 [[ 445 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 446 Phi === 248 284 170 [[ 445 ]] #memory Memory: @instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow], name=detailMessage, idx=5; !orig=444,429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 445 MergeMem === _ 1 444 1 1 446 [[ 93 ]] { - - N446:instptr:java/lang/Throwable 
(java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=[429],93 !jvms: TestMismatchedMemoryPhis::mainTe... Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: IR test case ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28677/files - new: https://git.openjdk.org/jdk/pull/28677/files/386b63d3..17b36426 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28677&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28677&range=05-06 Stats: 71 lines in 1 file changed: 71 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28677/head:pull/28677 PR: https://git.openjdk.org/jdk/pull/28677 From roland at openjdk.org Fri Dec 12 16:05:39 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 12 Dec 2025 16:05:39 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis [v4] In-Reply-To: <8ZNxxtdCUYYVmqq8Lam8C9sI7xh9qqHU_5j16ef6S0Q=.d793a57b-bc4c-4ed5-acba-ebf1bd372d4b@github.com> References: <8ZNxxtdCUYYVmqq8Lam8C9sI7xh9qqHU_5j16ef6S0Q=.d793a57b-bc4c-4ed5-acba-ebf1bd372d4b@github.com> Message-ID: On Fri, 12 Dec 2025 13:11:57 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with three additional commits since the last revision: >> >> - Update src/hotspot/share/opto/node.cpp >> >> Co-authored-by: Daniel Lund?n >> - Update test/hotspot/jtreg/compiler/loopstripmining/TestMismatchedMemoryPhis.java >> >> Co-authored-by: Daniel Lund?n >> - Update src/hotspot/share/opto/cfgnode.cpp >> >> Co-authored-by: Daniel Lund?n > > src/hotspot/share/opto/cfgnode.cpp line 2701: > >> 2699: } >> 2700: } >> 2701: } > > Another drive-by question: > You are refactoring / fixing existing optimizations: > Are there IR tests that cover the original optimization? 
How do we avoid that we lose optimizations here? None was integrated with the initial change. I added one. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28677#discussion_r2614768042 From roland at openjdk.org Fri Dec 12 16:05:41 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 12 Dec 2025 16:05:41 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis [v4] In-Reply-To: References: <8ZNxxtdCUYYVmqq8Lam8C9sI7xh9qqHU_5j16ef6S0Q=.d793a57b-bc4c-4ed5-acba-ebf1bd372d4b@github.com> Message-ID: On Fri, 12 Dec 2025 14:56:30 GMT, Emanuel Peter wrote: >> Failure doesn't reproduce without it. > > So `iArrFld[]` must be null for this to reproduce? I tried to reproduce it without it but couldn't. I'm not sure why the exception handling code is needed but it seems it is. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28677#discussion_r2614764706 From qamai at openjdk.org Fri Dec 12 16:06:22 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 12 Dec 2025 16:06:22 GMT Subject: RFR: 8373591: C2: Fix the memory around some intrinsics nodes [v2] In-Reply-To: References: Message-ID: > Hi, > > This is extracted from #28570 , there are 2 issues here: > > - Some intrinsics nodes advertise incorrect `adr_type`. For example, `AryEqNode` reports `adr_type` being `TypeAryPtr::BYTES` (it inherits this from `StrIntrinsicNode`). This is incorrect, however, as it can accept `char[]` inputs, too. Another case is `VectorizedHashCodeNode`, which reports its `adr_type` being `TypePtr::BOTTOM`, but it actually extracts a memory slice and does not consume the whole memory. > - For nodes such as `StrInflatedCopyNode`, as they consume more than they produce, during scheduling, we need to compute anti-dependencies. This is not the case, so we should fix it by making the nodes kill all the memory they consume. 
This issue is often not present because these intrinsics are not exposed bare to general usage. > > Please kindly review, thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Fix Shenandoah ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28789/files - new: https://git.openjdk.org/jdk/pull/28789/files/e9789170..1e026354 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28789&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28789&range=00-01 Stats: 10 lines in 1 file changed: 8 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28789.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28789/head:pull/28789 PR: https://git.openjdk.org/jdk/pull/28789 From galder at openjdk.org Fri Dec 12 16:47:52 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Fri, 12 Dec 2025 16:47:52 GMT Subject: RFR: 8373396: Min and Max Ideal missing AddNode::Ideal optimisations In-Reply-To: References: Message-ID: <0CMbZ6hX2pzB7wrwyWkBaPNmE7M12L8RCKre7ejRh-c=.762930c3-0f9b-41ba-b5ee-53e56ce7bfe6@github.com> On Fri, 12 Dec 2025 11:24:09 GMT, Emanuel Peter wrote: >> `MaxI` and `MinI` are missing `AddNode::Ideal` optimizations. These optimizations include commutation, flattening, pushing constants...etc. The PR changes `MaxINode::Ideal` and `MinINode::Ideal` to call `AddNode::Ideal`. Long versions already call `AddNode::Ideal` so nothing to change there. >> >> The PR also includes a template framework generated test (cc @eme64) that verifies that all of `AddNode::Ideal` optimizations now apply correctly for min/max for longs and ints. Long tests have been added to validate that both ints and longs produce the same results. >> >> Fixing this issue indirectly fixes `compiler/codegen/TestBooleanVect.java` when run with `-XX:VerifyIterativeGVN=1110`, which was failing due to `min` not having one of those optimisations. 
However, this PR does not make changes to `PhaseIterGVN::verify_Identity_for` because there are additional failures observed with min/max for integers in JDK-8373134. Therefore, changes there will be in the PR for JDK-8373134 instead. >> >> `PhaseIterGVN::verify_Ideal_for` contains the exclusion shown below. It looks like it could be removed in this PR, as these cases were quite likely disabled due to the issue here. However, it's unclear what test was failing here (@eme64 ?): >> >> >> // MinINode::Ideal >> // Did not investigate, but there are some patterns that might >> // need more notification. >> case Op_MinI: >> case Op_MaxI: // preemptively removed it as well. >> return false; >> >> >> I've run tier1-3 tests on linux/x64 and they passed. > > test/hotspot/jtreg/compiler/c2/irTests/TestMinMaxIdeal.java line 30: > >> 28: * @modules java.base/jdk.internal.misc >> 29: * @library /test/lib / >> 30: * @run driver compiler.c2.irTests.TestMinMaxIdeal > > Suggestion: > > * @run driver ${test.main.class} > > Also: please don't put any new tests in `irTests`. Rather put it in a directory based on the topic. I did think about that, but then I saw `TestMinMaxIdentities` was in the package, so I thought of adding it next to it. What about putting it in `compiler.intrinsics.math` instead? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28770#discussion_r2614909695 From galder at openjdk.org Fri Dec 12 16:54:51 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Fri, 12 Dec 2025 16:54:51 GMT Subject: RFR: 8373396: Min and Max Ideal missing AddNode::Ideal optimisations In-Reply-To: References: Message-ID: On Thu, 11 Dec 2025 16:02:24 GMT, Galder Zamarreño wrote: > `MaxI` and `MinI` are missing `AddNode::Ideal` optimizations. These optimizations include commutation, flattening, pushing constants...etc. The PR changes `MaxINode::Ideal` and `MinINode::Ideal` to call `AddNode::Ideal`.
Long versions already call `AddNode::Ideal` so nothing to change there. > > The PR also includes a template framework generated test (cc @eme64) that verifies that all of the `AddNode::Ideal` optimizations now apply correctly for min/max for longs and ints. Long tests have been added to validate that both ints and longs produce the same results. > > Fixing this issue indirectly fixes `compiler/codegen/TestBooleanVect.java` when run with `-XX:VerifyIterativeGVN=1110`, which was failing due to `min` not having one of those optimisations. However, this PR does not make changes to `PhaseIterGVN::verify_Identity_for` because there are additional failures observed with min/max for integers in JDK-8373134. Therefore, changes there will be in the PR for JDK-8373134 instead. > > `PhaseIterGVN::verify_Ideal_for` contains the exclusion shown below. It looks like it could be removed in this PR, as these cases were quite likely disabled due to the issue here. However, it's unclear what test was failing here (@eme64 ?): > > > // MinINode::Ideal > // Did not investigate, but there are some patterns that might > // need more notification. > case Op_MinI: > case Op_MaxI: // preemptively removed it as well. > return false; > > > I've run tier1-3 tests on linux/x64 and they passed. @TobiHartmann Thanks for the review! @eme64 Thanks also for the review. Can you please also clarify what I said about potentially changing `PhaseIterGVN::verify_Ideal_for` in the description? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28770#issuecomment-3647351269 From galder at openjdk.org Fri Dec 12 16:54:54 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Fri, 12 Dec 2025 16:54:54 GMT Subject: RFR: 8373396: Min and Max Ideal missing AddNode::Ideal optimisations In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 11:25:17 GMT, Emanuel Peter wrote: >> `MaxI` and `MinI` are missing `AddNode::Ideal` optimizations.
These optimizations include commutation, flattening, pushing constants...etc. The PR changes `MaxINode::Ideal` and `MinINode::Ideal` to call `AddNode::Ideal`. Long versions already call `AddNode::Ideal` so nothing to change there. >> >> The PR also includes a template framework generated test (cc @eme64) that verifies that all of the `AddNode::Ideal` optimizations now apply correctly for min/max for longs and ints. Long tests have been added to validate that both ints and longs produce the same results. >> >> Fixing this issue indirectly fixes `compiler/codegen/TestBooleanVect.java` when run with `-XX:VerifyIterativeGVN=1110`, which was failing due to `min` not having one of those optimisations. However, this PR does not make changes to `PhaseIterGVN::verify_Identity_for` because there are additional failures observed with min/max for integers in JDK-8373134. Therefore, changes there will be in the PR for JDK-8373134 instead. >> >> `PhaseIterGVN::verify_Ideal_for` contains the exclusion shown below. It looks like it could be removed in this PR, as these cases were quite likely disabled due to the issue here. However, it's unclear what test was failing here (@eme64 ?): >> >> >> // MinINode::Ideal >> // Did not investigate, but there are some patterns that might >> // need more notification. >> case Op_MinI: >> case Op_MaxI: // preemptively removed it as well. >> return false; >> >> >> I've run tier1-3 tests on linux/x64 and they passed. > > test/hotspot/jtreg/compiler/c2/irTests/TestMinMaxIdeal.java line 56: > >> 54: String templatedPackage ="compiler.c2.templated"; >> 55: String templatedClassName ="MinMaxIdeal"; >> 56: String templatedFQN = "%s.%s".formatted(templatedPackage, templatedClassName); > > That looks a bit convoluted. Why not just use the final string? Not sure exactly what you mean. For `addJavaSourceCode` I need the combined FQN and for `TestFrameworkClass.render` I need the classname and package separated.
> test/hotspot/jtreg/compiler/c2/irTests/TestMinMaxIdeal.java line 74: > >> 72: testTemplateTokens.add(new TestGenerator(Op.MAX_I).generate()); >> 73: testTemplateTokens.add(new TestGenerator(Op.MIN_L).generate()); >> 74: testTemplateTokens.add(new TestGenerator(Op.MAX_L).generate()); > > Why not use `Op.values()` -> `List`, then iterate over that? Yeah, better indeed. It avoids issues adding an enum value and forgetting to add it (I already did that lol) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28770#discussion_r2614919571 PR Review Comment: https://git.openjdk.org/jdk/pull/28770#discussion_r2614923801 From epeter at openjdk.org Fri Dec 12 17:03:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 12 Dec 2025 17:03:52 GMT Subject: RFR: 8370922: Template Framework Library: Float16 type and operations In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 14:54:23 GMT, Galder Zamarreño wrote: >> We should test `Float16` with Template Framework Tests. For this, I'm now implementing: >> >> - Template Framework Library: add `Float16Type` that represents `Float16`. Extend `Operations.java` with `Float16` operations. >> - `Verify.java`: add verification for `Float16`, and corresponding tests in `TestVerifyIncubatorVector.java`. We could have done this separately, but it is not much code and completes the pipeline from code generation through execution and finally result verification in the following two tests. >> - Adding `Float16` to `ExpressionFuzzer.java` and `TestExpressions.java`. > > test/hotspot/jtreg/testlibrary_tests/verify/tests/TestVerifyIncubatorVector.java line 44: > >> 42: import compiler.lib.verify.*; >> 43: >> 44: public class TestVerifyIncubatorVector { > > I have doubts about leaving the "Incubator" name in the test class name as it's temporary. Are you going to refactor the class name when API is not incubator any more? Maybe `TestVerifyVectorAPI` instead? Yes, I might refactor it away actually.
We could also remove the special handling in verify for foreign classes. Currently I have to do the hack with reflection. I'd like to get rid of that. This test is here for the cases where we need to include foreign, and because we need `@modules jdk.incubator.vector`. There is the other test that already exists and does not need this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28095#discussion_r2614960491 From epeter at openjdk.org Fri Dec 12 17:13:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 12 Dec 2025 17:13:01 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis [v4] In-Reply-To: References: <8ZNxxtdCUYYVmqq8Lam8C9sI7xh9qqHU_5j16ef6S0Q=.d793a57b-bc4c-4ed5-acba-ebf1bd372d4b@github.com> Message-ID: On Fri, 12 Dec 2025 16:02:44 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/cfgnode.cpp line 2701: >> >>> 2699: } >>> 2700: } >>> 2701: } >> >> Another drive-by question: >> You are refactoring / fixing existing optimizations: >> Are there IR tests that cover the original optimization? How do we avoid that we lose optimizations here? > > None was integrated with the initial change. I added one. And how confident are you that this one test ensures there won't be a regression? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28677#discussion_r2614987642 From epeter at openjdk.org Fri Dec 12 17:13:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 12 Dec 2025 17:13:02 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis [v4] In-Reply-To: References: <8ZNxxtdCUYYVmqq8Lam8C9sI7xh9qqHU_5j16ef6S0Q=.d793a57b-bc4c-4ed5-acba-ebf1bd372d4b@github.com> Message-ID: On Fri, 12 Dec 2025 16:01:59 GMT, Roland Westrelin wrote: >> So `iArrFld[]` must be null for this to reproduce? > > I tried to reproduce it without it but couldn't. 
I'm not sure why the exception handling code is needed but it seems it is. But the `main` method is not even compiled, right? Do we ever deopt, and then recompile maybe? I suppose don't worry about it too much. I just don't like random catch statements, because they could hide bugs down the line. But it is a very slim chance that this would happen anyway. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28677#discussion_r2614985150 From epeter at openjdk.org Fri Dec 12 17:16:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 12 Dec 2025 17:16:51 GMT Subject: RFR: 8373396: Min and Max Ideal missing AddNode::Ideal optimisations In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 16:51:25 GMT, Galder Zamarreño wrote: > @TobiHartmann Thanks for the review! > > @eme64 Thanks also for the review. Can you please also clarify what I said about potentially changing `PhaseIterGVN::verify_Ideal_for` in the description? I don't remember. Try enabling the verification, and see if you find any test that fails. If not: great, maybe you fixed it! If it still fails, it would be nice if you added more info, but not necessary. I don't remember because there were eventually too many cases and I stopped reporting which had failed.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28770#issuecomment-3647428278 From epeter at openjdk.org Fri Dec 12 17:16:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 12 Dec 2025 17:16:54 GMT Subject: RFR: 8373396: Min and Max Ideal missing AddNode::Ideal optimisations In-Reply-To: <0CMbZ6hX2pzB7wrwyWkBaPNmE7M12L8RCKre7ejRh-c=.762930c3-0f9b-41ba-b5ee-53e56ce7bfe6@github.com> References: <0CMbZ6hX2pzB7wrwyWkBaPNmE7M12L8RCKre7ejRh-c=.762930c3-0f9b-41ba-b5ee-53e56ce7bfe6@github.com> Message-ID: On Fri, 12 Dec 2025 16:45:31 GMT, Galder Zamarreño wrote: >> test/hotspot/jtreg/compiler/c2/irTests/TestMinMaxIdeal.java line 30: >> >>> 28: * @modules java.base/jdk.internal.misc >>> 29: * @library /test/lib / >>> 30: * @run driver compiler.c2.irTests.TestMinMaxIdeal >> >> Suggestion: >> >> * @run driver ${test.main.class} >> >> Also: please don't put any new tests in `irTests`. Rather put it in a directory based on the topic. > > I did think about that but then I saw `TestMinMaxIdentities` was on the package so thought of adding it next to it. What about putting it in `compiler.intrinsics.math` instead? I think it belongs more under idealization. So either a `gvn` or `igvn` directory. >> test/hotspot/jtreg/compiler/c2/irTests/TestMinMaxIdeal.java line 56: >> >>> 54: String templatedPackage ="compiler.c2.templated"; >>> 55: String templatedClassName ="MinMaxIdeal"; >>> 56: String templatedFQN = "%s.%s".formatted(templatedPackage, templatedClassName); >> >> That looks a bit convoluted. Why not just use the final string? > > Not sure exactly what you mean. For `addJavaSourceCode` I need the combined FQN and for `TestFrameworkClass.render` I need the classname and package separated. It is just a nit, I leave it to you.
I'm also fine taking it as is :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28770#discussion_r2614991145 PR Review Comment: https://git.openjdk.org/jdk/pull/28770#discussion_r2614992970 From dfenacci at openjdk.org Fri Dec 12 17:48:29 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 12 Dec 2025 17:48:29 GMT Subject: RFR: 8370315: [IR-Framework] Allow scenarios to be run in parallel [v5] In-Reply-To: References: Message-ID: > ## Issue > Today, the only practical ways to run IR Framework scenarios in parallel seem to be: > * spawning threads manually in a single test, or > * letting jtreg treat each scenario as a separate test (the only way to potentially distribute across hosts). > > This makes it a bit cumbersome to use host CPU cores efficiently when running multiple scenarios within the same test. > > ## Change > This change introduces a method `TestFramework::startParallel` to execute multiple scenarios concurrently. The implementation: > * launches one task per scenario and runs them in parallel (by default, the maximum concurrency should match the host's available cores) > * captures each task's `System.out` into a dedicated buffer and flushes it when the task completes to avoid interleaved output between scenarios (Note: only call paths within the `compiler.lib.ir_framework` package are modified to per-task output streams. `ProcessTools` methods still write directly to `stdout`, so their output may interleave). > * adds an option `-DForceSequentialScenarios=true` to force all scenarios to be run sequentially.
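The buffering idea in the list above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the PR: `BufferedParallelSketch`, `Task`, and `runAll` are hypothetical names, each task is handed its own `PrintStream` instead of `System.out` being redirected, and the buffers are collected in submission order once all tasks are done (the PR text says output is flushed when each task completes).

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BufferedParallelSketch {
    /** Hypothetical stand-in for one scenario; it writes only to the stream it is given. */
    public interface Task {
        void run(PrintStream out);
    }

    /** Runs all tasks concurrently and returns each task's output as one string, in submission order. */
    public static List<String> runAll(List<Task> tasks) throws Exception {
        // Assumption: cap concurrency at the number of available cores.
        int threads = Math.max(1, Runtime.getRuntime().availableProcessors());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (Task task : tasks) {
                futures.add(pool.submit(() -> {
                    // One private buffer per task, so output from different tasks cannot interleave.
                    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                    try (PrintStream out = new PrintStream(buffer, true, StandardCharsets.UTF_8)) {
                        task.run(out);
                    }
                    return buffer.toString(StandardCharsets.UTF_8);
                }));
            }
            List<String> outputs = new ArrayList<>();
            for (Future<String> future : futures) {
                outputs.add(future.get()); // waits for the task and rethrows its failure, if any
            }
            return outputs;
        } finally {
            pool.shutdown();
        }
    }
}
```

A caller would print the returned strings one after another, which keeps each scenario's output contiguous even though the scenarios ran at the same time.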
> > ## Testing > * Tier 1-3+ > * explicit `ir_framework.tests` runs > * added IR-Framework test `TestDForceSequentialScenarios.java` to test forcing sequential testing (checking the output order) and added a parallel run to `TestScenarios.java` (as well as adding `ForceSequentialScenarios` flag to `TestDFlags.java`) > > As reference: a comparison of the execution time between sequential and parallel of all IR-Framework tests using scenarios on our machines (linux x64/aarch64, macosx x64/aarch64, windows x64 with different number of cores, so the results for a single test might not be relevant) gave me an average speedup of 1.9. Damon Fenacci has updated the pull request incrementally with five additional commits since the last revision: - Update test/hotspot/jtreg/compiler/lib/ir_framework/shared/TestFormat.java Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestScenarios.java Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28065/files - new: https://git.openjdk.org/jdk/pull/28065/files/7b643833..203c32d2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28065&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28065&range=03-04 Stats: 5 lines in 3 files changed: 1 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28065.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28065/head:pull/28065 PR: https://git.openjdk.org/jdk/pull/28065 From dfenacci at openjdk.org Fri Dec 12 17:48:31 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 12 Dec 2025 17:48:31 GMT Subject: RFR: 8370315: [IR-Framework] Allow
scenarios to be run in parallel [v4] In-Reply-To: References: Message-ID: On Thu, 11 Dec 2025 15:20:58 GMT, Christian Hagedorn wrote: >> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8370315: fix typo > > test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 775: > >> 773: if (!output.isEmpty()) { >> 774: System.out.println(output); >> 775: } > > We probably also need to do a similar trick as for the exceptions in order to have ordered stdouts for the scenarios? That would be a good idea. Adding it... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2615078572 From dfenacci at openjdk.org Fri Dec 12 17:52:38 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 12 Dec 2025 17:52:38 GMT Subject: RFR: 8370315: [IR-Framework] Allow scenarios to be run in parallel [v6] In-Reply-To: References: Message-ID: > ## Issue > Today, the only practical ways to run IR Framework scenarios in parallel seem to be: > * spawning threads manually in a single test, or > * letting jtreg treat each scenario as a separate test (the only way to potentially distribute across hosts). > > This makes it a bit cumbersome to use host CPU cores efficiently when running multiple scenarios within the same test. > > ## Change > This change introduces a method `TestFramework::startParallel` to execute multiple scenarios concurrently. The implementation: > * launches one task per scenario and runs them in parallel (by default, the maximum concurrency should match the host's available cores) > * captures each task's `System.out` into a dedicated buffer and flushes it when the task completes to avoid interleaved output between scenarios (Note: only call paths within the `compiler.lib.ir_framework` package are modified to per-task output streams. `ProcessTools` methods still write directly to `stdout`, so their output may interleave).
> * adds an option `-DForceSequentialScenarios=true` to force all scenarios to be run sequentially. > > ## Testing > * Tier 1-3+ > * explicit `ir_framework.tests` runs > * added IR-Framework test `TestDForceSequentialScenarios.java` to test forcing sequential testing (checking the output order) and added a parallel run to `TestScenarios.java` (as well as adding `ForceSequentialScenarios` flag to `TestDFlags.java`) > > As reference: a comparison of the execution time between sequential and parallel of all IR-Framework tests using scenarios on our machines (linux x64/aarch64, macosx x64/aarch64, windows x64 with different number of cores, so the results for a single test might not be relevant) gave me an average speedup of 1.9. Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8370315: syntax ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28065/files - new: https://git.openjdk.org/jdk/pull/28065/files/203c32d2..74b51fd6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28065&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28065&range=04-05 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/28065.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28065/head:pull/28065 PR: https://git.openjdk.org/jdk/pull/28065 From dfenacci at openjdk.org Fri Dec 12 17:52:39 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 12 Dec 2025 17:52:39 GMT Subject: RFR: 8370315: [IR-Framework] Allow scenarios to be run in parallel [v4] In-Reply-To: References: Message-ID: On Thu, 11 Dec 2025 15:31:27 GMT, Christian Hagedorn wrote: >> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8370315: fix typo > > test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestBadFormat.java line 128: > >> 126: private static void expectTestFormatException(Class clazz, Class...
helpers) { >> 127: // Single test >> 128: boolean exceptionCatched = false; > > Nit: catched -> caught Oops. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2615089137 From dfenacci at openjdk.org Fri Dec 12 17:59:13 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 12 Dec 2025 17:59:13 GMT Subject: RFR: 8370315: [IR-Framework] Allow scenarios to be run in parallel [v7] In-Reply-To: References: Message-ID: > ## Issue > Today, the only practical ways to run IR Framework scenarios in parallel seem to be: > * spawning threads manually in a single test, or > * letting jtreg treat each scenario as a separate test (the only way to potentially distribute across hosts). > > This makes it a bit cumbersome to use host CPU cores efficiently when running multiple scenarios within the same test. > > ## Change > This change introduces a method `TestFramework::startParallel` to execute multiple scenarios concurrently. The implementation: > * launches one task per scenario and runs them in parallel (by default, the maximum concurrency should match the host's available cores) > * captures each task's `System.out` into a dedicated buffer and flushes it when the task completes to avoid interleaved output between scenarios (Note: only call paths within the `compiler.lib.ir_framework` package are modified to per-task output streams. `ProcessTools` methods still write directly to `stdout`, so their output may interleave). > * adds an option `-DForceSequentialScenarios=true` to force all scenarios to be run sequentially.
> > ## Testing > * Tier 1-3+ > * explicit `ir_framework.tests` runs > * added IR-Framework test `TestDForceSequentialScenarios.java` to test forcing sequential testing (checking the output order) and added a parallel run to `TestScenarios.java` (as well as adding `ForceSequentialScenarios` flag to `TestDFlags.java`) > > As reference: a comparison of the execution time between sequential and parallel of all IR-Framework tests using scenarios on our machines (linux x64/aarch64, macosx x64/aarch64, windows x64 with different number of cores, so the results for a single test might not be relevant) gave me an average speedup of 1.9. Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8370315: Remove whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28065/files - new: https://git.openjdk.org/jdk/pull/28065/files/74b51fd6..f3e51d50 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28065&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28065&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28065.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28065/head:pull/28065 PR: https://git.openjdk.org/jdk/pull/28065 From bmaillard at openjdk.org Fri Dec 12 18:38:52 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Fri, 12 Dec 2025 18:38:52 GMT Subject: RFR: 8370922: Template Framework Library: Float16 type and operations In-Reply-To: References: Message-ID: On Fri, 31 Oct 2025 22:23:28 GMT, Emanuel Peter wrote: > We should test `Float16` with Template Framework Tests. For this, I'm now implementing: > > - Template Framework Library: add `Float16Type` that represents `Float16`. Extend `Operations.java` with `Float16` operations. > - `Verify.java`: add verification for `Float16`, and corresponding tests in `TestVerifyIncubatorVector.java`.
We could have done this separately, but it is not much code and completes the pipeline from code generation through execution and finally result verification in the following two tests. > - Adding `Float16` to `ExpressionFuzzer.java` and `TestExpressions.java`. Looks good to me, nice work! I only have one question. test/hotspot/jtreg/compiler/igvn/ExpressionFuzzer.java line 343: > 341: // Generate expressions with any scalar numeric types. > 342: for (CodeGenerationDataNameType type : SCALAR_NUMERIC_TYPES) { > 343: for (int i = 0; i < 2; i++) { What does this loop do? And why do we have only 2 iterations here, but 10 for `PRIMITIVE_TYPES`? ------------- Marked as reviewed by bmaillard (Committer). PR Review: https://git.openjdk.org/jdk/pull/28095#pullrequestreview-3572915843 PR Review Comment: https://git.openjdk.org/jdk/pull/28095#discussion_r2615052804 From bmaillard at openjdk.org Fri Dec 12 18:56:18 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Fri, 12 Dec 2025 18:56:18 GMT Subject: RFR: 8367627: C2: Missed Ideal() optimization opportunity with MemBar [v4] In-Reply-To: References: Message-ID: > This PR addresses a missed optimization in `PhaseIterGVN` for `MemBarAcquire` nodes caused by a missing notification during parsing. > > The missed optimization in question is the removal of the `in(MemBarNode::Precedent)` edge for > `MemBarAcquire` nodes when the `MemBar` is the only user of its input. This was initially > introduced to get rid of unused `Load` nodes that were only kept alive by such an edge. > > > > https://github.com/openjdk/jdk/blob/eeb7c3f2e8e645938d9db0cf61c1d98d751f2845/src/hotspot/share/opto/memnode.cpp#L4254-L4259 > > In our case, it happens that the `Load` node gets folded to a constant during the initial > `_gvn.transform` call in `GraphKit::make_load`. Because the value is converted before being > returned, we end up with two constant nodes: one `ConL` and one `ConI`.
The `ConL` only > has one usage, and this triggers the optimization during verification. > > > static int test0() { > var c = new MyClass(); > // the conversion ensures that the ConL node only has one use > // in the end, which triggers the optimization > return (int) c.l; > } > > > The optimization is not triggered earlier, when we apply `_gvn.transform` on the membar, > because it requires `can_reshape`, which is set to `false` when we call `apply_ideal` in > `PhaseGVN::transform`. > > For this reason, we should call `record_for_igvn(membar)` after the `MemBar` is created > and transformed in `GraphKit::insert_mem_bar` to make sure it gets an `Ideal` pass with > `can_reshape` later. > > > This issue was initially filed for Valhalla, because a condition in `GraphKit::make_load` > prevents it from occurring when boxing elimination is enabled. Boxing elimination is > disabled temporarily in Valhalla (see [JDK-8328675](https://bugs.openjdk.org/browse/JDK-8328675)), > which caused the issue to appear, but by using `-XX:-EliminateAutoBox` it became clear > that the issue was on mainline. > > ### Testing > - [x] [GitHub Actions](TODO) > - [x] tier1-3, plus some internal testing > > Thank you for reviewing!
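The notification problem described above can be pictured with a small toy model. This is a sketch under loud assumptions and not HotSpot code: `RecordForIgvnSketch`, `Node`, `ideal`, and `edgeSurvives` are hypothetical names, the "worklist" is a plain queue, and the only rule modelled is "drop the precedent edge when the barrier is the input's sole user and reshaping is allowed".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class RecordForIgvnSketch {
    /** Toy node: tracks its users and, for a barrier, a single "precedent" input. */
    static final class Node {
        final String name;
        final List<Node> users = new ArrayList<>();
        Node precedent; // toy counterpart of the precedent edge mentioned in the description

        Node(String name) {
            this.name = name;
        }
    }

    /** Toy Ideal(): the edge is dropped only if reshaping is allowed and the barrier is the sole user. */
    static boolean ideal(Node barrier, boolean canReshape) {
        Node input = barrier.precedent;
        if (canReshape && input != null && input.users.size() == 1) {
            input.users.remove(barrier);
            barrier.precedent = null;
            return true;
        }
        return false;
    }

    /** Returns true if the precedent edge is still present after the worklist-driven pass. */
    static boolean edgeSurvives(boolean recordForLaterPass) {
        Deque<Node> worklist = new ArrayDeque<>();
        Node constant = new Node("ConL");
        Node barrier = new Node("MemBarAcquire");
        barrier.precedent = constant;
        constant.users.add(barrier);

        // Parse-time transform: reshaping is not allowed, so the edge stays.
        ideal(barrier, false);
        if (recordForLaterPass) {
            worklist.add(barrier); // the step that was missing in the scenario described above
        }
        // Later pass: only nodes on the worklist are revisited, now with reshaping allowed.
        while (!worklist.isEmpty()) {
            ideal(worklist.poll(), true);
        }
        return barrier.precedent != null;
    }

    public static void main(String[] args) {
        System.out.println("edge survives without recording: " + edgeSurvives(false));
        System.out.println("edge survives with recording:    " + edgeSurvives(true));
    }
}
```

In this model the edge is only cleaned up when the barrier was put on the worklist, which mirrors why the missed optimization showed up only during verification.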
Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: Move to igvn directory and use test.main.class ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28448/files - new: https://git.openjdk.org/jdk/pull/28448/files/be428cb3..a32ee08c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28448&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28448&range=02-03 Stats: 184 lines in 2 files changed: 92 ins; 92 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28448.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28448/head:pull/28448 PR: https://git.openjdk.org/jdk/pull/28448 From bmaillard at openjdk.org Fri Dec 12 18:56:21 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Fri, 12 Dec 2025 18:56:21 GMT Subject: RFR: 8367627: C2: Missed Ideal() optimization opportunity with MemBar [v3] In-Reply-To: <4JnkUEx4O7We0dTeMUzQa3pFdRaSJoWHrXlhB3wFc0M=.07e9b81b-187d-4e72-b7e3-492c5cda0e0c@github.com> References: <4JnkUEx4O7We0dTeMUzQa3pFdRaSJoWHrXlhB3wFc0M=.07e9b81b-187d-4e72-b7e3-492c5cda0e0c@github.com> Message-ID: <8fekfXtLDXeb2ratn8pcaehKLjzRBd9wa0xkzN0KTu4=.b09b0a5a-e4e1-480d-97a7-532ae3fbb6cf@github.com> On Wed, 10 Dec 2025 11:44:31 GMT, Emanuel Peter wrote: >> Benoît Maillard has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains six additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8367627 >> - Add notification in Node::has_special_unique_user >> - Add run with -XX:+AlwaysIncrementalInline, and add intermediate run for -XX:-DoEscapeAnalysis >> - Record in GraphKit::insert_mem_bar_volatile for consistency >> - Improve test and fix >> - Add test > > test/hotspot/jtreg/compiler/c2/igvn/TestMissingOptMemBarRemovePrecedentEdge.java line 2: > >> (failed to retrieve contents of file, check the PR for context) > Should the test go into an `igvn` directory? Or something else a bit more specific? Moved it to `compiler/c2/igvn` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28448#discussion_r2615250843 From bmaillard at openjdk.org Fri Dec 12 18:59:22 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Fri, 12 Dec 2025 18:59:22 GMT Subject: RFR: 8372302: C2: IGVN verification fails because ModXNode::Ideal creates unused intermediate nodes [v2] In-Reply-To: References: Message-ID: > This PR addresses a failure in IGVN verification with `ModI` and `ModL` nodes. > > In `ModXNode::Ideal`, we have code to optimize a modulo expression by expressing it in terms of other operations. There are actually two distinct cases, one where the divisor is a constant and is equal to `2^k-1` for some integer `k`, and a more general case where other transformations do not succeed. Because these transformations involve creating several new nodes (sometimes in a loop) and calling `phase->transform(...)` on them, we want to avoid accidentally triggering optimizations on the "unfinished" state of the subgraph. For this, we create a temporary dummy node and add edges to the nodes being constructed.
> > There are some execution paths where the node is not destroyed before `Ideal` returns, and this creates issues during IGVN verification, as the verification code checks if the number of nodes has changed after having called `Ideal` on a given node and does not expect changes. > > The path in question is when we exit because the divisor is a constant and is the minimum value: > https://github.com/openjdk/jdk/blob/c19b12927d2ac901ec8ccaa2de5897ee4c47af56/src/hotspot/share/opto/divnode.cpp#L1146-L1147 > > The zero case does not cause problems (this seems to be because it would hide behind a `div0_check` anyway). > > The fix is simply to only create the temporary node when it is needed, thus avoiding returning without destroying it. > > ### Testing > - [x] [GitHub Actions](TODO) > - [x] tier1-4, plus some internal testing > > Thank you for reviewing! Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/c2/TestModIdealCreatesUselessNode.java Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28488/files - new: https://git.openjdk.org/jdk/pull/28488/files/3b515aac..7d57f7ae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28488&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28488&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28488.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28488/head:pull/28488 PR: https://git.openjdk.org/jdk/pull/28488 From bmaillard at openjdk.org Fri Dec 12 19:05:15 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Fri, 12 Dec 2025 19:05:15 GMT Subject: RFR: 8372302: C2: IGVN verification fails because ModXNode::Ideal creates unused intermediate nodes [v3] In-Reply-To: References: Message-ID: > This PR addresses a failure in IGVN verification with `ModI` and `ModL` nodes.
> > In `ModXNode::Ideal`, we have code to optimize a modulo expression by expressing it in terms of other operations. There are actually two distinct cases, one where the divisor is a constant equal to `2^k-1` for some integer `k`, and a more general case where other transformations do not succeed. Because these transformations involve creating several new nodes (sometimes in a loop) and calling `phase->transform(...)` on them, we want to avoid accidentally triggering optimizations on the "unfinished" state of the subgraph. For this, we create a temporary dummy node and add edges to the nodes being constructed. > > There are some execution paths where the node is not destroyed before `Ideal` returns, and this creates issues during IGVN verification, as the verification code checks whether the number of nodes has changed after calling `Ideal` on a given node and does not expect any change. > > The path in question is when we exit because the divisor is a constant and is the minimum value: > https://github.com/openjdk/jdk/blob/c19b12927d2ac901ec8ccaa2de5897ee4c47af56/src/hotspot/share/opto/divnode.cpp#L1146-L1147 > > The zero case does not cause problems (this seems to be because it would hide behind a `div0_check` anyway). > > The fix is simply to only create the temporary node when it is needed, thus avoiding returning without destroying it. > > ### Testing > - [x] [GitHub Actions](TODO) > - [x] tier1-4, plus some internal testing > > Thank you for reviewing!
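For readers unfamiliar with the `2^k-1` case mentioned in the description, the following is a minimal Java sketch of the arithmetic identity that makes such a divisor special: since `2^k` is congruent to 1 modulo `2^k-1`, the remainder can be obtained by repeatedly adding the `k`-bit chunks of the value. This only illustrates the arithmetic for non-negative inputs; it is not HotSpot's node-level code in `ModXNode::Ideal`, and the class and method names are made up for this example.

```java
// Illustration only: the identity behind the "divisor == 2^k - 1" case.
// This is not the HotSpot implementation; names here are hypothetical.
class ModMersenneSketch {

    // Computes x % (2^k - 1) for x >= 0 and 1 <= k <= 30 without dividing.
    static int modMersenne(int x, int k) {
        if (x < 0 || k < 1 || k > 30) {
            throw new IllegalArgumentException("sketch handles x >= 0, 1 <= k <= 30");
        }
        int m = (1 << k) - 1;
        // 2^k == 1 (mod m), so x == (low k bits) + (remaining high bits) (mod m).
        // Each step strictly decreases x, so the loop terminates.
        while (x > m) {
            x = (x & m) + (x >>> k);
        }
        // At this point 0 <= x <= m, and m itself is congruent to 0.
        return x == m ? 0 : x;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 6, 7, 8, 1000, 123456789, Integer.MAX_VALUE};
        for (int k = 1; k <= 30; k++) {
            int m = (1 << k) - 1;
            for (int x : samples) {
                if (modMersenne(x, k) != x % m) {
                    throw new AssertionError("mismatch for x=" + x + ", k=" + k);
                }
            }
        }
        System.out.println("all samples agree with the % operator");
    }
}
```

In the compiler the equivalent steps are built as a chain of new nodes, which is why the description talks about protecting the "unfinished" subgraph with a temporary node.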
Benoît Maillard has updated the pull request incrementally with two additional commits since the last revision: - Update package - Move to compiler/c2/igvn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28488/files - new: https://git.openjdk.org/jdk/pull/28488/files/7d57f7ae..13758bef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28488&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28488&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28488.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28488/head:pull/28488 PR: https://git.openjdk.org/jdk/pull/28488 From bmaillard at openjdk.org Fri Dec 12 19:05:17 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Fri, 12 Dec 2025 19:05:17 GMT Subject: RFR: 8372302: C2: IGVN verification fails because ModXNode::Ideal creates unused intermediate nodes [v3] In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 13:23:58 GMT, Emanuel Peter wrote: >> Benoît Maillard has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update package >> - Move to compiler/c2/igvn > > test/hotspot/jtreg/compiler/c2/TestModIdealCreatesUselessNode.java line 24: > >> 22: */ >> 23: >> 24: package compiler.c2; > > Could we find some more specific igvn directory?
Moved to `compiler.c2.igvn` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28488#discussion_r2615275282 From kvn at openjdk.org Fri Dec 12 19:46:54 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 12 Dec 2025 19:46:54 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v4] In-Reply-To: References: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> Message-ID: On Fri, 12 Dec 2025 02:30:05 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/phaseloadfolding.cpp line 94: >> >>> 92: // We can see that the object can be considered non-escape at NarrowMemProj, CallJava(null), and >>> 93: // Proj2, while it is considered escape at CallJava(o), Proj1, Phi. The loads x and z will be >>> 94: // from NarrowMemProj and Proj2, respectively, which means they can be considered loads from an >> >> So this optimization is based on JDK-8327963 changes which introduced NarrowMemProj. But I don't see you can for it in code. > > This is only for demonstration based on the current shape of the graph. Implementation-wise, we walk the graph until we meet an `InitializeNode`, at that point we call `InitializeNode::find_captured_store`, so you can say it is not important what kind of `Proj` an `InitializeNode` has. 
Okay ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28764#discussion_r2615381209 From kvn at openjdk.org Fri Dec 12 19:51:59 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 12 Dec 2025 19:51:59 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v5] In-Reply-To: References: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> Message-ID: On Fri, 12 Dec 2025 03:56:47 GMT, Quan Anh Mai wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> more detailed explanations > > The sufficient condition to decide that a freshly allocated object does not escape in a region bounded by the allocation and a call is that there is no action in that region that makes the object escape. This means that there is no node that escapes the object which has the call as a transitive use. > > As a result, my solution here is to find all nodes that escape the object, then mark all of its transitive uses as escape. I believe you want to do it in the opposite way, that is, to try to find the nodes that escape the freshly allocated object from a call. But that means that we need to traverse all the transitive inputs of the call, which seems unrealistic for something running in `IterGVN`. Am I understanding it correctly? @merykitty thank you for updating comment. Do you have any performance numbers for some well known benchmarks? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28764#issuecomment-3647917679 From vlivanov at openjdk.org Fri Dec 12 20:21:54 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 12 Dec 2025 20:21:54 GMT Subject: RFR: 8368977: Provide clear naming for AVX10 identifiers [v2] In-Reply-To: References: <6XYgqaHA0PPZzvnfysKOP5XGP7e_RMkVFt9PV2OT3Gk=.e5f33072-a91a-4e57-99f3-81cc4ae4d844@github.com> Message-ID: On Mon, 8 Dec 2025 21:47:16 GMT, Mohamed Issa wrote: >> This is a simple change that renames all AVX10 identifiers to explicitly show which sub-versions are in use. Right now, AVX10.2 is the only case to worry about. The JTREG tests listed below were used to verify correctness with the recommended JVM options mentioned in corresponding source files. Each test included runs through emulation with AVX10.2 enabled and disabled to exercise all possible paths. All modifications and tests used [OpenJDK v26-b24](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B24) as the baseline build. >> >> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java` >> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java` >> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java` >> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java` >> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java` >> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java` >> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java` >> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java` >> 9. `jtreg:test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java` >> 10. `jtreg:test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java` >> 11. `jtreg:test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java` >> 12. `jtreg:test/jdk/jdk/incubator/vector/Double64VectorTests.java` >> 13. 
`jtreg:test/jdk/jdk/incubator/vector/Double128VectorTests.java` >> 14. `jtreg:test/jdk/jdk/incubator/vector/Double256VectorTests.java` >> 15. `jtreg:test/jdk/jdk/incubator/vector/Double512VectorTests.java` >> 16. `jtreg:test/jdk/jdk/incubator/vector/DoubleMaxVectorTests.java` >> 17. `jtreg:test/jdk/jdk/incubator/vector/Float64VectorTests.java` >> 18. `jtreg:test/jdk/jdk/incubator/vector/Float128VectorTests.java` >> 19. `jtreg:test/jdk/jdk/incubator/vector/Float256VectorTests.java` >> 20. `jtreg:test/jdk/jdk/incubator/vector/Float512VectorTests.java` >> 21. `jtreg:test/jdk/jdk/incubator/vector/FloatMaxVectorTests.java` >> 22. `jtreg:test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` >> 23. `jtreg:test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java` > > Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: > > Remove changes that affect functionality Looks good. Thanks for the cleanup! ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28344#pullrequestreview-3573461306 From vlivanov at openjdk.org Fri Dec 12 21:15:13 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 12 Dec 2025 21:15:13 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v6] In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 21:27:22 GMT, Vladimir Ivanov wrote: >> Even though `instanceof` check (and reflective `Class.isInstance` call) narrows operand's type, sharpened type information is not explicitly materialized in the IR. >> >> There's a `SubTypeCheck` node present, but it is not a substitute for a `CheckCastPP` node with a proper type. 
>> >> The difference can be illustrated with the following simple cases: >> >> class A { void m() {} } >> class B extends A { void m() {} } >> >> void testInstanceOf(A obj) { >> if (obj instanceof B) { >> obj.m(); >> } >> } >> >> InstanceOf::testInstanceOf (12 bytes) >> @ 8 InstanceOf$A::m (0 bytes) failed to inline: virtual call >> >> vs >> >> void testInstanceOfCast(A obj) { >> if (obj instanceof B) { >> B b = (B)obj; >> b.m(); >> } >> } >> >> InstanceOf::testInstanceOfCast (17 bytes) >> @ 13 InstanceOf$B::m (1 bytes) inline (hot) >> >> >> Proposed fix annotates operands of subtype checks with proper type information which reflects the effects of subtype check. Not-yet-canonicalized IR shape poses some challenges, but I decided to match it early so information is available right away, rather than waiting for IGVN pass and delay inlining to post-parse phase. >> >> FTR it is not a complete fix. It works for trivial cases, but for more complex conditions the IR shape becomes too complex during parsing (as illustrated by some test cases). I experimented with annotating subtype checks after initial parsing pass is over, but the crucial simplification step happens as part of split-if transformation which happens when no more inlining is possible. So, the only possible benefit (without forcing split-if optimization earlier) is virtual-to-direct call strength reduction. I plan to explore it separately. >> >> Testing: hs-tier1 - hs-tier5 > > Vladimir Ivanov has updated the pull request incrementally with two additional commits since the last revision: > > - Improve the test > - Improve comments Thanks for the reviews, Quan, Roland, and Dean. 
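The two shapes from the description can be tried out with a self-contained variant like the one below. This is only a sketch: the wrapper class name and the return values are added here so the program can be run and checked, and they are not part of the PR. Both methods behave identically at the Java level; the difference discussed in this thread is only in what C2 knows about the static type of the receiver. The inlining lines quoted in the description look like output of the diagnostic `-XX:+PrintInlining` flag, but that is an assumption on my side.

```java
// Runnable variant of the example from the PR description.
// The wrapper class and the String results are hypothetical additions.
class InstanceOfSketch {
    static class A { String m() { return "A"; } }
    static class B extends A { @Override String m() { return "B"; } }

    // Receiver is still statically typed as A at the call site.
    static String testInstanceOf(A obj) {
        if (obj instanceof B) {
            return obj.m();
        }
        return "none";
    }

    // Explicit cast: the receiver is statically typed as B at the call site.
    static String testInstanceOfCast(A obj) {
        if (obj instanceof B) {
            B b = (B) obj;
            return b.m();
        }
        return "none";
    }

    public static void main(String[] args) {
        A[] inputs = { new A(), new B() };
        for (A in : inputs) {
            // Same observable result for both shapes.
            System.out.println(testInstanceOf(in) + " " + testInstanceOfCast(in));
        }
    }
}
```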
------------- PR Comment: https://git.openjdk.org/jdk/pull/28517#issuecomment-3648152972 From vlivanov at openjdk.org Fri Dec 12 21:15:14 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 12 Dec 2025 21:15:14 GMT Subject: Integrated: 8372634: C2: Materialize type information from instanceof checks In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 00:53:54 GMT, Vladimir Ivanov wrote: > Even though `instanceof` check (and reflective `Class.isInstance` call) narrows operand's type, sharpened type information is not explicitly materialized in the IR. > > There's a `SubTypeCheck` node present, but it is not a substitute for a `CheckCastPP` node with a proper type. > > The difference can be illustrated with the following simple cases: > > class A { void m() {} } > class B extends A { void m() {} } > > void testInstanceOf(A obj) { > if (obj instanceof B) { > obj.m(); > } > } > > InstanceOf::testInstanceOf (12 bytes) > @ 8 InstanceOf$A::m (0 bytes) failed to inline: virtual call > > vs > > void testInstanceOfCast(A obj) { > if (obj instanceof B) { > B b = (B)obj; > b.m(); > } > } > > InstanceOf::testInstanceOfCast (17 bytes) > @ 13 InstanceOf$B::m (1 bytes) inline (hot) > > > Proposed fix annotates operands of subtype checks with proper type information which reflects the effects of subtype check. Not-yet-canonicalized IR shape poses some challenges, but I decided to match it early so information is available right away, rather than waiting for IGVN pass and delay inlining to post-parse phase. > > FTR it is not a complete fix. It works for trivial cases, but for more complex conditions the IR shape becomes too complex during parsing (as illustrated by some test cases). I experimented with annotating subtype checks after initial parsing pass is over, but the crucial simplification step happens as part of split-if transformation which happens when no more inlining is possible. 
So, the only possible benefit (without forcing split-if optimization earlier) is virtual-to-direct call strength reduction. I plan to explore it separately. > > Testing: hs-tier1 - hs-tier5 This pull request has now been integrated. Changeset: f2e56e4c Author: Vladimir Ivanov URL: https://git.openjdk.org/jdk/commit/f2e56e4c18080616e8ef275a3d9c1da824efda26 Stats: 615 lines in 10 files changed: 581 ins; 6 del; 28 mod 8372634: C2: Materialize type information from instanceof checks Reviewed-by: dlong, qamai, roland ------------- PR: https://git.openjdk.org/jdk/pull/28517 From vlivanov at openjdk.org Fri Dec 12 22:21:53 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 12 Dec 2025 22:21:53 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v6] In-Reply-To: References: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> Message-ID: On Fri, 12 Dec 2025 05:13:11 GMT, Quan Anh Mai wrote: >> Hi, >> >> The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. >> >> This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. >> >> For the runtime cost, this phase runs very fast, around 5 - 7% the runtime of EA, and about 0.5% the total runtime of C2. >> >> Please take a look and leave your thoughts, thanks a lot. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into foldmem > - grammar, safe change > - more detailed explanations > - store values need normalizing > - Just use candidate_set directly > - Some runtime calls may receive a derived pointer but not the base > - Aggressively fold loads from objects that have not escaped Some more thoughts/ideas: So, an object can escape either through a store to memory or as an argument to a call. (Any other scenarios?) If we leave memory graph considerations aside, then traversing the control graph from a barrier (call/membar) up to the allocation should enumerate all calls and stores in that range. (All stores have control.) (Theoretically, a store control can end up higher in the control graph, but I don't think it happens in practice.) If a call/store has a data dependency on the allocation, then it's an escaping point. One case left is the following: if a store has a control in the region, it can be scheduled after the region unless the store dominates the barrier in the memory graph. But, conservatively, it can also be treated as an escape point interfering with the access being optimized. So, either doing CFG-only or CFG+memory traversal (plus, data inputs traversal on arguments) should detect whether there's an interfering escape point present or not. Do you see any flaws in my reasoning? Speaking of the associated costs, it doesn't look prohibitively expensive. The search is localized and doesn't involve traversal of the whole graph. Alternatively, results of previous analysis requests can be cached. The property changes monotonically: a previously non-escaping case can't turn into an escaping one later. If the cache is not invalidated, then the worst case is that an optimization opportunity is missed.
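To make the two escape routes named here concrete (a store to memory and an argument to a call), here is a minimal Java sketch at the source level. All names are made up for illustration. It only shows why a load may be folded while the object is still private and why that is no longer safe once the object has escaped; whether C2 actually folds the load in `loadBeforeEscape` depends on the analysis in the PR and on inlining, which this sketch does not verify.

```java
// Source-level illustration of escape points; all names are hypothetical.
class EscapeSketch {
    static final class Point {
        int x;
        Point(int x) { this.x = x; }
    }

    static Point sink; // storing into this field publishes the object

    // Stands in for a call the compiler cannot see through:
    // it may modify whatever object is reachable via 'sink'.
    static void opaqueCall() {
        if (sink != null) {
            sink.x++;
        }
    }

    static void consume(Point p) {
        p.x++;
    }

    // The object only escapes at return. At the load it is still private,
    // so opaqueCall() cannot have changed p.x and the load observes 42.
    static Point loadBeforeEscape(int[] out) {
        Point p = new Point(42);
        opaqueCall();
        out[0] = p.x;
        return p;
    }

    // Escape through a store to memory: the later call can reach p.
    static int escapeByStore() {
        Point p = new Point(42);
        sink = p;
        opaqueCall();
        return p.x; // 43, folding to 42 would be wrong
    }

    // Escape through a call argument: the callee can modify p.
    static int escapeByArgument() {
        Point p = new Point(42);
        consume(p);
        return p.x; // 43, folding to 42 would be wrong
    }

    public static void main(String[] args) {
        int[] out = new int[1];
        loadBeforeEscape(out);
        System.out.println(out[0] + " " + escapeByStore() + " " + escapeByArgument());
    }
}
```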
Speaking of the general approach, if analysis part turns out to be way too expensive for IGVN, I'd still prefer to have the analysis and transformation to be separated and IGVN used to conduct the actual IR changes. There's already some duplication and divergence between IGVN & `PhaseLoadFolding` implementation. Without proper care, it can easily get worse in the future. Another thing to consider: it's beneficial to perform such transformation earlier, as IGVN case illustrates. (For example, by the time EA kicks in, inlining is over.) Shared implementation is easier to maintain and reuse. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28764#issuecomment-3648339738 PR Comment: https://git.openjdk.org/jdk/pull/28764#issuecomment-3648340281 From duke at openjdk.org Fri Dec 12 23:01:11 2025 From: duke at openjdk.org (Saint Wesonga) Date: Fri, 12 Dec 2025 23:01:11 GMT Subject: RFR: 8373630: r18_tls should not be modified on Windows AArch64 Message-ID: On Windows, r18_tls is used to store the pointer to the current thread's TEB. Therefore, this register should never be modified (see details in [register_aarch64.hpp](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/cpu/aarch64/register_aarch64.hpp#L118-L128)). One scenario that results in the modification of r18_tls involves virtual threads on Windows. Frames are frozen by [Continuation::try_preempt](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/share/runtime/continuation.cpp#L131) on one carrier thread whose registers are saved. When the frame is thawed, execution can continue on a different carrier thread. When this happens, [rthread (x28) is fixed to point to the new carrier thread](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/share/runtime/continuationFreezeThaw.cpp#L2670). 
The continuation then results in [restore_live_registers](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp#L313) restoring all the saved registers (including the fixed rthread register). However, this also restores x18, which was the TEB pointer for the previous carrier thread, causing the new carrier thread to execute with the TLS of the previous carrier thread. This causes hangs and occasional crashes in the virtual threads jtreg tests on Windows AArch64 that are resolved by this fix. ------------- Commit messages: - Do not modify r18_tls when restoring registers on Windows AArch64 Changes: https://git.openjdk.org/jdk/pull/28808/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28808&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373630 Stats: 23 lines in 1 file changed: 23 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28808.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28808/head:pull/28808 PR: https://git.openjdk.org/jdk/pull/28808 From duke at openjdk.org Fri Dec 12 23:04:29 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Fri, 12 Dec 2025 23:04:29 GMT Subject: RFR: 8326205: Grouping frequently called C2 nmethods in CodeCache [v3] In-Reply-To: References: Message-ID: > ### Summary > This PR implements [JDK-8326205](https://bugs.openjdk.org/browse/JDK-8326205), introducing experimental support for grouping hot code within the CodeCache. > > ### Description > The feature works by periodically sampling the execution of C2-compiled methods to identify hot code, then relocating those methods into a dedicated `HotCodeHeap` section of the CodeCache. > > Sampling is performed by the `HotCodeSampler`, which runs on a new dedicated `HotCodeGrouper` thread. The thread wakes up every `HotCodeIntervalSeconds` (default 300s) and collects samples for a duration of `HotCodeSampleSeconds` (default 120s).
During each sampling period, it iterates over all Java threads, inspects their last Java frame, obtains the current program counter (PC), and maps it to the corresponding nmethod. This allows the sampler to maintain a profile of the most frequently executed methods. > > The `HotCodeGrouper` uses the sampling data to select methods for grouping. Methods are ranked by sample count to form the candidate set. The grouper then relocates these methods (along with their callees, which has been shown to improve performance on AArch64 due to better branch prediction) into the `HotCodeHeap` in descending order of hotness, continuing until the fraction of samples attributable to hot methods exceeds `HotCodeSampleRatio` (default 0.8). The process continues to ensure that the hot-method ratio remains above the threshold. > > The `HotCodeHeap` is a new code heap segment with a default size of 20% of the non-profiled heap, though this can be overridden. This size was chosen based on the principle that roughly 20% of methods contribute to 80% of the work. Only C2-compiled nmethods are eligible for relocation, and the relocation process leverages existing infrastructure from [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). > > Relocation occurs entirely on the grouper thread and runs concurrently with the application. To maintain correctness, the thread acquires the `CodeCache_lock` and `Compile_lock` during relocation but releases these locks between individual relocations to avoid blocking GC safepoints. Removal of nmethods from the `HotCodeHeap` is handled by the GC. > > ### Performance > Testing has shown up to a 20% latency reduction in an internal service with a large CodeCache (512 MB). Public benchmark results are forthcoming. > > ### Testing > * CodeCache tests have been updated to cover the new `HotCodeHeap`. > * Added dedicated tests for the `HotCodeGrouper` > ... 
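The selection step described above (rank by sample count, then take methods in descending order of hotness until the covered fraction of samples exceeds `HotCodeSampleRatio`) can be sketched in a few lines of Java. This is a simplified model under stated assumptions: method names are plain strings, callees and code sizes are ignored, and the class and method names are invented for this example. The real logic lives in HotSpot's C++ code and may differ in its details.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the hot-method selection; all names are hypothetical.
class HotCodeSelectionSketch {

    // Returns method names in descending sample count, stopping once the
    // selected methods account for more than 'ratio' of all samples.
    static List<String> selectHot(Map<String, Integer> samples, double ratio) {
        long total = 0;
        for (int count : samples.values()) {
            total += count;
        }
        List<String> hot = new ArrayList<>();
        if (total == 0) {
            return hot; // nothing was sampled, so nothing is hot
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(samples.entrySet());
        ranked.sort((l, r) -> Integer.compare(r.getValue(), l.getValue()));

        long covered = 0;
        for (Map.Entry<String, Integer> e : ranked) {
            if ((double) covered / total > ratio) {
                break; // threshold exceeded, remaining methods stay put
            }
            hot.add(e.getKey());
            covered += e.getValue();
        }
        return hot;
    }

    public static void main(String[] args) {
        Map<String, Integer> samples = new LinkedHashMap<>();
        samples.put("parse", 25);
        samples.put("hash", 60);
        samples.put("log", 5);
        samples.put("format", 10);
        // 0.8 mirrors the default HotCodeSampleRatio quoted in the description.
        System.out.println(selectHot(samples, 0.8));
    }
}
```

With the sample counts in `main`, `hash` covers 60% of the samples and `hash` plus `parse` cover 85%, so selection stops after those two.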
Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Add HotCodeGrouperMoveFunction test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27858/files - new: https://git.openjdk.org/jdk/pull/27858/files/ce1c685f..3697718f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27858&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27858&range=01-02 Stats: 91 lines in 1 file changed: 91 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/27858.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27858/head:pull/27858 PR: https://git.openjdk.org/jdk/pull/27858 From duke at openjdk.org Fri Dec 12 23:04:31 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Fri, 12 Dec 2025 23:04:31 GMT Subject: RFR: 8326205: Grouping frequently called C2 nmethods in CodeCache In-Reply-To: References: Message-ID: On Thu, 11 Dec 2025 15:08:19 GMT, Andrew Haley wrote: > Thanks. > > I need to stress test this code, especially by moving nmethods as much as possible while many threads are executing. Is one of the stress tests here suitable for that? I created `test/hotspot/jtreg/compiler/hotcodegrouper/StressHotCodeGrouper.java`. It generates many nmethods and calls them while the grouper continuously runs. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27858#issuecomment-3648439296 From dlong at openjdk.org Fri Dec 12 23:40:51 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 12 Dec 2025 23:40:51 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses In-Reply-To: References: Message-ID: On Thu, 11 Dec 2025 15:31:55 GMT, Roland Westrelin wrote: > The base input of `AddP` is expected to only be set for heap accesses > but I noticed some inconsistencies so I added an assert in the `AddP` > constructor and fixed issues that it caught. AFAICT, the > inconsistencies shouldn't create issues.
src/hotspot/share/opto/macro.cpp line 1211: > 1209: > 1210: Node* PhaseMacroExpand::make_store(Node* ctl, Node* mem, Node* base, int offset, Node* value, BasicType bt) { > 1211: Node* adr = basic_plus_adr(top(), base, offset); Doesn't this cause an assert if make_load or make_store is used with a heap oop? Isn't that a problem for code like PhaseMacroExpand::initialize_object() that calls make_store() with an object? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28769#discussion_r2615852918 From dlong at openjdk.org Fri Dec 12 23:47:50 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 12 Dec 2025 23:47:50 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses In-Reply-To: References: Message-ID: On Thu, 11 Dec 2025 15:31:55 GMT, Roland Westrelin wrote: > The base input of `AddP` is expected to only be set for heap accesses > but I noticed some inconsistencies so I added an assert in the `AddP` > constructor and fixed issues that it caught. AFAICT, the > inconsistencies shouldn't create issues. src/hotspot/share/opto/memnode.cpp line 4126: > 4124: Node* base = dest; > 4125: if (phase->type(dest)->isa_oopptr() == nullptr) { > 4126: base = phase->C->top(); How is this possible? Aren't all arrays in the heap? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28769#discussion_r2615860774 From vlivanov at openjdk.org Sat Dec 13 00:07:51 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 13 Dec 2025 00:07:51 GMT Subject: RFR: 8373577: C2: Cleanup adr_type of CallLeafPureNode In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 11:09:36 GMT, Quan Anh Mai wrote: > Hi, > > This PR is extracted from #28570 , `CallLeafPureNode`s do not read from or write to memory, so their `adr_type` should be `nullptr`. > > Please take a look and leave your reviews, thanks a lot. Looks good. ------------- Marked as reviewed by vlivanov (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/28786#pullrequestreview-3573949909 From dlong at openjdk.org Sat Dec 13 00:14:50 2025 From: dlong at openjdk.org (Dean Long) Date: Sat, 13 Dec 2025 00:14:50 GMT Subject: RFR: 8373630: r18_tls should not be modified on Windows AArch64 In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 22:54:45 GMT, Saint Wesonga wrote: > On Windows, r18_tls is used to store the pointer to the current thread's TEB. Therefore, this register should never be modified (see details in [register_aarch64.hpp](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/cpu/aarch64/register_aarch64.hpp#L118-L128)). One scenario that results in the modification of r18_tls involves virtual threads on Windows. Frames are frozen by [Continuation::try_preempt](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/share/runtime/continuation.cpp#L131) on one carrier thread whose registers are saved. When the frame is thawed, execution can continue on a different carrier thread. When this happens, [rthread (x28) is fixed to point to the new carrier thread](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/share/runtime/continuationFreezeThaw.cpp#L2670). The continuation then results in [restore_live_registers](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp#L313) restoring all the saved registers (including the fixed rthread register). However, this also restores x18, which was the TEB pointer for the previous carrier thread, causing the new carrier thread to execute with the TLS of the previous carrier thread. This causes hangs and occasional crashes in the virtual threads jtreg tests on Windows AArch64 that are resolved by this fix. Shouldn't the #ifdef be using R18_RESERVED?
------------- PR Comment: https://git.openjdk.org/jdk/pull/28808#issuecomment-3648570142 From dlong at openjdk.org Sat Dec 13 00:29:53 2025 From: dlong at openjdk.org (Dean Long) Date: Sat, 13 Dec 2025 00:29:53 GMT Subject: RFR: 8373630: r18_tls should not be modified on Windows AArch64 In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 22:54:45 GMT, Saint Wesonga wrote: > On Windows, r18_tls is used to store the pointer to the current thread's TEB. Therefore, this register should never be modified (see details in [register_aarch64.hpp](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/cpu/aarch64/register_aarch64.hpp#L118-L128)). One scenario that results in the modification of r18_tls involves virtual threads on Windows. Frames are frozen by [Continuation::try_preempt](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/share/runtime/continuation.cpp#L131) on one carrier thread whose registers are saved. When the frame is thawed, execution can continue on a different carrier thread. When this happens, [rthread (x28) is fixed to point to the new carrier thread](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/share/runtime/continuationFreezeThaw.cpp#L2670). The continuation then results in [restore_live_registers](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp#L313) restoring all the saved registers (including the fixed rthread register). However, this also restores x18, which was the TEB pointer for the previous carrier thread, causing the new carrier thread to execute with the TLS of the previous carrier thread. This causes hangs and occasional crashes in the virtual threads jtreg tests on Windows AArch64 that are resolved by this fix. Nice find. It would be really useful to have a test case that reproduces the problem, and also some idea of how likely it is.
I bumped the bug to P1 for now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28808#issuecomment-3648593901 From missa at openjdk.org Sat Dec 13 00:36:33 2025 From: missa at openjdk.org (Mohamed Issa) Date: Sat, 13 Dec 2025 00:36:33 GMT Subject: RFR: 8368977: Provide clear naming for AVX10 identifiers [v3] In-Reply-To: <6XYgqaHA0PPZzvnfysKOP5XGP7e_RMkVFt9PV2OT3Gk=.e5f33072-a91a-4e57-99f3-81cc4ae4d844@github.com> References: <6XYgqaHA0PPZzvnfysKOP5XGP7e_RMkVFt9PV2OT3Gk=.e5f33072-a91a-4e57-99f3-81cc4ae4d844@github.com> Message-ID: > This is a simple change that renames all AVX10 identifiers to explicitly show which sub-versions are in use. Right now, AVX10.2 is the only case to worry about. The JTREG tests listed below were used to verify correctness with the recommended JVM options mentioned in corresponding source files. Each test included runs through emulation with AVX10.2 enabled and disabled to exercise all possible paths. All modifications and tests used [OpenJDK v26-b24](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B24) as the baseline build. > > 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java` > 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java` > 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java` > 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java` > 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java` > 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java` > 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java` > 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java` > 9. `jtreg:test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java` > 10. `jtreg:test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java` > 11. `jtreg:test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java` > 12. `jtreg:test/jdk/jdk/incubator/vector/Double64VectorTests.java` > 13. 
`jtreg:test/jdk/jdk/incubator/vector/Double128VectorTests.java` > 14. `jtreg:test/jdk/jdk/incubator/vector/Double256VectorTests.java` > 15. `jtreg:test/jdk/jdk/incubator/vector/Double512VectorTests.java` > 16. `jtreg:test/jdk/jdk/incubator/vector/DoubleMaxVectorTests.java` > 17. `jtreg:test/jdk/jdk/incubator/vector/Float64VectorTests.java` > 18. `jtreg:test/jdk/jdk/incubator/vector/Float128VectorTests.java` > 19. `jtreg:test/jdk/jdk/incubator/vector/Float256VectorTests.java` > 20. `jtreg:test/jdk/jdk/incubator/vector/Float512VectorTests.java` > 21. `jtreg:test/jdk/jdk/incubator/vector/FloatMaxVectorTests.java` > 22. `jtreg:test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` > 23. `jtreg:test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java` Mohamed Issa has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'master' into user/missa-prime/nomenclature - Remove changes that affect functionality - Fix naming issue in vector floating point cast test file - Rename AVX10 identifiers to AVX10_2 and use AVX10.2 in template table conversion whenever available ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28344/files - new: https://git.openjdk.org/jdk/pull/28344/files/2a029dab..be7f5c80 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28344&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28344&range=01-02 Stats: 134206 lines in 2054 files changed: 88096 ins; 33081 del; 13029 mod Patch: https://git.openjdk.org/jdk/pull/28344.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28344/head:pull/28344 PR: https://git.openjdk.org/jdk/pull/28344 From vlivanov at openjdk.org Sat Dec 13 01:00:52 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 13 Dec 2025 01:00:52 GMT Subject: RFR: 
8372845: C2: Fold identity hash code if object is constant [v3] In-Reply-To: <0b81mH1_Y6r905N2HmehXBbSFdzLpJIfuXHNfijpHBs=.c870b13e-a52f-4c00-b771-91cf9205cb4a@github.com> References: <6ip4JrJ4WiYEe6d2FA_WQ5dDjxAk2RPaPbwth4jNeJM=.43d7879d-89a4-434c-80ea-371c92581686@github.com> <0b81mH1_Y6r905N2HmehXBbSFdzLpJIfuXHNfijpHBs=.c870b13e-a52f-4c00-b771-91cf9205cb4a@github.com> Message-ID: On Thu, 11 Dec 2025 07:19:24 GMT, Emanuel Peter wrote: >> It's a test on C2 IR. What's the point in running it w/o C2? > > You can always do more than just C2 IR verification. For example, we could also do result verification. That would give us coverage for C1 for example. I think it is just good practice not to have a restriction if it is not absolutely necessary. I don't argue that there's always a chance to catch a bug, but unit tests on C2 IR are mostly trivial, so the actual chance to spot a unique problem is quite low. And the price is execution time. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28589#discussion_r2615925589 From vlivanov at openjdk.org Sat Dec 13 01:23:56 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 13 Dec 2025 01:23:56 GMT Subject: RFR: 8372136: VectorAPI: Refactor subword gather load API java implementation In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 01:42:07 GMT, Xiaohong Gong wrote: > The current subword (`byte`/`short`) gather load API implementation is not well-suited for platforms that provide native vector instructions for these operations. As **discussed in PR [1]**, we'd like to re-implement these APIs with a **unified cross-platform** solution. > > The main idea is to re-implement the API at Java-level, by performing multiple sub-gather operations. Each sub-gather operation loads a portion of elements using a specific index vector by calling the HotSpot intrinsic API. The partial results are then merged using vector `slice` and `or` operations. 
This design simplifies the VM compiler intrinsic implementation and better aligns with the Vector API design principles. > > Key changes: > 1. Re-implement the subword gather load API at the Java level. The HotSpot intrinsic `VectorSupport.loadWithMap` is simplified by reducing the vector index parameters from four (vix1-vix4) to a single parameter. > 2. Adjust the compiler intrinsic implementation to support the new Java API, including updates to the x86 backend implementation. > > The performance impact varies across different scenarios on X86. I tested the performance with different AVX levels on an X86 machine that supports AVX512. To achieve optimal performance, I also **applied PR [2]**, which improves the performance of the **`slice()`** API on X86. The following are the summarized performance gains, where: > > - "non masked" means the gather operation is not the masked gather API. > - "masked" means the gather operation is the masked gather API. > - "1 gather cases" means the gather API is implemented with a single gather operation. E.g. Load `Short128Vector` with `MaxVectorSize=256`. > - "2 gather cases" means the gather API is implemented with 2 parts of gather operations. E.g. Load `Short256Vector` with `MaxVectorSize=256`. > - "4 gather cases" means the gather API is implemented with 4 parts of gather operations. E.g. Load `Byte256Vector` with `MaxVectorSize=256`. > - "Un-intrinsified" means the gather operation cannot be intrinsified by HotSpot. E.g. Load `Byte512Vector` with `MaxVectorSize=256`. The significant performance uplifts come from the Java-level changes, which remove the vector index generation and range checks for such cases. > > > ---------------------------------------------------------------------------- > | UseAVX=3 | UseAVX=2 | > |-----------------------------|-----------------------------| > | non maske... Good work, Xiaohong!
Can you, please, include samples of machine code generated before/after the patch (for AVX2 and AVX512)? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28520#issuecomment-3648697712 From jrose at openjdk.org Sat Dec 13 02:13:52 2025 From: jrose at openjdk.org (John R Rose) Date: Sat, 13 Dec 2025 02:13:52 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v3] In-Reply-To: References: Message-ID: <_j4OsghE4w1f2L8K6QZOEGozC9-HrFDq1Mhn871B2_Q=.a32b5068-6fac-491a-8861-0d17526dbada@github.com> On Tue, 2 Dec 2025 23:25:29 GMT, Chen Liang wrote: >> Folding identity hash as constant if the incoming argument is constant would be useful for quick map lookups, such as for the [Classifier proposal](https://openjdk.org/jeps/8357674). Currently, identity hash is not constant because it loads the object header/mark word. We can add an explicit bypass to load an existing hash eagerly instead. > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Typo src/hotspot/share/opto/library_call.cpp line 4786: > 4784: if (t != nullptr && t->const_oop() != nullptr) { > 4785: jint hash = t->const_oop()->identity_hash_or_no_hash(); > 4786: if (hash != 0) { Don't compare against zero, please, but against no_hash. (Any other stray zeroes?) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28589#discussion_r2615991597 From vlivanov at openjdk.org Sat Dec 13 02:32:23 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 13 Dec 2025 02:32:23 GMT Subject: RFR: 8373633: C2: Use interface receiver type to improve CHA decisions Message-ID: Strength-reducing an interface call to a virtual call for interfaces with unique implementors can use receiver type information to narrow the context. C2 tracks interface types and receiver type information can be used to reveal an interface with a unique implementor which can't be derived from the call site itself.
Since C2 effectively accumulates a union interface type from multiple subtype checks, iterating over individual components of a type may reveal a candidate for a strength-reduction. The only prerequisite is that a candidate has to be a subtype of the declared interface. Testing: hs-tier1 - hs-tier5 ------------- Commit messages: - Use receiver type to improve CHA decisions Changes: https://git.openjdk.org/jdk/pull/28811/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28811&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373633 Stats: 255 lines in 7 files changed: 157 ins; 51 del; 47 mod Patch: https://git.openjdk.org/jdk/pull/28811.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28811/head:pull/28811 PR: https://git.openjdk.org/jdk/pull/28811 From vlivanov at openjdk.org Sat Dec 13 02:35:58 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 13 Dec 2025 02:35:58 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant In-Reply-To: References: Message-ID: <03LS79mJXIs41UMm3gio_urrsHSZP7dPw81NEfcag88=.c6542fe1-f94a-4e80-8393-7cd235136e34@github.com> On Tue, 2 Dec 2025 23:08:44 GMT, Chen Liang wrote: >> Folding identity hash as constant if the incoming argument is constant would be useful for quick map lookups, such as for the [Classifier proposal](https://openjdk.org/jeps/8357674). Currently, identity hash is not constant because it loads the object header/mark word. We can add an explicit bypass to load an existing hash eagerly instead. > > I tried to come up with an example where the buggy code from Vladimir would inline to identityHashCode when the right call would be virtual - couldn't construct such a case unfortunately :( > > I think we can deal with IGVN later, as this involves creating new macro node and other infrastructure support. 
@liach please, incorporate latest version from https://github.com/openjdk/jdk/compare/master...iwanowww:jdk:c2.identity_hash ------------- PR Comment: https://git.openjdk.org/jdk/pull/28589#issuecomment-3648776804 From qamai at openjdk.org Sat Dec 13 02:53:57 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 13 Dec 2025 02:53:57 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v6] In-Reply-To: References: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> Message-ID: <9WPS9Bs_4Kd4haCl9Z7RZahShBlW1mPyjrQG6PgJoWs=.02fa3665-e1ae-4f8f-b09d-4a367eddd14d@github.com> On Fri, 12 Dec 2025 22:19:46 GMT, Vladimir Ivanov wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: >> >> - Merge branch 'master' into foldmem >> - grammar, safe change >> - more detailed explanations >> - store values need normalizing >> - Just use candidate_set directly >> - Some runtime calls may receive a derived pointer but not the base >> - Aggressively fold loads from objects that have not escaped > > Speaking of the general approach, if analysis part turns out to be way too > expensive for IGVN, I'd still prefer to have the analysis and transformation to be > separated and IGVN used to conduct the actual IR changes. > > There's already some duplication and divergence between IGVN & `PhaseLoadFolding` > implementation. Without proper care, it can easily get worse in the future. > > Another thing to consider: it's beneficial to perform such transformation > earlier, as IGVN case illustrates. (For example, by the time EA kicks in, > inlining is over.) Shared implementation is easier to maintain and reuse. 
@iwanowww Thanks for your analysis, I think it is possible to do the transformation during IGVN and have created another PR which follows that approach, could you take a look? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28764#issuecomment-3648796355 From qamai at openjdk.org Sat Dec 13 02:56:25 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 13 Dec 2025 02:56:25 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped Message-ID: Hi, This patch is an alternative to #28764 but it does the analysis during IGVN instead. Please take a look and leave your thoughts, thanks a lot. ------------- Commit messages: - Aggressively fold loads from objects that have not escaped Changes: https://git.openjdk.org/jdk/pull/28812/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373495 Stats: 453 lines in 4 files changed: 414 ins; 8 del; 31 mod Patch: https://git.openjdk.org/jdk/pull/28812.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28812/head:pull/28812 PR: https://git.openjdk.org/jdk/pull/28812 From duke at openjdk.org Sat Dec 13 03:11:56 2025 From: duke at openjdk.org (duke) Date: Sat, 13 Dec 2025 03:11:56 GMT Subject: RFR: 8368977: Provide clear naming for AVX10 identifiers [v3] In-Reply-To: References: <6XYgqaHA0PPZzvnfysKOP5XGP7e_RMkVFt9PV2OT3Gk=.e5f33072-a91a-4e57-99f3-81cc4ae4d844@github.com> Message-ID: <9JSV7fdlASzIbFgoR_-pTOkMRrrCNu5rooSZ1nTaGhs=.d8d5fa21-342e-4d57-bc33-330369943d9a@github.com> On Sat, 13 Dec 2025 00:36:33 GMT, Mohamed Issa wrote: >> This is a simple change that renames all AVX10 identifiers to explicitly show which sub-versions are in use. Right now, AVX10.2 is the only case to worry about. The JTREG tests listed below were used to verify correctness with the recommended JVM options mentioned in corresponding source files. 
Each test included runs through emulation with AVX10.2 enabled and disabled to exercise all possible paths. All modifications and tests used [OpenJDK v26-b24](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B24) as the baseline build. >> >> 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java` >> 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java` >> 3. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java` >> 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java` >> 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java` >> 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java` >> 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java` >> 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java` >> 9. `jtreg:test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java` >> 10. `jtreg:test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java` >> 11. `jtreg:test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java` >> 12. `jtreg:test/jdk/jdk/incubator/vector/Double64VectorTests.java` >> 13. `jtreg:test/jdk/jdk/incubator/vector/Double128VectorTests.java` >> 14. `jtreg:test/jdk/jdk/incubator/vector/Double256VectorTests.java` >> 15. `jtreg:test/jdk/jdk/incubator/vector/Double512VectorTests.java` >> 16. `jtreg:test/jdk/jdk/incubator/vector/DoubleMaxVectorTests.java` >> 17. `jtreg:test/jdk/jdk/incubator/vector/Float64VectorTests.java` >> 18. `jtreg:test/jdk/jdk/incubator/vector/Float128VectorTests.java` >> 19. `jtreg:test/jdk/jdk/incubator/vector/Float256VectorTests.java` >> 20. `jtreg:test/jdk/jdk/incubator/vector/Float512VectorTests.java` >> 21. `jtreg:test/jdk/jdk/incubator/vector/FloatMaxVectorTests.java` >> 22. `jtreg:test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` >> 23. 
`jtreg:test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java` > > Mohamed Issa has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into user/missa-prime/nomenclature > - Remove changes that affect functionality > - Fix naming issue in vector floating point cast test file > - Rename AVX10 identifiers to AVX10_2 and use AVX10.2 in template table conversion whenever available @missa-prime Your change (at version be7f5c80ebeddd809248f3477051e98eda2b81f0) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28344#issuecomment-3648826101 From missa at openjdk.org Sat Dec 13 03:20:00 2025 From: missa at openjdk.org (Mohamed Issa) Date: Sat, 13 Dec 2025 03:20:00 GMT Subject: Integrated: 8368977: Provide clear naming for AVX10 identifiers In-Reply-To: <6XYgqaHA0PPZzvnfysKOP5XGP7e_RMkVFt9PV2OT3Gk=.e5f33072-a91a-4e57-99f3-81cc4ae4d844@github.com> References: <6XYgqaHA0PPZzvnfysKOP5XGP7e_RMkVFt9PV2OT3Gk=.e5f33072-a91a-4e57-99f3-81cc4ae4d844@github.com> Message-ID: On Mon, 17 Nov 2025 03:46:50 GMT, Mohamed Issa wrote: > This is a simple change that renames all AVX10 identifiers to explicitly show which sub-versions are in use. Right now, AVX10.2 is the only case to worry about. The JTREG tests listed below were used to verify correctness with the recommended JVM options mentioned in corresponding source files. Each test included runs through emulation with AVX10.2 enabled and disabled to exercise all possible paths. All modifications and tests used [OpenJDK v26-b24](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B24) as the baseline build. > > 1. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java` > 2. `jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java` > 3. 
`jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java` > 4. `jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java` > 5. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java` > 6. `jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java` > 7. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java` > 8. `jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java` > 9. `jtreg:test/hotspot/jtreg/compiler/floatingpoint/ScalarFPtoIntCastTest.java` > 10. `jtreg:test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java` > 11. `jtreg:test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java` > 12. `jtreg:test/jdk/jdk/incubator/vector/Double64VectorTests.java` > 13. `jtreg:test/jdk/jdk/incubator/vector/Double128VectorTests.java` > 14. `jtreg:test/jdk/jdk/incubator/vector/Double256VectorTests.java` > 15. `jtreg:test/jdk/jdk/incubator/vector/Double512VectorTests.java` > 16. `jtreg:test/jdk/jdk/incubator/vector/DoubleMaxVectorTests.java` > 17. `jtreg:test/jdk/jdk/incubator/vector/Float64VectorTests.java` > 18. `jtreg:test/jdk/jdk/incubator/vector/Float128VectorTests.java` > 19. `jtreg:test/jdk/jdk/incubator/vector/Float256VectorTests.java` > 20. `jtreg:test/jdk/jdk/incubator/vector/Float512VectorTests.java` > 21. `jtreg:test/jdk/jdk/incubator/vector/FloatMaxVectorTests.java` > 22. `jtreg:test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` > 23. `jtreg:test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java` This pull request has now been integrated. 
Changeset: 4f1dcf89 Author: Mohamed Issa Committer: Vladimir Ivanov URL: https://git.openjdk.org/jdk/commit/4f1dcf89b841e9a37d342bdf8c66bbbab9edb0d4 Stats: 99 lines in 9 files changed: 0 ins; 0 del; 99 mod 8368977: Provide clear naming for AVX10 identifiers Reviewed-by: jbhateja, mhaessig, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/28344 From vlivanov at openjdk.org Sat Dec 13 03:53:49 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 13 Dec 2025 03:53:49 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped In-Reply-To: References: Message-ID: On Sat, 13 Dec 2025 02:49:07 GMT, Quan Anh Mai wrote: > Hi, > > This patch is an alternative to #28764 but it does the analysis during IGVN instead. > > Please take a look and leave your thoughts, thanks a lot. Very nice! I definitely prefer the approach here to #28764. I see that the unit test stays the same and there's an adjustment in some other test, so I assume this version is functionally more powerful than #28764 version. Have you had a chance to measure how much it affects compilation speed compared to #28764? (The code is dense and hard to reason about, so some polishing/refactoring to make it more readable. Also, please, think about verification checks.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28812#issuecomment-3648880857 From vlivanov at openjdk.org Sat Dec 13 03:57:53 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 13 Dec 2025 03:57:53 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v6] In-Reply-To: References: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> Message-ID: On Fri, 12 Dec 2025 05:13:11 GMT, Quan Anh Mai wrote: >> Hi, >> >> The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. 
If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. >> >> This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. >> >> For the runtime cost, this phase runs very fast, around 5 - 7% the runtime of EA, and about 0.5% the total runtime of C2. >> >> Please take a look and leave your thoughts, thanks a lot. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into foldmem > - grammar, safe change > - more detailed explanations > - store values need normalizing > - Just use candidate_set directly > - Some runtime calls may receive a derived pointer but not the base > - Aggressively fold loads from objects that have not escaped src/hotspot/share/opto/phaseloadfolding.cpp line 349: > 347: assert(store->Opcode() == candidate->store_Opcode(), "must match %s - %s", store->Name(), candidate->Name()); > 348: Node* res = store->in(MemNode::ValueIn); > 349: if (candidate->Opcode() == Op_LoadUB) { Is such adaptation needed? `MemNode::can_see_stored_value()` solves a similar task, but it doesn't perform any adaptation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28764#discussion_r2616044460 From duke at openjdk.org Sat Dec 13 05:09:33 2025 From: duke at openjdk.org (Saint Wesonga) Date: Sat, 13 Dec 2025 05:09:33 GMT Subject: RFR: 8373630: r18_tls should not be modified on Windows AArch64 [v2] In-Reply-To: References: Message-ID: > On Windows, r18_tls is used to store the pointer to the current thread's TEB. 
Therefore, this register should never be modified (see details in [register_aarch64.hpp](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/cpu/aarch64/register_aarch64.hpp#L118-L128)). One scenario that results in the modification of r18_tls involves virtual threads on Windows. Frames are frozen by [Continuation::try_preempt](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/share/runtime/continuation.cpp#L131) on one carrier thread whose registers are saved. When the frame is thawed, execution can continue on a different carrier thread. When this happens, [rthread (x28) is fixed to point to the new carrier thread](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/share/runtime/continuationFreezeThaw.cpp#L2670). The continuation then results in [restore_live_registers](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp#L313) restoring all the saved registers (including the fixed rthread register). However, this also restores x18, which was the TEB pointer for the previous carrier thread, causing the new carrier thread to execute with the TLS of the previous carrier thread. This causes hangs and occasional crashes in the virtual threads jtreg tests on Windows AArch64 that are resolved by this fix.
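The failure mode described above can be modelled in a few lines. This is only a minimal sketch, assuming a register file represented as a plain `long[]`: the class `R18RestoreSketch`, its methods, and the `TEB_*`/`THREAD_*` values are hypothetical and are not HotSpot code, and the `preserveR18` flag merely stands in for whatever the actual patch does when `R18_RESERVED` is defined.

```java
// Toy model of freeze/thaw across carrier threads. Not HotSpot code.
final class R18RestoreSketch {
    static final int R18_TLS = 18;  // x18: TEB pointer on Windows AArch64, per the description
    static final int RTHREAD = 28;  // x28: current-thread register, per the description

    // Hypothetical per-carrier values, chosen only so the two carriers are distinguishable.
    static final long TEB_A = 0xA000L, THREAD_A = 0xA100L;
    static final long TEB_B = 0xB000L, THREAD_B = 0xB100L;

    // Build the register file of one carrier thread.
    static long[] carrierRegisters(long teb, long thread) {
        long[] regs = new long[32];
        regs[R18_TLS] = teb;
        regs[RTHREAD] = thread;
        return regs;
    }

    // Freeze: snapshot every register of the carrier that was running the frame.
    static long[] saveLiveRegisters(long[] regs) {
        return regs.clone();
    }

    // Thaw: copy the snapshot back. With preserveR18 == false this also
    // overwrites x18 with the value saved on the previous carrier.
    static void restoreLiveRegisters(long[] regs, long[] saved, boolean preserveR18) {
        for (int i = 0; i < regs.length; i++) {
            if (preserveR18 && i == R18_TLS) {
                continue;  // leave the platform register of the current carrier alone
            }
            regs[i] = saved[i];
        }
    }

    // Freeze on carrier A, thaw on carrier B, and report what x18 holds afterwards.
    static long r18AfterThaw(boolean preserveR18) {
        long[] saved = saveLiveRegisters(carrierRegisters(TEB_A, THREAD_A));
        saved[RTHREAD] = THREAD_B;  // the rthread fix-up mentioned in the description
        long[] carrierB = carrierRegisters(TEB_B, THREAD_B);
        restoreLiveRegisters(carrierB, saved, preserveR18);
        return carrierB[R18_TLS];
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(r18AfterThaw(false)));  // a000: stale TEB of carrier A
        System.out.println(Long.toHexString(r18AfterThaw(true)));   // b000: TEB of carrier B kept
    }
}
```

In this model rthread is correct in both runs, which is consistent with the description: only x18 is left pointing at the previous carrier's TEB when the restore is unconditional.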
Saint Wesonga has updated the pull request incrementally with one additional commit since the last revision: Do not modify r18_tls if R18_RESERVED is defined ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28808/files - new: https://git.openjdk.org/jdk/pull/28808/files/2cdee78c..e5a9ef0e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28808&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28808&range=00-01 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/28808.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28808/head:pull/28808 PR: https://git.openjdk.org/jdk/pull/28808 From duke at openjdk.org Sat Dec 13 05:09:33 2025 From: duke at openjdk.org (Saint Wesonga) Date: Sat, 13 Dec 2025 05:09:33 GMT Subject: RFR: 8373630: r18_tls should not be modified on Windows AArch64 In-Reply-To: References: Message-ID: On Sat, 13 Dec 2025 00:11:56 GMT, Dean Long wrote: > Shouldn't the #ifdef be using R18_RESERVED? Yes, I have changed the condition to R18_RESERVED ------------- PR Comment: https://git.openjdk.org/jdk/pull/28808#issuecomment-3648935551 From duke at openjdk.org Sat Dec 13 05:17:51 2025 From: duke at openjdk.org (Saint Wesonga) Date: Sat, 13 Dec 2025 05:17:51 GMT Subject: RFR: 8373630: r18_tls should not be modified on Windows AArch64 In-Reply-To: References: Message-ID: On Sat, 13 Dec 2025 00:26:47 GMT, Dean Long wrote: > Nice find. It would be really useful to have a test case that reproduces the problem, and also some idea of how likely it is. I bumped the bug to P1 for now. The virtual threads MonitorEnterExit test has a 100% failure repro rate on Windows AArch64 without this change (but it does not fail on macosx-aarch64 without this change, even though x18 is also reserved on macosx-aarch64). 
I was specifically running the [testMutualExclusion](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/test/jdk/java/lang/Thread/virtual/MonitorEnterExit.java#L392-L425) parameterized test with 0 platform threads and at least 2 virtual threads when investigating this behavior. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28808#issuecomment-3648942275 From qamai at openjdk.org Sat Dec 13 07:17:12 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 13 Dec 2025 07:17:12 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v2] In-Reply-To: References: Message-ID: > Hi, > > This patch is an alternative to #28764 but it does the analysis during IGVN instead. > > Please take a look and leave your thoughts, thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: More conservative and exhaustive analysis, better explanations ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28812/files - new: https://git.openjdk.org/jdk/pull/28812/files/c70bacac..d8082201 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=00-01 Stats: 153 lines in 1 file changed: 90 ins; 20 del; 43 mod Patch: https://git.openjdk.org/jdk/pull/28812.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28812/head:pull/28812 PR: https://git.openjdk.org/jdk/pull/28812 From qamai at openjdk.org Sat Dec 13 07:25:58 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 13 Dec 2025 07:25:58 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v6] In-Reply-To: References: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> Message-ID: On Sat, 13 Dec 2025 03:54:47 GMT, Vladimir Ivanov wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: >> >> - Merge branch 'master' into foldmem >> - grammar, safe change >> - more detailed explanations >> - store values need normalizing >> - Just use candidate_set directly >> - Some runtime calls may receive a derived pointer but not the base >> - Aggressively fold loads from objects that have not escaped > > src/hotspot/share/opto/phaseloadfolding.cpp line 349: > >> 347: assert(store->Opcode() == candidate->store_Opcode(), "must match %s - %s", store->Name(), candidate->Name()); >> 348: Node* res = store->in(MemNode::ValueIn); >> 349: if (candidate->Opcode() == Op_LoadUB) { > > Is such adaptation needed? `MemNode::can_see_stored_value()` solves a similar task, but it doesn't perform any adaptation. Yes, it is needed. `MemNode::can_see_stored_value()` only looks for a matching store; the normalization is done by `Load[B|UB|S|US]Node::Ideal`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28764#discussion_r2616146883 From qamai at openjdk.org Sat Dec 13 07:29:46 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 13 Dec 2025 07:29:46 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v3] In-Reply-To: References: Message-ID: > Hi, > > This patch is an alternative to #28764 but it does the analysis during IGVN instead. > > Please take a look and leave your thoughts, thanks a lot.
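For readers following the 8373495 threads, here is a minimal Java sketch of the kind of source shape the optimization appears to target: an object that has not escaped yet at the point of a load, with a call between the store and the load. It also shows why a forwarded sub-word store value needs normalizing. The class `LoadFoldingSketch` and everything in it are hypothetical illustrations. Whether C2 actually folds these particular loads depends on the patch and was not verified here; the asserted values follow from Java semantics alone.

```java
// Source-level illustration only; it does not inspect or depend on C2 behaviour.
final class LoadFoldingSketch {
    static final class Box {
        int value;
        byte tag;
    }

    static int calls = 0;

    // Stand-in for an arbitrary call the compiler cannot see through.
    static void opaqueCall() {
        calls++;
    }

    // 'b' escapes only at the return, so the call cannot have modified b.value.
    // The load after the call observes the store before it.
    static Box makeBox(int v) {
        Box b = new Box();
        b.value = v;
        opaqueCall();
        int seen = b.value;  // candidate for folding to v
        b.value = seen + 1;
        return b;
    }

    // Storing an int into a byte field keeps only the low 8 bits, and a signed
    // byte load sign-extends them. A forwarded store value therefore cannot be
    // used as-is; it has to be normalized to what the load would produce.
    static int storeThenLoadSigned(int v) {
        Box b = new Box();
        b.tag = (byte) v;
        opaqueCall();
        return b.tag;
    }

    // The same store observed through an unsigned read (zero-extension).
    static int storeThenLoadUnsigned(int v) {
        Box b = new Box();
        b.tag = (byte) v;
        opaqueCall();
        return b.tag & 0xFF;
    }
}
```

For example, storing 200 and reading it back yields -56 through the signed read and 200 through the unsigned one, so simply substituting the stored `int` would only be correct by accident in the second case.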
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: missing comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28812/files - new: https://git.openjdk.org/jdk/pull/28812/files/d8082201..0cfc9aee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28812.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28812/head:pull/28812 PR: https://git.openjdk.org/jdk/pull/28812 From aph at openjdk.org Sat Dec 13 09:40:53 2025 From: aph at openjdk.org (Andrew Haley) Date: Sat, 13 Dec 2025 09:40:53 GMT Subject: RFR: 8373630: r18_tls should not be modified on Windows AArch64 [v2] In-Reply-To: References: Message-ID: On Sat, 13 Dec 2025 05:09:33 GMT, Saint Wesonga wrote: >> On Windows, r18_tls is used to store the pointer to the current thread's TEB. Therefore, this register should never be modified (see details in [register_aarch64.hpp](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/cpu/aarch64/register_aarch64.hpp#L118-L128)). One scenario that results in the modification of r18_tls involves virtual threads on Windows. Frames are frozen by [Continuation::try_preempt](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/share/runtime/continuation.cpp#L131) on one carrier thread whose registers are saved. When the frame is thawed, execution can continue on a different carrier thread. When this happens, [rthread (x28) is fixed to point to the new carrier thread](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/share/runtime/continuationFreezeThaw.cpp#L2670). 
The continuation then results in [restore_live_registers](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp#L313) restoring all the saved registers (including the fixed rthread register). However, this also restores x18, which was the TEB pointer for the previous carrier thread, causing the new carrier thread to execute with the TLS of the previous carrier thread. This causes hangs and occasional crashes in the virtual threads jtreg tests on Windows AArch64 that are resolved by this fix. > > Saint Wesonga has updated the pull request incrementally with one additional commit since the last revision: > > Do not modify r18_tls if R18_RESERVED is defined I think this is OK for a quick fix for the upcoming release, but in future save/restore should be fixed so that they exclude `r18_tls`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28808#issuecomment-3649156476 From qamai at openjdk.org Sat Dec 13 10:12:16 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 13 Dec 2025 10:12:16 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v4] In-Reply-To: References: Message-ID: <7_G-jrynu7OQVHmM_7MWpmBlVu1t3Ovmavff6PUf8js=.f128a2ec-3e99-4541-a261-729bd6c6ada6@github.com> > Hi, > > This patch is an alternative to #28764 but it does the analysis during IGVN instead. > > Please take a look and leave your thoughts, thanks a lot.
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Take into consideration dead paths ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28812/files - new: https://git.openjdk.org/jdk/pull/28812/files/0cfc9aee..b6b32663 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=02-03 Stats: 6 lines in 1 file changed: 5 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28812.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28812/head:pull/28812 PR: https://git.openjdk.org/jdk/pull/28812 From qamai at openjdk.org Sat Dec 13 10:44:28 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 13 Dec 2025 10:44:28 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v5] In-Reply-To: References: Message-ID: > Hi, > > This patch is an alternative to #28764 but it does the analysis during IGVN instead. > > Please take a look and leave your thoughts, thanks a lot. 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: fix TestZGCBarrierElision ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28812/files - new: https://git.openjdk.org/jdk/pull/28812/files/b6b32663..51d7741c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=03-04 Stats: 5 lines in 1 file changed: 2 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28812.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28812/head:pull/28812 PR: https://git.openjdk.org/jdk/pull/28812 From qamai at openjdk.org Sat Dec 13 10:44:28 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 13 Dec 2025 10:44:28 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped In-Reply-To: References: Message-ID: On Sat, 13 Dec 2025 03:51:32 GMT, Vladimir Ivanov wrote: >> Hi, >> >> This patch is an alternative to #28764 but it does the analysis during IGVN instead. >> >> Please take a look and leave your thoughts, thanks a lot. > > Very nice! I definitely prefer the approach here to #28764. > > I see that the unit test stays the same and there's an adjustment in some other test, so I assume this version is functionally more powerful than #28764 version. > > Have you had a chance to measure how much it affects compilation speed compared to #28764? > > (The code is dense and hard to reason about, so some polishing/refactoring to make it more readable. Also, please, think about verification checks.) @iwanowww Thanks for your comment. I have added a lot more comments to explain in detail the steps of `MemNode::find_previous_store`. I have also made a small modification: instead of traversing the outputs of the control nodes from the call to the allocation, we traverse the outputs of the nodes that may alias `base` instead. This has some benefits: - It is likely cheaper. 
This is because there are often few nodes that may alias `base`, while there may be numerous control nodes from the call to the allocation. The number of nodes that directly use a pointer is also less than the number of nodes that directly use a random control node. - It is more conservative. This is because we can limit the type of the outputs of a pointer and be conservative with everything else, while exhaustively checking if a random use of a random control node makes `base` escape seems hard. I have also added some verification that if a step determines that `base` does not escape, then the following steps must not determine otherwise. For the runtime cost, I don't see a noticeable difference compared to master. For the unit test, compared to the previous PR, I have removed the `failOn = LoadI` from the tests that involve loops. But I think improving load folding on `Phi` can be another PR. For the change in `TestZGCEffectiveBarrierElision`, it is because I decided to add `Blackhole` to the list of nodes that do not escape an object, not sure if it is necessary, though. However, I managed to change the test so the load is not elided. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28812#issuecomment-3649201130 From qamai at openjdk.org Sat Dec 13 14:10:00 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 13 Dec 2025 14:10:00 GMT Subject: RFR: 8373577: C2: Cleanup adr_type of CallLeafPureNode In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 11:09:36 GMT, Quan Anh Mai wrote: > Hi, > > This PR is extracted from #28570 , `CallLeafPureNode`s do not read from or write to memory, so their `adr_type` should be `nullptr`. > > Please take a look and leave your reviews, thanks a lot. Thanks a lot for your reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28786#issuecomment-3649464950 From qamai at openjdk.org Sat Dec 13 14:10:00 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 13 Dec 2025 14:10:00 GMT Subject: Integrated: 8373577: C2: Cleanup adr_type of CallLeafPureNode In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 11:09:36 GMT, Quan Anh Mai wrote: > Hi, > > This PR is extracted from #28570 , `CallLeafPureNode`s do not read from or write to memory, so their `adr_type` should be `nullptr`. > > Please take a look and leave your reviews, thanks a lot. This pull request has now been integrated. Changeset: 104d0cb5 Author: Quan Anh Mai URL: https://git.openjdk.org/jdk/commit/104d0cb542d12f133ac8a0a34f2b21ca3aa4a5cc Stats: 8 lines in 4 files changed: 1 ins; 2 del; 5 mod 8373577: C2: Cleanup adr_type of CallLeafPureNode Reviewed-by: roland, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/28786 From qamai at openjdk.org Sat Dec 13 14:58:29 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 13 Dec 2025 14:58:29 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v6] In-Reply-To: References: Message-ID: > Hi, > > This patch is an alternative to #28764 but it does the analysis during IGVN instead. > > The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. > > This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. Implementation-wise, when walking at `find_previous_store`, if we encounter a call or memory barrier, we start looking at all nodes that make the allocation escape. 
If all such nodes have a control input that is not a transitive control input of the call/barrier we are at, then we can decidedly say that the allocation has not escaped at that call/barrier, and walk past that call/barrier to find a corresponding store. > > I do not see a noticeable difference in C2 runtime with and without this patch. > > Please take a look and leave your thoughts, thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: be even more rigorous ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28812/files - new: https://git.openjdk.org/jdk/pull/28812/files/51d7741c..918d7fe7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=04-05 Stats: 20 lines in 1 file changed: 9 ins; 1 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/28812.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28812/head:pull/28812 PR: https://git.openjdk.org/jdk/pull/28812 From qamai at openjdk.org Sat Dec 13 15:10:37 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 13 Dec 2025 15:10:37 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v7] In-Reply-To: References: Message-ID: > Hi, > > This patch is an alternative to #28764 but it does the analysis during IGVN instead. > > The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. > > This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. 
Implementation-wise, when walking at `find_previous_store`, if we encounter a call or memory barrier, we start looking at all nodes that make the allocation escape. If all such nodes have a control input that is not a transitive control input of the call/barrier we are at, then we can decidedly say that the allocation has not escaped at that call/barrier, and walk past that call/barrier to find a corresponding store. > > I do not see a noticeable difference in C2 runtime with and without this patch. > > Please take a look and leave your thoughts, thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: safepoints do not have a memory output ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28812/files - new: https://git.openjdk.org/jdk/pull/28812/files/918d7fe7..ce3ca6ae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=05-06 Stats: 6 lines in 1 file changed: 0 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28812.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28812/head:pull/28812 PR: https://git.openjdk.org/jdk/pull/28812 From qamai at openjdk.org Sat Dec 13 15:12:51 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 13 Dec 2025 15:12:51 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v6] In-Reply-To: References: Message-ID: On Sat, 13 Dec 2025 14:58:29 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch is an alternative to #28764 but it does the analysis during IGVN instead. >> >> The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. 
>> >> This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. Implementation-wise, when walking at `find_previous_store`, if we encounter a call or memory barrier, we start looking at all nodes that make the allocation escape. If all such nodes have a control input that is not a transitive control input of the call/barrier we are at, then we can decidedly say that the allocation has not escaped at that call/barrier, and walk past that call/barrier to find a corresponding store. >> >> I do not see a noticeable difference in C2 runtime with and without this patch. >> >> Please take a look and leave your thoughts, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > be even more rigorous I have made further changes that I believe have made the change pretty rigorous, I don't think I can see any flaw in the reasoning that allows mis-analysis now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28812#issuecomment-3649525802 From qamai at openjdk.org Sat Dec 13 16:23:26 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 13 Dec 2025 16:23:26 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v8] In-Reply-To: References: Message-ID: > Hi, > > This patch is an alternative to #28764 but it does the analysis during IGVN instead. > > The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. 
> > This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. Implementation-wise, when walking at `find_previous_store`, if we encounter a call or memory barrier, we start looking at all nodes that make the allocation escape. If all such nodes have a control input that is not a transitive control input of the call/barrier we are at, then we can decidedly say that the allocation has not escaped at that call/barrier, and walk past that call/barrier to find a corresponding store. > > I do not see a noticeable difference in C2 runtime with and without this patch. > > Please take a look and leave your thoughts, thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: consistently use phase->value during IGVN ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28812/files - new: https://git.openjdk.org/jdk/pull/28812/files/ce3ca6ae..622ad5a7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=06-07 Stats: 14 lines in 1 file changed: 0 ins; 5 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/28812.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28812/head:pull/28812 PR: https://git.openjdk.org/jdk/pull/28812 From qamai at openjdk.org Sun Dec 14 00:08:06 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 14 Dec 2025 00:08:06 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v6] In-Reply-To: References: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> Message-ID: On Fri, 12 Dec 2025 05:13:11 GMT, Quan Anh Mai wrote: >> Hi, >> >> The current escape analysis mechanism is all-or-nothing: either the object does not escape, 
or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. >> >> This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. >> >> For the runtime cost, this phase runs very fast, around 5 - 7% the runtime of EA, and about 0.5% the total runtime of C2. >> >> Please take a look and leave your thoughts, thanks a lot. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into foldmem > - grammar, safe change > - more detailed explanations > - store values need normalizing > - Just use candidate_set directly > - Some runtime calls may receive a derived pointer but not the base > - Aggressively fold loads from objects that have not escaped Close in favour of #28812 ------------- PR Comment: https://git.openjdk.org/jdk/pull/28764#issuecomment-3649952432 From qamai at openjdk.org Sun Dec 14 00:08:07 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 14 Dec 2025 00:08:07 GMT Subject: Withdrawn: 8373495: C2: Aggressively fold loads from objects that have not escaped In-Reply-To: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> References: <3V318dfkluXRxnbshRHM7V5njmHw_Tvd00rXGFi3N58=.1379b184-27c1-467a-a75b-7896502e758f@github.com> Message-ID: On Thu, 11 Dec 2025 09:10:30 GMT, Quan Anh Mai wrote: > Hi, > > The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. 
If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. > > This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. > > For the runtime cost, this phase runs very fast, around 5 - 7% the runtime of EA, and about 0.5% the total runtime of C2. > > Please take a look and leave your thoughts, thanks a lot. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/28764 From fjiang at openjdk.org Sun Dec 14 04:05:54 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Sun, 14 Dec 2025 04:05:54 GMT Subject: RFR: 8373069: RISC-V: implement GHASH intrinsic [v4] In-Reply-To: <8MCvHCHkscmoOkB_cKGP5mkhHWBw6B3PfalaBL4aVg0=.0a6e3bc9-7b6b-498d-81fb-1a276adc2a31@github.com> References: <8MCvHCHkscmoOkB_cKGP5mkhHWBw6B3PfalaBL4aVg0=.0a6e3bc9-7b6b-498d-81fb-1a276adc2a31@github.com> Message-ID: On Thu, 11 Dec 2025 12:22:13 GMT, Anjian Wen wrote: >> support GHASH intrinsic for crypt GCM, which need zvkg extension. >> >> passed the tests in >> test/hotspot/jtreg/compiler/codegen/aes/ >> test/jdk/com/sun/crypto > > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > modify format src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 3019: > 3017: assert(UseGHASHIntrinsics, "Must be"); > 3018: assert(UseZvbb, "need Zvbb extension support"); > 3019: assert(UseZvkg, "need GHASH instructions (Zvkg extension) support"); Do we need `UseZvbb` and `UseZvkg` assertions here? `UseGHASHIntrinsics` should be enough as it depends on `UseZvbb` and `UseZvkg`. 
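
The point about redundant assertions can be illustrated with a toy model. This is only a sketch, assuming that flag resolution at startup turns `UseGHASHIntrinsics` off whenever `UseZvbb` or `UseZvkg` is off, as the comment above suggests; `ToyFlags`, `resolve_flags` and `generate_ghash_stub` are hypothetical names and not the HotSpot flag machinery.

```cpp
#include <cassert>

// Toy model of the three VM flags; not the actual HotSpot flag handling.
struct ToyFlags {
  bool UseZvbb;
  bool UseZvkg;
  bool UseGHASHIntrinsics;
};

// Hypothetical startup check: the intrinsic flag only stays enabled when
// both extensions it depends on are enabled.
inline ToyFlags resolve_flags(ToyFlags f) {
  if (f.UseGHASHIntrinsics && !(f.UseZvbb && f.UseZvkg)) {
    f.UseGHASHIntrinsics = false;
  }
  return f;
}

// Stand-in for the stub generator: once flags are resolved, asserting the
// intrinsic flag alone already implies that Zvbb and Zvkg are available.
inline void generate_ghash_stub(const ToyFlags& f) {
  assert(f.UseGHASHIntrinsics && "Must be");
  (void)f;  // silence unused-parameter warnings when asserts are disabled
}
```

Under that assumption, the two extension-specific assertions in the stub generator can never fire independently of the first one.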
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28548#discussion_r2616696345 From qamai at openjdk.org Sun Dec 14 12:25:01 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 14 Dec 2025 12:25:01 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v9] In-Reply-To: References: Message-ID: > Hi, > > This patch is an alternative to #28764 but it does the analysis during IGVN instead. > > The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. > > This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. Implementation-wise, when walking at `find_previous_store`, if we encounter a call or memory barrier, we start looking at all nodes that make the allocation escape. If all such nodes have a control input that is not a transitive control input of the call/barrier we are at, then we can decidedly say that the allocation has not escaped at that call/barrier, and walk past that call/barrier to find a corresponding store. > > I do not see a noticeable difference in C2 runtime with and without this patch. > > Please take a look and leave your thoughts, thanks a lot. 
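
The control-input check in the quoted description can be sketched on a toy control graph. This is a simplified illustration, not the code in the PR: `ToyCtrl`, `is_transitive_ctrl_input` and `not_escaped_at` are hypothetical names, and the sketch ignores loops, dead paths and everything else the real `find_previous_store` walk has to handle.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Toy control node: only its control inputs (predecessors) are modelled.
struct ToyCtrl {
  std::vector<const ToyCtrl*> preds;
};

// True if `candidate` is reachable from `node` by following one or more
// control-input edges, i.e. it is a transitive control input of `node`.
inline bool is_transitive_ctrl_input(const ToyCtrl* node, const ToyCtrl* candidate) {
  assert(node != nullptr);
  std::vector<const ToyCtrl*> worklist(node->preds.begin(), node->preds.end());
  std::unordered_set<const ToyCtrl*> seen;
  while (!worklist.empty()) {
    const ToyCtrl* cur = worklist.back();
    worklist.pop_back();
    if (cur == candidate) {
      return true;
    }
    if (!seen.insert(cur).second) {
      continue;  // already visited
    }
    for (const ToyCtrl* p : cur->preds) {
      worklist.push_back(p);
    }
  }
  return false;
}

// `escape_ctrls` holds the control input of every node that makes the
// allocation escape. If none of them is a transitive control input of the
// call/barrier, the allocation is treated as not yet escaped there.
inline bool not_escaped_at(const ToyCtrl* barrier,
                           const std::vector<const ToyCtrl*>& escape_ctrls) {
  for (const ToyCtrl* c : escape_ctrls) {
    if (is_transitive_ctrl_input(barrier, c)) {
      return false;
    }
  }
  return true;
}
```

In a straight-line chain `start -> a -> call -> b`, an escape controlled by `b` (for example at return) lets the walk continue past `call`, while an escape controlled by `a` does not.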
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Cheaper and stronger assert, add test for devirtualization ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28812/files - new: https://git.openjdk.org/jdk/pull/28812/files/622ad5a7..31d96537 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=07-08 Stats: 194 lines in 3 files changed: 86 ins; 64 del; 44 mod Patch: https://git.openjdk.org/jdk/pull/28812.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28812/head:pull/28812 PR: https://git.openjdk.org/jdk/pull/28812 From fyang at openjdk.org Mon Dec 15 01:44:57 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 15 Dec 2025 01:44:57 GMT Subject: RFR: 8373069: RISC-V: implement GHASH intrinsic [v4] In-Reply-To: <8MCvHCHkscmoOkB_cKGP5mkhHWBw6B3PfalaBL4aVg0=.0a6e3bc9-7b6b-498d-81fb-1a276adc2a31@github.com> References: <8MCvHCHkscmoOkB_cKGP5mkhHWBw6B3PfalaBL4aVg0=.0a6e3bc9-7b6b-498d-81fb-1a276adc2a31@github.com> Message-ID: On Thu, 11 Dec 2025 12:22:13 GMT, Anjian Wen wrote: >> support GHASH intrinsic for crypt GCM, which need zvkg extension. >> >> passed the tests in >> test/hotspot/jtreg/compiler/codegen/aes/ >> test/jdk/com/sun/crypto > > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > modify format src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 3021: > 3019: assert(UseZvkg, "need GHASH instructions (Zvkg extension) support"); > 3020: > 3021: __ align(CodeEntryAlignment); Can you move this line to immediately before L3025? Like: __ align(CodeEntryAlignment); address start = __ pc(); __ enter(); Then it looks more obvious where we want to align the code. BTW: Seems CBC and CTR intrinsics need similar adjustment. 
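
To make the placement suggestion concrete, here is a toy model of why the alignment call reads best directly before the entry address is captured. `ToyAssembler` and `stub_entry` are hypothetical and only mimic the shape of the `__ align(...)` / `__ pc()` sequence quoted above; they are not the HotSpot `MacroAssembler` API.

```cpp
#include <cassert>
#include <cstddef>

// Toy code buffer that only tracks the current emission offset.
struct ToyAssembler {
  std::size_t offset = 0;

  void emit(std::size_t nbytes) { offset += nbytes; }

  // Pad the offset up to the next multiple of `modulus`.
  void align(std::size_t modulus) {
    assert(modulus != 0);
    offset = (offset + modulus - 1) / modulus * modulus;
  }

  std::size_t pc() const { return offset; }
};

// Align immediately before taking the entry address, so it is obvious
// that the address being recorded is the aligned one.
inline std::size_t stub_entry(ToyAssembler& masm, std::size_t alignment) {
  masm.align(alignment);
  return masm.pc();
}
```

If any code were emitted between the `align` call and the `pc()` read, the recorded entry would no longer be guaranteed to be aligned, which is the readability hazard the review comment points at.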
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 3052: > 3050: __ vghsh_vv(partial_hash, hash_subkey, cipher_text); > 3051: __ subi(blocks, blocks, 1); > 3052: __ bnez(blocks, L_ghash_loop); Please leave a new line after the loop. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28548#discussion_r2617702776 PR Review Comment: https://git.openjdk.org/jdk/pull/28548#discussion_r2617699025 From erfang at openjdk.org Mon Dec 15 02:39:52 2025 From: erfang at openjdk.org (Eric Fang) Date: Mon, 15 Dec 2025 02:39:52 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v5] In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 07:47:12 GMT, Eric Fang wrote: >> `VectorMaskCastNode` is used to cast a vector mask from one type to another type. The cast may be generated by calling the vector API `cast` or generated by the compiler. For example, some vector mask operations like `trueCount` require the input mask to be integer types, so for floating point type masks, the compiler will cast the mask to the corresponding integer type mask automatically before doing the mask operation. This kind of cast is very common. >> >> If the vector element size is not changed, the `VectorMaskCastNode` don't generate code, otherwise code will be generated to extend or narrow the mask. This IR node is not free no matter it generates code or not because it may block some optimizations. For example: >> 1. `(VectorStoremask (VectorMaskCast (VectorLoadMask x)))` The middle `VectorMaskCast` prevented the following optimization: `(VectorStoremask (VectorLoadMask x)) => (x)` >> 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`. >> >> In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. But if the input value is changed, we can't eliminate the cast. 
>>
>> The general idea of this PR is to introduce an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`, like the existing `Node::uncast(bool)` function. The function returns the first node that is not a `VectorMaskCastNode`.
>>
>> The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode` and this does not affect the correctness of the optimization. Then this function can be called to eliminate the `VectorMaskCastNode` chain.
>>
>> Current optimizations related to `VectorMaskCastNode` include:
>> 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760.
>> 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242.
>>
>> This PR does the following optimizations:
>> 1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` as `(VectorMaskCast (VectorMaskCast? ... (VectorMaskCast x))) => (x)`. The optimization is correct as long as the types of the head and tail `VectorMaskCastNode` are consistent.
>> 2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`. Since the value before and after the pattern is a boolean vect...
>
> Eric Fang has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains eight commits: > > - Merge branch 'master' into JDK-8370863-mask-cast-opt > - Merge branch 'master' into JDK-8370863-mask-cast-opt > - Add MaxVectorSize IR test condition for VectorStoreMaskIdentityTest.java > - Refine the test code and comments > - Merge branch 'master' into JDK-8370863-mask-cast-opt > - Don't read and write the same memory in the JMH benchmarks > - Merge branch 'master' into JDK-8370863-mask-cast-opt > - 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns > > `VectorMaskCastNode` is used to cast a vector mask from one type to > another type. The cast may be generated by calling the vector API `cast` > or generated by the compiler. For example, some vector mask operations > like `trueCount` require the input mask to be integer types, so for > floating point type masks, the compiler will cast the mask to the > corresponding integer type mask automatically before doing the mask > operation. This kind of cast is very common. > > If the vector element size is not changed, the `VectorMaskCastNode` > don't generate code, otherwise code will be generated to extend or narrow > the mask. This IR node is not free no matter it generates code or not > because it may block some optimizations. For example: > 1. `(VectorStoremask (VectorMaskCast (VectorLoadMask x)))` > The middle `VectorMaskCast` prevented the following optimization: > `(VectorStoremask (VectorLoadMask x)) => (x)` > 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which > blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`. > > In these IR patterns, the value of the input `x` is not changed, so we > can safely do the optimization. But if the input value is changed, we > can't eliminate the cast. > > The general idea of this PR is introducing an `uncast_mask` helper > function, which can be used to uncast a chain of `VectorMaskCastNode`, > like the existing `Node::uncast(bool)` function. 
The function returns
> the first non `VectorMaskCastNode`.
>
> The intended use case is when the IR pattern to be optimized may
> contain one or more consecutive `VectorMaskCastNode` and this does not
> affect the correctness of the optimization. Then this function can be
> called to eliminate the `VectorMaskCastNode` chain.
>
> Current optimizations related to `VectorMaskCastNode` include:
> 1. `(V...

Thanks for your review! @eme64

-------------

PR Review: https://git.openjdk.org/jdk/pull/28313#pullrequestreview-3576179654

From erfang at openjdk.org Mon Dec 15 02:39:54 2025
From: erfang at openjdk.org (Eric Fang)
Date: Mon, 15 Dec 2025 02:39:54 GMT
Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v2]
In-Reply-To: References: <4vSKAtr0tUG0V193gIvnEFdHm18ZhqflVAwk-09IVQ0=.081806f5-6303-4b4f-975d-7c85427ccae5@github.com> Message-ID:

On Fri, 12 Dec 2025 15:10:29 GMT, Emanuel Peter wrote:

>> Yeah it's already a static method. See https://github.com/openjdk/jdk/pull/28313/files#diff-ba9e2d10a50a01316946660ec9f68321eb864fd9c815616c10abbec39360efe5R141
>>
>> Or do you mean a static method limited to this file? If so, I'd prefer not to, since it may be used in other places. Thanks~
>
> Could you return a `VectorNode*`? And should the input not already be a `VectorNode*`?

Hi @eme64 thanks for your review. Here the design is as follows:
1. Any node can be used as the input to this function.
2. If the input node `n` is not a `VectorMaskCastNode`, then return `n` itself.

The advantage of this is that we don't need to check whether the parameter `n` is a `VectorNode` when calling `uncast_mask`. Such a check would otherwise be necessary because, for example, `n` might be a `PhiNode`, which is not a `VectorNode`.

Actually, this function could also be placed in the `Node` class, just like the `Node::uncast` function. However, I feel that this function is primarily used for vector nodes, and to avoid unnecessarily expanding its scope, I placed it in `VectorNode`.
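
The design described in this reply can be sketched with a toy IR. This is an illustration under simplifying assumptions, not the code in the PR: `ToyNode`, `Op` and `store_mask_identity` are hypothetical stand-ins, and only the first data input and the lane count are modelled.

```cpp
#include <cassert>
#include <cstddef>

// Toy stand-ins for C2 IR nodes; these are NOT the HotSpot classes.
enum class Op { Phi, VectorLoadMask, VectorMaskCast, VectorStoreMask };

struct ToyNode {
  Op op;
  const ToyNode* in1;   // first data input, or nullptr
  std::size_t length;   // vector lane count (0 for non-vector nodes)
};

// Skip zero or more consecutive VectorMaskCast nodes and return the first
// node that is not a cast. Any node is accepted; a non-cast input (for
// example a Phi) is returned unchanged.
inline const ToyNode* uncast_mask(const ToyNode* n) {
  while (n != nullptr && n->op == Op::VectorMaskCast && n->in1 != nullptr) {
    assert(n->length == n->in1->length && "vector length must match");
    n = n->in1;
  }
  return n;
}

// Sketch of the identity rule
//   VectorStoreMask (VectorMaskCast* (VectorLoadMask bv)) ==> bv
// Returns the replacement node, or `store` itself if the rule does not apply.
inline const ToyNode* store_mask_identity(const ToyNode* store) {
  const ToyNode* in1 = uncast_mask(store->in1);
  if (in1 != nullptr && in1->op == Op::VectorLoadMask &&
      in1->length == store->length) {
    return in1->in1;
  }
  return store;
}
```

Because `uncast_mask` accepts any node, the caller does not have to test the node kind before calling it; the opcode check on the result is enough.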
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2617757865 From erfang at openjdk.org Mon Dec 15 02:39:58 2025 From: erfang at openjdk.org (Eric Fang) Date: Mon, 15 Dec 2025 02:39:58 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v5] In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 15:12:37 GMT, Emanuel Peter wrote: >> Eric Fang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: >> >> - Merge branch 'master' into JDK-8370863-mask-cast-opt >> - Merge branch 'master' into JDK-8370863-mask-cast-opt >> - Add MaxVectorSize IR test condition for VectorStoreMaskIdentityTest.java >> - Refine the test code and comments >> - Merge branch 'master' into JDK-8370863-mask-cast-opt >> - Don't read and write the same memory in the JMH benchmarks >> - Merge branch 'master' into JDK-8370863-mask-cast-opt >> - 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns >> >> `VectorMaskCastNode` is used to cast a vector mask from one type to >> another type. The cast may be generated by calling the vector API `cast` >> or generated by the compiler. For example, some vector mask operations >> like `trueCount` require the input mask to be integer types, so for >> floating point type masks, the compiler will cast the mask to the >> corresponding integer type mask automatically before doing the mask >> operation. This kind of cast is very common. >> >> If the vector element size is not changed, the `VectorMaskCastNode` >> don't generate code, otherwise code will be generated to extend or narrow >> the mask. This IR node is not free no matter it generates code or not >> because it may block some optimizations. For example: >> 1. `(VectorStoremask (VectorMaskCast (VectorLoadMask x)))` >> The middle `VectorMaskCast` prevented the following optimization: >> `(VectorStoremask (VectorLoadMask x)) => (x)` >> 2. 
`(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which >> blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`. >> >> In these IR patterns, the value of the input `x` is not changed, so we >> can safely do the optimization. But if the input value is changed, we >> can't eliminate the cast. >> >> The general idea of this PR is introducing an `uncast_mask` helper >> function, which can be used to uncast a chain of `VectorMaskCastNode`, >> like the existing `Node::uncast(bool)` function. The funtion returns >> the first non `VectorMaskCastNode`. >> >> The intended use case is when the IR pattern to be optimized may >> contain one or more consecutive `VectorMaskCastNode` and this does not >> affect the correctness of the optimization. Then this function can be >> called to eliminate the `VectorMaskCastNode` ch... > > src/hotspot/share/opto/vectornode.cpp line 1489: > >> 1487: Node* VectorStoreMaskNode::Identity(PhaseGVN* phase) { >> 1488: // Identity transformation on boolean vectors. >> 1489: // VectorStoreMask (VectorMaskCast ... VectorLoadMask bv) elem_size ==> bv > > Suggestion: > > // VectorStoreMask (VectorMaskCast* VectorLoadMask bv) elem_size ==> bv > > Would a regex star be more explicit about 0 or more repetitions? Yeah, done, thanks! > src/hotspot/share/opto/vectornode.cpp line 1492: > >> 1490: // vector[n]{bool} => vector[n]{t} => vector[n]{bool} >> 1491: Node* in1 = VectorNode::uncast_mask(in(1)); >> 1492: if (in1->Opcode() == Op_VectorLoadMask && length() == in1->as_Vector()->length()) { > > Can there be a mismatch with the length? Can you give me an example? There are currently no such counterexamples. Because now we require the length of `VectorMaskCastNode` to be consistent with the length of its input node. But I'm not sure whether this restriction will be lifted in the future, and this optimization requires the length to be the same. Because of this requirement, I added this check. 
Similarly, in `uncast_mask` I also did the following assert: `assert(n->as_Vector()->length() == in1->as_Vector()->length(), "vector length must match");` Do you think it would be better to change this condition to an assert? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2617758364 PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2617770569 From wenanjian at openjdk.org Mon Dec 15 03:02:28 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Mon, 15 Dec 2025 03:02:28 GMT Subject: RFR: 8373069: RISC-V: implement GHASH intrinsic [v4] In-Reply-To: References: <8MCvHCHkscmoOkB_cKGP5mkhHWBw6B3PfalaBL4aVg0=.0a6e3bc9-7b6b-498d-81fb-1a276adc2a31@github.com> Message-ID: <-33xKAvpX5TytD9QRVeSE2dZC_-OKDG-Oowwgq3g2fw=.57f37567-e175-4fd8-977e-06a2be36f19c@github.com> On Sun, 14 Dec 2025 03:50:11 GMT, Feilong Jiang wrote: >> Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: >> >> modify format > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 3019: > >> 3017: assert(UseGHASHIntrinsics, "Must be"); >> 3018: assert(UseZvbb, "need Zvbb extension support"); >> 3019: assert(UseZvkg, "need GHASH instructions (Zvkg extension) support"); > > Do we need `UseZvbb` and `UseZvkg` assertions here? `UseGHASHIntrinsics` should be enough as it depends on `UseZvbb` and `UseZvkg`. Thanks for pointing out. yes, it seems a little bit redundant, I'll try delete it and test it one more time for sure before update. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28548#discussion_r2617796611 From erfang at openjdk.org Mon Dec 15 03:01:39 2025 From: erfang at openjdk.org (Eric Fang) Date: Mon, 15 Dec 2025 03:01:39 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v6] In-Reply-To: References: Message-ID: > `VectorMaskCastNode` is used to cast a vector mask from one type to another type. 
The cast may be generated by calling the vector API `cast` or generated by the compiler. For example, some vector mask operations like `trueCount` require the input mask to be integer types, so for floating point type masks, the compiler will cast the mask to the corresponding integer type mask automatically before doing the mask operation. This kind of cast is very common. > > If the vector element size is not changed, the `VectorMaskCastNode` don't generate code, otherwise code will be generated to extend or narrow the mask. This IR node is not free no matter it generates code or not because it may block some optimizations. For example: > 1. `(VectorStoremask (VectorMaskCast (VectorLoadMask x)))` The middle `VectorMaskCast` prevented the following optimization: `(VectorStoremask (VectorLoadMask x)) => (x)` > 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`. > > In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. But if the input value is changed, we can't eliminate the cast. > > The general idea of this PR is introducing an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`, like the existing `Node::uncast(bool)` function. The funtion returns the first non `VectorMaskCastNode`. > > The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode` and this does not affect the correctness of the optimization. Then this function can be called to eliminate the `VectorMaskCastNode` chain. > > Current optimizations related to `VectorMaskCastNode` include: > 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760. > 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242. > > This PR does the following optimizations: > 1. 
Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` as `(VectorMaskCast (VectorMaskCast? ... (VectorMaskCast x))) => (x)`. Because as long as types of the head and tail `VectorMaskCastNode` are consistent, the optimization is correct. > 2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`. Since the value before and after the pattern is a boolean vector, it remains unchanged as long as th... Eric Fang has updated the pull request incrementally with one additional commit since the last revision: Refine code comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28313/files - new: https://git.openjdk.org/jdk/pull/28313/files/a14cec2c..2ce36c8d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28313&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28313&range=04-05 Stats: 5 lines in 3 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/28313.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28313/head:pull/28313 PR: https://git.openjdk.org/jdk/pull/28313 From xgong at openjdk.org Mon Dec 15 03:13:54 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 15 Dec 2025 03:13:54 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v5] In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 02:36:06 GMT, Eric Fang wrote: >> src/hotspot/share/opto/vectornode.cpp line 1492: >> >>> 1490: // vector[n]{bool} => vector[n]{t} => vector[n]{bool} >>> 1491: Node* in1 = VectorNode::uncast_mask(in(1)); >>> 1492: if (in1->Opcode() == Op_VectorLoadMask && length() == in1->as_Vector()->length()) { >> >> Can there be a mismatch with the length? Can you give me an example? > > There are currently no such counterexamples. Because now we require the length of `VectorMaskCastNode` to be consistent with the length of its input node. 
But I'm not sure whether this restriction will be lifted in the future, and this optimization requires the length to be the same. Because of this requirement, I added this check. Similarly, in `uncast_mask` I also did the following assert: > `assert(n->as_Vector()->length() == in1->as_Vector()->length(), "vector length must match");` > > Do you think it would be better to change this condition to an assert? Yeah, currently we might not have a real case where the lengths do not match, regardless of whether there are mask casts inside the chain. At least I cannot find such a case at the API level. I am not sure whether such a pattern will be generated by the compiler itself in the future due to some optimizations. For the optimization `VectorStoreMask (VectorLoadMask vect) => vect`, we must make sure that the vector types before and after match exactly. Considering that both the basic element type of `VectorStoreMask` and the input of `VectorLoadMask` (i.e. `vect`) are `boolean`, just checking the vector length is enough. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2617810466 From jbhateja at openjdk.org Mon Dec 15 06:32:56 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 15 Dec 2025 06:32:56 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v6] In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 03:01:39 GMT, Eric Fang wrote: >> `VectorMaskCastNode` is used to cast a vector mask from one type to another type. The cast may be generated by calling the vector API `cast` or generated by the compiler. For example, some vector mask operations like `trueCount` require the input mask to be integer types, so for floating point type masks, the compiler will cast the mask to the corresponding integer type mask automatically before doing the mask operation. This kind of cast is very common.
>> >> If the vector element size is not changed, the `VectorMaskCastNode` don't generate code, otherwise code will be generated to extend or narrow the mask. This IR node is not free no matter it generates code or not because it may block some optimizations. For example: >> 1. `(VectorStoremask (VectorMaskCast (VectorLoadMask x)))` The middle `VectorMaskCast` prevented the following optimization: `(VectorStoremask (VectorLoadMask x)) => (x)` >> 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`. >> >> In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. But if the input value is changed, we can't eliminate the cast. >> >> The general idea of this PR is introducing an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`, like the existing `Node::uncast(bool)` function. The funtion returns the first non `VectorMaskCastNode`. >> >> The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode` and this does not affect the correctness of the optimization. Then this function can be called to eliminate the `VectorMaskCastNode` chain. >> >> Current optimizations related to `VectorMaskCastNode` include: >> 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760. >> 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242. >> >> This PR does the following optimizations: >> 1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` as `(VectorMaskCast (VectorMaskCast? ... (VectorMaskCast x))) => (x)`. Because as long as types of the head and tail `VectorMaskCastNode` are consistent, the optimization is correct. >> 2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`. 
Since the value before and after the pattern is a boolean vect... > > Eric Fang has updated the pull request incrementally with one additional commit since the last revision: > > Refine code comments src/hotspot/share/opto/vectornode.cpp line 1062: > 1060: if (!in1->isa_Vector()) { > 1061: break; > 1062: } Can you write a comment here explaining why you want to avoid handling masks of type TypeVectMask? src/hotspot/share/opto/vectornode.cpp line 1063: > 1061: break; > 1062: } > 1063: assert(n->as_Vector()->length() == in1->as_Vector()->length(), "vector length must match"); While assertions are good to add, mask cast is a lanewise operation, i.e. length compatibility is implied, so adding an assertion for IR invariants is redundant. test/hotspot/jtreg/compiler/vectorapi/VectorStoreMaskIdentityTest.java line 186: > 184: testThreeCastsKernel(IntVector.SPECIES_128, ShortVector.SPECIES_64, FloatVector.SPECIES_128, LongVector.SPECIES_256); > 185: verifyResult(IntVector.SPECIES_128.length()); > 186: } A nit: you can define `static final` species constants like S128 and S64 pointing to the fully qualified species; it will reduce the verbosity. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2618083147 PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2618086953 PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2618109644 From epeter at openjdk.org Mon Dec 15 06:49:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Dec 2025 06:49:56 GMT Subject: RFR: 8370922: Template Framework Library: Float16 type and operations In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 17:34:35 GMT, Benoît Maillard wrote: >> We should test `Float16` with Template Framework Tests. For this, I'm now implementing: >> >> - Template Framework Library: add `Float16Type` that represents `Float16`. Extend `Operations.java` with `Float16` operations.
>> - `Verify.java`: add verification for `Float16`, and corresponding tests in `TestVerifyIncubatorVector.java`. We could have done this separately, but it is not much code and completes the pipeline from code generation through execution and finally result verification in the following two tests. >> - Adding `Float16` to `ExpressionFuzzer.java` and `TestExpressions.java`. > > test/hotspot/jtreg/compiler/igvn/ExpressionFuzzer.java line 343: > >> 341: // Generate expressions with any scalar numeric types. >> 342: for (CodeGenerationDataNameType type : SCALAR_NUMERIC_TYPES) { >> 343: for (int i = 0; i < 2; i++) { > > What does this loop do? And why do we have only 2 iterations here, but 10 for `PRIMITIVE_TYPES`? It `Generate[s] expressions with any scalar numeric types.` ;) The question is just how many per (output) type. Here we do 2 for each type. The 2 vs 10 is quite arbitrary. I did not want to increase the runtime of the test too much. For now, focusing more on the primitive types and operations is probably good, since Float16 is still rather niche. But we can change the balance in the future. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28095#discussion_r2618176244 From epeter at openjdk.org Mon Dec 15 06:57:34 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Dec 2025 06:57:34 GMT Subject: RFR: 8370922: Template Framework Library: Float16 type and operations [v2] In-Reply-To: References: Message-ID: > We should test `Float16` with Template Framework Tests. For this, I'm now implementing: > > - Template Framework Library: add `Float16Type` that represents `Float16`. Extend `Operations.java` with `Float16` operations. > - `Verify.java`: add verification for `Float16`, and corresponding tests in `TestVerifyIncubatorVector.java`. We could have done this separately, but it is not much code and completes the pipeline from code generation through execution and finally result verification in the following two tests.
> - Adding `Float16` to `ExpressionFuzzer.java` and `TestExpressions.java`. Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: - merge with master - add more flags again - add module to compilation - Merge branch 'master' into JDK-8370922-TemplateFramework-Library-Float16 - remove old TODOs - add Float16 to ExpressionFuzzer.java - fix jtreg commands - remove some unnecessary incubator flags - comparisons - rest of Float16 operators - ... and 5 more: https://git.openjdk.org/jdk/compare/dc1b0b5f...c7da6b4d ------------- Changes: https://git.openjdk.org/jdk/pull/28095/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28095&range=01 Stats: 376 lines in 9 files changed: 348 ins; 4 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/28095.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28095/head:pull/28095 PR: https://git.openjdk.org/jdk/pull/28095 From erfang at openjdk.org Mon Dec 15 06:57:56 2025 From: erfang at openjdk.org (Eric Fang) Date: Mon, 15 Dec 2025 06:57:56 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v6] In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 03:01:39 GMT, Eric Fang wrote: >> `VectorMaskCastNode` is used to cast a vector mask from one type to another type. The cast may be generated by calling the vector API `cast` or generated by the compiler. For example, some vector mask operations like `trueCount` require the input mask to be integer types, so for floating point type masks, the compiler will cast the mask to the corresponding integer type mask automatically before doing the mask operation. This kind of cast is very common. >> >> If the vector element size is not changed, the `VectorMaskCastNode` don't generate code, otherwise code will be generated to extend or narrow the mask. This IR node is not free no matter it generates code or not because it may block some optimizations. 
For example: >> 1. `(VectorStoremask (VectorMaskCast (VectorLoadMask x)))` The middle `VectorMaskCast` prevented the following optimization: `(VectorStoremask (VectorLoadMask x)) => (x)` >> 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`. >> >> In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. But if the input value is changed, we can't eliminate the cast. >> >> The general idea of this PR is introducing an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`, like the existing `Node::uncast(bool)` function. The funtion returns the first non `VectorMaskCastNode`. >> >> The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode` and this does not affect the correctness of the optimization. Then this function can be called to eliminate the `VectorMaskCastNode` chain. >> >> Current optimizations related to `VectorMaskCastNode` include: >> 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760. >> 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242. >> >> This PR does the following optimizations: >> 1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` as `(VectorMaskCast (VectorMaskCast? ... (VectorMaskCast x))) => (x)`. Because as long as types of the head and tail `VectorMaskCastNode` are consistent, the optimization is correct. >> 2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`. Since the value before and after the pattern is a boolean vect... > > Eric Fang has updated the pull request incrementally with one additional commit since the last revision: > > Refine code comments Thanks for your review! 
@jatin-bhateja ------------- PR Review: https://git.openjdk.org/jdk/pull/28313#pullrequestreview-3576654397 From erfang at openjdk.org Mon Dec 15 06:58:01 2025 From: erfang at openjdk.org (Eric Fang) Date: Mon, 15 Dec 2025 06:58:01 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v6] In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 06:09:26 GMT, Jatin Bhateja wrote: >> Eric Fang has updated the pull request incrementally with one additional commit since the last revision: >> >> Refine code comments > > src/hotspot/share/opto/vectornode.cpp line 1062: > >> 1060: if (!in1->isa_Vector()) { >> 1061: break; >> 1062: } > > Can you write a comment here, why you want to avoid handling masks of type TypeVectMask ? Hi, @jatin-bhateja, I didn't quite understand what you meant. I'm not sure if you mistook `isa_Vector` for `isa_vectormask`. Checking `isa_Vector` here is to ensure that `in1` is a `VectorNode`, so that it is safe to call `as_Vector` on it. > src/hotspot/share/opto/vectornode.cpp line 1063: > >> 1061: break; >> 1062: } >> 1063: assert(n->as_Vector()->length() == in1->as_Vector()->length(), "vector length must match"); > > While assertions are good to add, but mask cast is a lanewise operation, i.e. length compatibility is implied, and adding an assertion for IR invariants is redundant. My main concern here is that the requirement for `VectorMaskCastNode` to have the same length for both input and output might be removed in the future. I'm not sure, but we do require the lengths to be the same here, so I added this assertion. @eme64 has a similar comment; see https://github.com/openjdk/jdk/pull/28313/changes#r2614577536. So, if you all think that the requirement for lane length in `VectorMaskCastNode` won't be removed, then we can delete this assertion and the condition below.
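The chain-stripping idea and the length check discussed in this thread can be modelled in a few lines. The following is only a toy sketch in standalone Java, not the HotSpot C++ code: the `Node` record, the string opcodes, and the method names `uncastMask` and `storeMaskIdentity` are all hypothetical and exist purely for illustration. It assumes every cast node has exactly one input.

```java
public class UncastMaskSketch {
    // Toy IR node (hypothetical): an opcode name, a lane count, and at most one input.
    record Node(String op, int length, Node in) {}

    // Skip zero or more consecutive "VectorMaskCast" nodes and
    // return the first node that is not a cast.
    static Node uncastMask(Node n) {
        while (n.op().equals("VectorMaskCast")) {
            Node in1 = n.in();
            // Models the debated assert: a mask cast is lanewise,
            // so input and output are expected to have the same length.
            if (in1.length() != n.length()) {
                throw new IllegalStateException("vector length must match");
            }
            n = in1;
        }
        return n;
    }

    // Models: VectorStoreMask (VectorMaskCast* (VectorLoadMask bv)) ==> bv
    static Node storeMaskIdentity(Node store) {
        Node in1 = uncastMask(store.in());
        if (in1.op().equals("VectorLoadMask") && in1.length() == store.length()) {
            return in1.in(); // the original boolean vector
        }
        return store; // pattern did not match, keep the node
    }

    public static void main(String[] args) {
        Node bv = new Node("BoolVector", 4, null);
        Node load = new Node("VectorLoadMask", 4, bv);
        Node cast1 = new Node("VectorMaskCast", 4, load);
        Node cast2 = new Node("VectorMaskCast", 4, cast1);
        Node store = new Node("VectorStoreMask", 4, cast2);
        System.out.println(storeMaskIdentity(store) == bv); // true
    }
}
```

In this model the length comparison in `storeMaskIdentity` is redundant as long as the check inside `uncastMask` holds, which is the same trade-off the reviewers are weighing for the real code.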
> test/hotspot/jtreg/compiler/vectorapi/VectorStoreMaskIdentityTest.java line 186: > >> 184: testThreeCastsKernel(IntVector.SPECIES_128, ShortVector.SPECIES_64, FloatVector.SPECIES_128, LongVector.SPECIES_256); >> 185: verifyResult(IntVector.SPECIES_128.length()); >> 186: } > > A nit, you can define final static species like S128, S64 pointing to fully qualified species, it will reduce the verbosity. Makes sense, I'll make the change in the next commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2618171442 PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2618189379 PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2618195582 From epeter at openjdk.org Mon Dec 15 07:04:37 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Dec 2025 07:04:37 GMT Subject: RFR: 8370922: Template Framework Library: Float16 type and operations [v3] In-Reply-To: References: Message-ID: > We should test `Float16` with Template Framework Tests. For this, I'm now implementing: > > - Template Framework Library: add `Float16Type` that represents `Float16`. Extend `Operations.java` with `Float16` operations. > - `Verify.java`: add verification for `Float16`, and corresponding tests in `TestVerifyIncubatorVector.java`. We could have done this separately, but it is not much code and completes the pipeline from code generation through execution and finally result verification in the following two tests. > - Adding `Float16` to `ExpressionFuzzer.java` and `TestExpressions.java`.
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: rename test for Galder ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28095/files - new: https://git.openjdk.org/jdk/pull/28095/files/c7da6b4d..78bd0cc0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28095&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28095&range=01-02 Stats: 5 lines in 2 files changed: 0 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28095.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28095/head:pull/28095 PR: https://git.openjdk.org/jdk/pull/28095 From epeter at openjdk.org Mon Dec 15 07:04:37 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Dec 2025 07:04:37 GMT Subject: RFR: 8370922: Template Framework Library: Float16 type and operations [v3] In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 06:47:22 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/igvn/ExpressionFuzzer.java line 343: >> >>> 341: // Generate expressions with any scalar numeric types. >>> 342: for (CodeGenerationDataNameType type : SCALAR_NUMERIC_TYPES) { >>> 343: for (int i = 0; i < 2; i++) { >> >> What does this loop do? And why do we have only 2 iterations here, but 10 for `PRIMITIVE_TYPES`? > > It `Generate[s] expressions with any scalar numeric types.` ;) > The question is just how many per (output) type. Here we do 2 for each type. > > The 2 vs 10 is quite arbitrary. I did not want to increase the runtime of the test too much. For now, focusing more on the primitive types an operations is probably good, Float16 is still rather niche. But we can change the balance in the future. Ah yes, and note: even if we start with a type other than Float16 as the output, we can still have Float16 components in the expression, via conversion for example. We can do `float -> Float16 -> float` for example. 
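To make the `float -> Float16 -> float` conversion chain concrete, here is a minimal sketch. It does not use the incubator `Float16` class or the Template Framework; as an assumption for illustration it relies on the `Float.floatToFloat16` and `Float.float16ToFloat` methods available since JDK 20, and the class and method names are hypothetical.

```java
public class Float16RoundTripSketch {
    // float -> binary16 bit pattern -> float.
    // Values that binary16 cannot represent exactly are rounded on the way in.
    static float roundTrip(float f) {
        short bits = Float.floatToFloat16(f);
        return Float.float16ToFloat(bits);
    }

    public static void main(String[] args) {
        // 1.5 is exactly representable in binary16, so it survives unchanged.
        System.out.println(roundTrip(1.5f) == 1.5f);
        // 0.1 is not exactly representable, so the round trip changes the value.
        System.out.println(roundTrip(0.1f) == 0.1f);
        // 1.0e6 exceeds the largest finite binary16 value (65504) and becomes infinite.
        System.out.println(Float.isInfinite(roundTrip(1.0e6f)));
    }
}
```

Even when the output type of a generated expression is `float`, such a round trip can change the result, which is why expressions with other output types can still exercise Float16 behaviour.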
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28095#discussion_r2618206317 From epeter at openjdk.org Mon Dec 15 07:04:38 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Dec 2025 07:04:38 GMT Subject: RFR: 8370922: Template Framework Library: Float16 type and operations [v3] In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 14:54:23 GMT, Galder Zamarreño wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> rename test for Galder > > test/hotspot/jtreg/testlibrary_tests/verify/tests/TestVerifyIncubatorVector.java line 44: > >> 42: import compiler.lib.verify.*; >> 43: >> 44: public class TestVerifyIncubatorVector { > > I have doubts about leaving the "Incubator" name in the test class name as it's temporary. Are you going to refactor the class name when API is not incubator any more? Maybe `TestVerifyVectorAPI` instead? @galderz I now renamed it to `TestVerifyFloat16.java`, I think it is better, even if I eventually have to refactor the internals of the test we don't have to rename the test itself :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28095#discussion_r2618210641 From mhaessig at openjdk.org Mon Dec 15 07:24:55 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Mon, 15 Dec 2025 07:24:55 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v6] In-Reply-To: References: Message-ID: On Thu, 11 Dec 2025 15:42:42 GMT, Roland Westrelin wrote:
>> For this failure memory stats are:
>>
>>
>> Total Usage: 1095525816
>> --- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 ---
>> Phase                    Total      ra     node      comp    type states reglive regsplit regmask superword   cienv ha other
>> none                   5976032  331560  5402064    197512   33712  10200       0        0     984         0       0  0     0
>> parse                  2716464   65456  1145480    196408 1112752      0       0        0       0         0  196368  0     0
>> optimizer                98184       0    32728         0   65456      0       0        0       0         0       0  0     0
>> connectionGraph          32728       0        0     32728       0      0       0        0       0         0       0  0     0
>> iterGVN                  32728       0    32728         0       0      0       0        0       0         0       0  0     0
>> idealLoop            918189632       0 38687056 872824784  392776      0       0        0       0         0 6285016  0     0
>> idealLoopVerify        2228144       0        0   2228144       0      0       0        0       0         0       0  0     0
>> macroExpand              32728       0    32728         0       0      0       0        0       0         0       0  0     0
>> graphReshape             32728       0    32728         0       0      0       0        0       0         0       0  0     0
>> matcher               20135944 3369848  9033208   7536400   65456 131032       0        0       0         0       0  0     0
>> postselect_cleanup      294872  294872        0         0       0      0       0        0       0         0       0  0     0
>> scheduler               752944  196488   556456         0       0      0       0        0       0         0       0  0     0
>> regalloc                388736  388736        0         0       0      0       0        0       0         0       0  0     0
>> ...
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > package declaration Testing passed tier1-3 linux-x64-debug, linux-aarch64-debug, macosx-aarch64-debug, macosx-x64-debug, windows-x64-debug. ------------- Marked as reviewed by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/28581#pullrequestreview-3576777552 From jbhateja at openjdk.org Mon Dec 15 07:32:53 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 15 Dec 2025 07:32:53 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v6] In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 06:45:18 GMT, Eric Fang wrote: >> Eric Fang has updated the pull request incrementally with one additional commit since the last revision: >> >> Refine code comments > > src/hotspot/share/opto/vectornode.cpp line 1062: > >> 1060: if (!in1->isa_Vector()) { >> 1061: break; >> 1062: } > > Hi, @jatin-bhateja, I didn't quite understand what you meant. I'm not sure if you mistook `isa_Vector` for `isa_vectormask`. Checking `isa_Vector` here is to ensure that `in1` is a `VectorNode`, so that it calls the `as_Vector` function. I am seeing different behaviour between UseAVX=2 and UseAVX=3 for the following kernel.
public static final VectorSpecies<Float> FSP = FloatVector.SPECIES_PREFERRED;

public static long micro(long ctr) {
    VectorMask<Float> mask = VectorMask.fromLong(FSP, 15);
    return mask.toLong();
}

TURIN>java --add-modules=jdk.incubator.vector -XX:UseAVX=3 -Xbatch -XX:-TieredCompilation -XX:CompileCommand=PrintIdealPhase,testmcast::micro,BEFORE_MATCHING -cp . testmcast
CompileCommand: PrintIdealPhase testmcast.micro const char* PrintIdealPhase = 'BEFORE_MATCHING'
AFTER: BEFORE_MATCHING
  0 Root === 0 368 [[ 0 1 3 25 ]] inner
  3 Start === 3 0 [[ 3 5 6 7 8 9 ]] #{0:control, 1:abIO, 2:memory, 3:rawptr:BotPTR, 4:return_address, 5:long, 6:half}
  5 Parm === 3 [[ 368 ]] Control !jvms: testmcast::micro @ bci:-1 (line 9)
  6 Parm === 3 [[ 368 ]] I_O !jvms: testmcast::micro @ bci:-1 (line 9)
  7 Parm === 3 [[ 368 ]] Memory Memory: @ptr:BotPTR+bot, idx=Bot; !jvms: testmcast::micro @ bci:-1 (line 9)
  8 Parm === 3 [[ 368 ]] FramePtr !jvms: testmcast::micro @ bci:-1 (line 9)
  9 Parm === 3 [[ 368 ]] ReturnAdr !jvms: testmcast::micro @ bci:-1 (line 9)
 25 ConL === 0 [[ 376 ]] #long:15
368 Return === 5 6 7 8 9 returns 398 [[ 0 ]]
376 VectorLongToMask === _ 25 [[ 397 ]] #vectormask !jvms: VectorMask::fromLong @ bci:39 (line 243) testmcast::micro @ bci:6 (line 9)
397 VectorMaskCast === _ 376 [[ 398 ]] #vectormask !jvms: Float512Vector$Float512Mask::toLong @ bci:35 (line 765) testmcast::micro @ bci:11 (line 10)
398 VectorMaskToLong === _ 397 [[ 368 ]] #long !jvms: Float512Vector$Float512Mask::toLong @ bci:35 (line 765) testmcast::micro @ bci:11 (line 10)
[time] 17ms [res] 300000000

TURIN>java --add-modules=jdk.incubator.vector -XX:UseAVX=2 -Xbatch -XX:-TieredCompilation -XX:CompileCommand=PrintIdealPhase,testmcast::micro,BEFORE_MATCHING -cp . testmcast
CompileCommand: PrintIdealPhase testmcast.micro const char* PrintIdealPhase = 'BEFORE_MATCHING'
AFTER: BEFORE_MATCHING
  0 Root === 0 368 [[ 0 1 3 25 ]] inner
  3 Start === 3 0 [[ 3 5 6 7 8 9 ]] #{0:control, 1:abIO, 2:memory, 3:rawptr:BotPTR, 4:return_address, 5:long, 6:half}
  5 Parm === 3 [[ 368 ]] Control !jvms: testmcast::micro @ bci:-1 (line 9)
  6 Parm === 3 [[ 368 ]] I_O !jvms: testmcast::micro @ bci:-1 (line 9)
  7 Parm === 3 [[ 368 ]] Memory Memory: @ptr:BotPTR+bot, idx=Bot; !jvms: testmcast::micro @ bci:-1 (line 9)
  8 Parm === 3 [[ 368 ]] FramePtr !jvms: testmcast::micro @ bci:-1 (line 9)
  9 Parm === 3 [[ 368 ]] ReturnAdr !jvms: testmcast::micro @ bci:-1 (line 9)
 25 ConL === 0 [[ 368 ]] #long:15
368 Return === 5 6 7 8 9 returns 25 [[ 0 ]]
[time] 9ms [res] 300000000

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2618289402 From jbhateja at openjdk.org Mon Dec 15 07:40:53 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 15 Dec 2025 07:40:53 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v6] In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 06:52:22 GMT, Eric Fang wrote: >> src/hotspot/share/opto/vectornode.cpp line 1063: >> >>> 1061: break; >>> 1062: } >>> 1063: assert(n->as_Vector()->length() == in1->as_Vector()->length(), "vector length must match"); >> >> While assertions are good to add, but mask cast is a lanewise operation, i.e. length compatibility is implied, and adding an assertion for IR invariants is redundant. > > My main concern here is that the requirement for `VectorMaskCastNode` to have the same length for both input and output might have been removed in the future. I'm not sure, but we do require the lengths to be the same here, so I added this assertion. @eme64 has a similar comment; see https://github.com/openjdk/jdk/pull/28313/changes#r2614577536.
So, if you all think that the requirement for lane length in `VectorMaskCastNode` won't be removed, then we can delete this assertion and the condition below. I think assertion here is redundant. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2618311660 From erfang at openjdk.org Mon Dec 15 07:59:54 2025 From: erfang at openjdk.org (Eric Fang) Date: Mon, 15 Dec 2025 07:59:54 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v6] In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 07:30:10 GMT, Jatin Bhateja wrote: >> src/hotspot/share/opto/vectornode.cpp line 1062: >> >>> 1060: if (!in1->isa_Vector()) { >>> 1061: break; >>> 1062: } >> >> Hi, @jatin-bhateja, I didn't quite understand what you meant. I'm not sure if you mistook `isa_Vector` for `isa_vectormask`. Checking `isa_Vector` here is to ensure that `in1` is a `VectorNode`, so that it calls the `as_Vector` function. > > Correct, I am seeing a different behaviour b/w UseAVX=2 and UseAVX=3 for following kernel. Not related to your new code but due to other sideeffect. kindly have a look. > > > public static final VectorSpecies FSP = FloatVector.SPECIES_PREFERRED; > > public static long micro(long ctr) { > VectorMask mask = VectorMask.fromLong(FSP, 15); > return mask.toLong(); > } > > > TURIN>java --add-modules=jdk.incubator.vector -XX:UseAVX=3 -Xbatch -XX:-TieredCompilation -XX:CompileCommand=PrintIdealPhase,testmcast::micro,BEFORE_MATCHIN > G -cp . 
testmcast > CompileCommand: PrintIdealPhase testmcast.micro const char* PrintIdealPhase = 'BEFORE_MATCHING' > AFTER: BEFORE_MATCHING > 0 Root === 0 368 [[ 0 1 3 25 ]] inner > 3 Start === 3 0 [[ 3 5 6 7 8 9 ]] #{0:control, 1:abIO, 2:memory, 3:rawptr:BotPTR, 4:return_address, 5:long, 6:half} > 5 Parm === 3 [[ 368 ]] Control !jvms: testmcast::micro @ bci:-1 (line 9) > 6 Parm === 3 [[ 368 ]] I_O !jvms: testmcast::micro @ bci:-1 (line 9) > 7 Parm === 3 [[ 368 ]] Memory Memory: @ptr:BotPTR+bot, idx=Bot; !jvms: testmcast::micro @ bci:-1 (line 9) > 8 Parm === 3 [[ 368 ]] FramePtr !jvms: testmcast::micro @ bci:-1 (line 9) > 9 Parm === 3 [[ 368 ]] ReturnAdr !jvms: testmcast::micro @ bci:-1 (line 9) > 25 ConL === 0 [[ 376 ]] #long:15 > 368 Return === 5 6 7 8 9 returns 398 [[ 0 ]] > 376 VectorLongToMask === _ 25 [[ 397 ]] #vectormask !jvms: VectorMask::fromLong @ bci:39 (line 243) testmcast::micro @ bci:6 (line 9) > 397 VectorMaskCast === _ 376 [[ 398 ]] #vectormask !jvms: Float512Vector$Float512Mask::toLong @ bci:35 (line 765) testmcast::micro @ bci:11 (line 10) > 398 VectorMaskToLong === _ 397 [[ 368 ]] #long !jvms: Float512Vector$Float512Mask::toLong @ bci:35 (line 765) testmcast::micro @ bci:11 (line 10) > [time] 17ms [res] 300000000 > TURIN>java --add-modules=jdk.incubator.vector -XX:UseAVX=2 -Xbatch -XX:-TieredCompilation -XX:CompileCommand=PrintIdealPhase,testmcast::micro,BEFORE_MATCHIN > G -cp . testmcast > CompileCommand: PrintIdealPhase testmcast.micro const char* PrintIdealPhase = 'BEFORE_MATCHING' > AFTER: BEFORE_MATCHING > 0 Root === 0 368 [[ 0 1 3 25 ]] inner > 3 Start === 3 0 [[ 3 5 6 7 8 9 ]] #{0:control, 1:abIO, 2:memory, 3:rawptr:BotPTR, 4:return_address, 5:long, 6:half} > 5 Parm === 3 [[ 368 ]] Control !jvms: testmcast::micro @ bci:-1 (line 9) > 6 Parm === 3 [[ 368 ]] I_O !jvms: testmcast::micro @ bci:-1 (line 9) > 7 Parm === 3 [[ 368 ]] Memory ... This is caused by the different IRs when using AVX2 and AVX3. 
- With AVX3 the generated IRs are: `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))` - With AVX2 the generated IRs are: `(VectorMaskToLong (VectorStoreMask (VectorMaskCast (VectorLoadMask (VectorLongToMask x)))))` We have supported the following optimizations: - `(VectorStoreMask (VectorMaskCast (VectorLoadMask x))) => (x)` and - `(VectorMaskToLong (VectorLongToMask x)) => (x)`. So with AVX2, `(VectorMaskToLong (VectorStoreMask (VectorMaskCast (VectorLoadMask (VectorLongToMask x))))) => (x)`. With AVX3, `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x))) => (x)` is a potential optimization. I have mentioned this in the commit message, but it is not supported yet. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2618369569 From galder at openjdk.org Mon Dec 15 08:03:47 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Mon, 15 Dec 2025 08:03:47 GMT Subject: RFR: 8373396: Min and Max Ideal missing AddNode::Ideal optimisations [v2] In-Reply-To: References: Message-ID: > `MaxI` and `MinI` are missing `AddNode::Ideal` optimizations. These optimizations include commutation, flattening, pushing constants, etc. The PR changes `MaxINode::Ideal` and `MinINode::Ideal` to call `AddNode::Ideal`. Long versions already call `AddNode::Ideal` so nothing to change there. > > The PR also includes a template framework generated test (cc @eme64) that verifies that all of `AddNode::Ideal` optimizations now apply correctly for min/max for longs and ints. Long tests have been added to validate that both ints and longs produce the same results. > > Fixing this issue indirectly fixes `compiler/codegen/TestBooleanVect.java` when run with `-XX:VerifyIterativeGVN=1110`, which was failing due to `min` not having one of those optimisations. However, this PR does not make changes to `PhaseIterGVN::verify_Identity_for` because there are additional failures observed with min/max for integers in JDK-8373134. 
Therefore, changes there will be made in the PR for JDK-8373134 instead. > > If you look at `PhaseIterGVN::verify_Ideal_for`, it contains the exception shown below. It looks like this could be removed in this PR, since the checks were quite likely disabled because of the issue here. However, it's unclear what test was failing here (@eme64 ?): > > > // MinINode::Ideal > // Did not investigate, but there are some patterns that might > // need more notification. > case Op_MinI: > case Op_MaxI: // preemptively removed it as well. > return false; > > > I've run tier1-3 tests on linux/x64 and they passed. Galder Zamarreño has updated the pull request incrementally with six additional commits since the last revision: - Remove MinI/MaxI exceptions from verify_Ideal_for - Iterate over enum values instead - Refactor to MaxNode::IdealI - Remove variables - Use ${test.main.class} - Move to compiler.igvn package ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28770/files - new: https://git.openjdk.org/jdk/pull/28770/files/2e241115..1ae30815 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28770&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28770&range=00-01 Stats: 308 lines in 4 files changed: 145 ins; 163 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28770.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28770/head:pull/28770 PR: https://git.openjdk.org/jdk/pull/28770 From duke at openjdk.org Mon Dec 15 08:14:02 2025 From: duke at openjdk.org (Tobias Hotz) Date: Mon, 15 Dec 2025 08:14:02 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v10] In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 12:09:44 GMT, Emanuel Peter wrote: >> Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: >> >> Move Test to compiler.igvn > > Very nice work @ichttt ! I like all your comments, and thanks for all the test cases, including the randomized ones! 
> > I just have a few minor suggestions :) @eme64 do you have time to do another review? :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/26143#issuecomment-3654337753 From duke at openjdk.org Mon Dec 15 08:16:57 2025 From: duke at openjdk.org (Tobias Hotz) Date: Mon, 15 Dec 2025 08:16:57 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation [v3] In-Reply-To: References: Message-ID: On Thu, 30 Oct 2025 18:23:46 GMT, Hannes Greule wrote: >> The test cases show examples of code where `Value()` previously wasn't run because idealization took place before, resulting in less precise type analysis. >> >> Please let me know what you think. > > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > review IMO, this change improves things over the current mainline. Yeah, it may not be perfect, but the solution of deferring to IGVN has been discussed by several people and has been deemed to be sufficient for now. I just hope this will not be dropped, and I would vote to merge the patch as-is ------------- PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3654347377 From bmaillard at openjdk.org Mon Dec 15 08:31:55 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 15 Dec 2025 08:31:55 GMT Subject: RFR: 8370922: Template Framework Library: Float16 type and operations [v3] In-Reply-To: References: Message-ID: <7-TP-qfUc6CuV7w1vGhIdtzv7Q7L9K5HdV4_ytw9duM=.83c95e01-64b3-440b-8dbe-16a1c2b9996c@github.com> On Mon, 15 Dec 2025 06:58:52 GMT, Emanuel Peter wrote: >> It `Generate[s] expressions with any scalar numeric types.` ;) >> The question is just how many per (output) type. Here we do 2 for each type. >> >> The 2 vs 10 is quite arbitrary. I did not want to increase the runtime of the test too much. For now, focusing more on the primitive types an operations is probably good, Float16 is still rather niche. 
But we can change the balance in the future. > > Ah yes, and note: even if we start with a type other than Float16 as the output, we can still have Float16 components in the expression, via conversion for example. We can do `float -> Float16 -> float` for example. I meant to ask specifically about the inner loop, sorry :) The explanation makes sense, thanks for explaining a little more. Perhaps we could have a comment for that, or have constants with more explicit names for 2 and 10? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28095#discussion_r2618455025 From xgong at openjdk.org Mon Dec 15 08:36:51 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 15 Dec 2025 08:36:51 GMT Subject: RFR: 8372136: VectorAPI: Refactor subword gather load API java implementation In-Reply-To: References: Message-ID: On Sat, 13 Dec 2025 01:21:14 GMT, Vladimir Ivanov wrote: > Good work, Xiaohong! > Thanks so much for your review! > Can you, please, include samples of machine code generated before/after the patch (for AVX2 and AVX512)? Sure. The generated code is the same for cases that need just **1 gather load**. For cases that need **2 or 4** gather loads, the main differences come from the **duplicate initializing instructions** before the iterations of 8B gather loads (which could be optimized in the future), and the additional code generated for **vector slice and merging**. The following is an example of loading a `Short256Vector` under `-XX:UseAVX=2`, which needs 2 gather loads. 
The corresponding Java code is: private static final VectorSpecies S_SPECIES = ShortVector.SPECIES_PREFERRED; static void gather_short() { for (int i = 0; i < LENGTH; i += S_SPECIES.length()) { ShortVector.fromArray(S_SPECIES, sa, i, index, i) .intoArray(sr, i); } } static void gather_short_masked() { VectorMask mask = VectorMask.fromArray(S_SPECIES, m, 0); for (int i = 0; i < LENGTH; i += S_SPECIES.length()) { ShortVector.fromArray(S_SPECIES, sa, i, index, i, mask) .intoArray(sr, i); } } Here is the kernel code generated **without** this patch: 0x00007a0e8c06ecb0: vmovd %r9d,%xmm1 0x00007a0e8c06ecb5: lea 0x10(%rbx,%rsi,2),%r14 0x00007a0e8c06ecba: mov %r13,%r8 0x00007a0e8c06ecbd: mov $0x10,%r9d 0x00007a0e8c06ecc3: vpxor %ymm5,%ymm5,%ymm5 0x00007a0e8c06ecc7: vpxor %ymm4,%ymm4,%ymm4 0x00007a0e8c06eccb: vpcmpeqd %ymm6,%ymm6,%ymm6 0x00007a0e8c06eccf: vpsubd %ymm6,%ymm5,%ymm6 0x00007a0e8c06ecd3: vpslld $0x1,%ymm6,%ymm6 0x00007a0e8c06ecd8: vmovdqu 0x41020(%rip),%ymm5 # Stub::Stub Generator vector_iota_indices_stub+128 0x00007a0e8c0afd00 ; {external_word} 0x00007a0e8c06ece0: vpxor %ymm3,%ymm3,%ymm3 0x00007a0e8c06ece4: mov (%r8),%r11d 0x00007a0e8c06ece7: vpinsrw $0x0,(%r14,%r11,2),%xmm3,%xmm3 0x00007a0e8c06ecee: mov 0x4(%r8),%r11d 0x00007a0e8c06ecf2: vpinsrw $0x1,(%r14,%r11,2),%xmm3,%xmm3 0x00007a0e8c06ecf9: mov 0x8(%r8),%r11d 0x00007a0e8c06ecfd: vpinsrw $0x2,(%r14,%r11,2),%xmm3,%xmm3 0x00007a0e8c06ed04: mov 0xc(%r8),%r11d 0x00007a0e8c06ed08: vpinsrw $0x3,(%r14,%r11,2),%xmm3,%xmm3 0x00007a0e8c06ed0f: vpermd %ymm3,%ymm5,%ymm3 0x00007a0e8c06ed14: vpsubd %ymm6,%ymm5,%ymm5 0x00007a0e8c06ed18: vpor %ymm3,%ymm4,%ymm4 0x00007a0e8c06ed1c: add $0x10,%r8 0x00007a0e8c06ed20: sub $0x4,%r9d 0x00007a0e8c06ed24: jne 0x00007a0e8c06ece0 0x00007a0e8c06ed26: vmovdqu %ymm4,0x10(%rbp,%rsi,2) And here is the kernel code generated **with** this patch: 0x000070118c06a033: vmovd %edi,%xmm5 0x000070118c06a037: vmovq %rbp,%xmm3 0x000070118c06a03c: vmovd %ecx,%xmm2 0x000070118c06a040: mov %r9d,(%rsp) 
0x000070118c06a044: lea 0x10(%rsi,%r10,2),%r14 # start of the second gather_load operation 0x000070118c06a049: mov %r11,%rbp 0x000070118c06a04c: mov $0x8,%ecx 0x000070118c06a051: vpxor %xmm4,%xmm4,%xmm4 0x000070118c06a055: vpxor %xmm10,%xmm10,%xmm10 0x000070118c06a05a: vpcmpeqd %xmm11,%xmm11,%xmm11 0x000070118c06a05f: vpsubd %xmm11,%xmm4,%xmm11 0x000070118c06a064: vpslld $0x1,%xmm11,%xmm11 0x000070118c06a06a: vmovdqu 0x45c8e(%rip),%xmm4 # Stub::Stub Generator vector_iota_indices_stub+128 0x000070118c0afd00 ; {external_word} 0x000070118c06a072: vpxor %xmm6,%xmm6,%xmm6 0x000070118c06a076: mov 0x0(%rbp),%edi 0x000070118c06a079: vpinsrw $0x0,(%r14,%rdi,2),%xmm6,%xmm6 0x000070118c06a080: mov 0x4(%rbp),%edi 0x000070118c06a083: vpinsrw $0x1,(%r14,%rdi,2),%xmm6,%xmm6 0x000070118c06a08a: mov 0x8(%rbp),%edi 0x000070118c06a08d: vpinsrw $0x2,(%r14,%rdi,2),%xmm6,%xmm6 0x000070118c06a094: mov 0xc(%rbp),%edi 0x000070118c06a097: vpinsrw $0x3,(%r14,%rdi,2),%xmm6,%xmm6 0x000070118c06a09e: vpermd %ymm6,%ymm4,%ymm6 0x000070118c06a0a3: vpsubd %xmm11,%xmm4,%xmm4 0x000070118c06a0a8: vpor %xmm6,%xmm10,%xmm10 0x000070118c06a0ac: add $0x10,%rbp 0x000070118c06a0b0: sub $0x4,%ecx 0x000070118c06a0b3: jne 0x000070118c06a072 0x000070118c06a0b5: vmovdqu %xmm10,%xmm4 # vector reinterpret, the end of second gather_load 0x000070118c06a0ba: vperm2i128 $0x21,%ymm4,%ymm9,%ymm6 # vector slice 0x000070118c06a0c0: lea 0x10(%rsi,%r10,2),%r11 # start of the first gather_load operation 0x000070118c06a0c5: mov %rax,%rcx 0x000070118c06a0c8: mov $0x8,%r8d 0x000070118c06a0ce: vpxor %xmm10,%xmm10,%xmm10 0x000070118c06a0d3: vpxor %xmm4,%xmm4,%xmm4 0x000070118c06a0d7: vpcmpeqd %xmm13,%xmm13,%xmm13 0x000070118c06a0dc: vpsubd %xmm13,%xmm10,%xmm13 0x000070118c06a0e1: vpslld $0x1,%xmm13,%xmm13 0x000070118c06a0e7: vmovdqu 0x45c11(%rip),%xmm10 # Stub::Stub Generator vector_iota_indices_stub+128 0x000070118c0afd00 ; {external_word} 0x000070118c06a0ef: vpxor %xmm12,%xmm12,%xmm12 0x000070118c06a0f4: mov (%rcx),%r9d 
0x000070118c06a0f7: vpinsrw $0x0,(%r11,%r9,2),%xmm12,%xmm12 0x000070118c06a0fe: mov 0x4(%rcx),%r9d 0x000070118c06a102: vpinsrw $0x1,(%r11,%r9,2),%xmm12,%xmm12 0x000070118c06a109: mov 0x8(%rcx),%r9d 0x000070118c06a10d: vpinsrw $0x2,(%r11,%r9,2),%xmm12,%xmm12 0x000070118c06a114: mov 0xc(%rcx),%r9d 0x000070118c06a118: vpinsrw $0x3,(%r11,%r9,2),%xmm12,%xmm12 0x000070118c06a11f: vpermd %ymm12,%ymm10,%ymm12 0x000070118c06a124: vpsubd %xmm13,%xmm10,%xmm10 0x000070118c06a129: vpor %xmm12,%xmm4,%xmm4 0x000070118c06a12e: add $0x10,%rcx 0x000070118c06a132: sub $0x4,%r8d 0x000070118c06a136: jne 0x000070118c06a0ef 0x000070118c06a138: vmovdqu %xmm4,%xmm4 ; vector reinterpret, the end of the first gather_load 0x000070118c06a13c: vpor %ymm6,%ymm4,%ymm4 ; final merge 0x000070118c06a140: vmovq %xmm3,%r11 0x000070118c06a145: vmovdqu %ymm4,0x10(%r11,%r10,2) ;*invokestatic store {reexecute=0 rethrow=0 return_oop=0} ; - jdk.incubator.vector.ShortVector::intoArray at 44 (line 3514) ; - VectorAPITest::gather_short at 38 (line 116) For the masked cases, besides additional added instructions, there are more code generated for the **mask slice** operations. I also attached the full code for kinds of cases. Please kindly share your feedback. Thanks a lot! 
[avx2_short_max_after.txt](https://github.com/user-attachments/files/24160578/avx2_short_max_after.txt) [avx2_short_max_before.txt](https://github.com/user-attachments/files/24160581/avx2_short_max_before.txt) [avx3_short_max_before.txt](https://github.com/user-attachments/files/24160582/avx3_short_max_before.txt) [avx3_short_max_after.txt](https://github.com/user-attachments/files/24160584/avx3_short_max_after.txt) [avx3_short_max_masked_after.txt](https://github.com/user-attachments/files/24160638/avx3_short_max_masked_after.txt) [avx3_short_max_masked_before.txt](https://github.com/user-attachments/files/24160644/avx3_short_max_masked_before.txt) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28520#issuecomment-3654421400 From epeter at openjdk.org Mon Dec 15 09:02:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Dec 2025 09:02:05 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v11] In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 21:50:27 GMT, Tobias Hotz wrote: >> This PR improves the value of integer division nodes. >> Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guaranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case. >> We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once. >> This also cleans up and unifies the code paths for DivINode and DivLNode. >> I've added some tests to validate the optimization. Without the changes, some of these tests fail. 
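The four-corners idea described above can be sketched roughly as follows. This is only an illustration in plain Java, not the C2 code from the PR: the class `DivRangeSketch` and the methods `divRange` and `corners` are hypothetical names, ranges are passed as plain `int` bounds rather than C2 types, and the `MIN_INT / -1` case is handled by conservatively returning the full int range, whereas the PR describes dedicated logic for that case.

```java
public class DivRangeSketch {
    /**
     * Returns {min, max} of a / b for a in [aLo, aHi] and b in [bLo, bHi] without 0,
     * or null if the divisor range contains only 0 (the division always throws).
     * Hypothetical helper, for illustration only.
     */
    public static int[] divRange(int aLo, int aHi, int bLo, int bHi) {
        // Assumption of this sketch: if MIN_VALUE / -1 is possible, give up and
        // return the full int range instead of computing a tighter bound.
        if (aLo == Integer.MIN_VALUE && bLo <= -1 && bHi >= -1) {
            return new int[] {Integer.MIN_VALUE, Integer.MAX_VALUE};
        }
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        boolean seen = false;
        // A divisor range that crosses zero is split into a negative part and a
        // positive part; each part is checked once.
        if (bLo < 0) {
            int[] r = corners(aLo, aHi, bLo, Math.min(bHi, -1));
            min = Math.min(min, r[0]);
            max = Math.max(max, r[1]);
            seen = true;
        }
        if (bHi > 0) {
            int[] r = corners(aLo, aHi, Math.max(bLo, 1), bHi);
            min = Math.min(min, r[0]);
            max = Math.max(max, r[1]);
            seen = true;
        }
        return seen ? new int[] {min, max} : null;
    }

    // For a divisor range that does not contain 0, a / b is monotone in each
    // argument separately, so the extrema lie on the four corners.
    private static int[] corners(int aLo, int aHi, int bLo, int bHi) {
        int c1 = aLo / bLo;
        int c2 = aLo / bHi;
        int c3 = aHi / bLo;
        int c4 = aHi / bHi;
        int min = Math.min(Math.min(c1, c2), Math.min(c3, c4));
        int max = Math.max(Math.max(c1, c2), Math.max(c3, c4));
        return new int[] {min, max};
    }

    public static void main(String[] args) {
        int[] r = divRange(-10, 20, -3, 5);
        System.out.println(r[0] + ".." + r[1]); // prints -20..20
    }
}
```

In the example call the divisor range `[-3, 5]` crosses zero, so it is split into `[-3, -1]` and `[1, 5]`; the two parts give `[-20, 10]` and `[-10, 20]`, whose union is `[-20, 20]`.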
> > Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: > > Simplify test, add temporary @IR rule for testLongRange and improve comments Thanks for the updates, looks good to me :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26143#pullrequestreview-3577107514 From epeter at openjdk.org Mon Dec 15 09:09:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Dec 2025 09:09:06 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation [v3] In-Reply-To: References: Message-ID: <-uZFyUW6EW1GI6AYnJYbhvweuffJvbWJUlRkAdZ2ao4=.05927bb1-9c54-4457-aba8-bcaee2884cb4@github.com> On Thu, 30 Oct 2025 18:23:46 GMT, Hannes Greule wrote: >> The test cases show examples of code where `Value()` previously wasn't run because idealization took place before, resulting in less precise type analysis. >> >> Please let me know what you think. > > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > review I'll approve it now as is. If someone is willing to produce a better solution in the future, then I'm happy to review it as well ;) @merykitty a separate phase could be a good idea. It is some extra complexity though, so we have to weigh it against the hacky-ness of the current solution (and other places where we do the delay-trick). But it would probably be worth it. ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/27886#pullrequestreview-3577133449 From hgreule at openjdk.org Mon Dec 15 09:12:28 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 15 Dec 2025 09:12:28 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation In-Reply-To: References: Message-ID: On Mon, 27 Oct 2025 06:49:08 GMT, Emanuel Peter wrote: >>> Do you consider the "expanded" versions of Div/Mod as a "different representation of Div/Mod"? 
>> >> Yes, exactly. >> >>> we could pattern match for such "expanded" versions of Div/Mod, but it would be quite complex: you would have to parse through patterns like displayed https://github.com/openjdk/jdk/pull/27886#issuecomment-3436423466. Do you think that is a good idea? >> >> Sure, It may be way above the complexity budget we are willing to spend on it. The expansion code I see for Div/Mod nodes doesn't look too complicated, but matching the pattern may require more effort. The positive thing is it'll optimize the pattern irrespective of the origin (either expanded Div/Mod or explicitly optimized in the code by the user). So, the question is how much complexity it requires vs scenarios it covers. >> >>> How does this "wrapping" help? After parsing, the CastII at the bottom of the "expanded" Div would just have the whole int range. How would the type of the CastII ever be improved, without pattern matching the "expanded" Div? >> >> It's not fully clear to me what is the scope of problematic scenarios. If it's only about Ideal() expanding the node before Value() has a chance to run, then wrapping the result of expansion in CastII/CastLL node and attach Value() as it's type should be enough (when produced type is narrower than Type::INT). >> >> If we want to to keep expanded shape while being able to compute its type as if it were the original node, then a new flavor of Cast node may help. The one which keeps the node type and its inputs and can run Value() as if it were the original node. > > @iwanowww I see, so we could implement something like a `CastII` with multiple inputs, which we know must all be identical at runtime. The first input is the one we will in the end pick. But during `Value`, we take the intersection of all input ranges. So if another (not the first input) has a narrower type, we can use that type. I suppose that would be feasible. Do you have a good name for such a node? 
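The `Value` computation suggested in the paragraph above (take the intersection of the ranges of all inputs, which are known to be identical at runtime) could look roughly like this. It is a minimal sketch in plain Java under stated assumptions: no such node exists in C2 as far as this thread says, and `RangeIntersectionSketch` and `intersect` are hypothetical names that model types as simple `{lo, hi}` int pairs.

```java
public class RangeIntersectionSketch {
    /**
     * Intersects closed int ranges given as {lo, hi} pairs.
     * Returns null when the intersection is empty (hypothetical stand-in
     * for an impossible, "top"-like type).
     */
    public static int[] intersect(int[][] ranges) {
        int lo = Integer.MIN_VALUE;
        int hi = Integer.MAX_VALUE;
        for (int[] r : ranges) {
            lo = Math.max(lo, r[0]); // the largest lower bound wins
            hi = Math.min(hi, r[1]); // the smallest upper bound wins
        }
        return lo <= hi ? new int[] {lo, hi} : null;
    }

    public static void main(String[] args) {
        // The first input has the full int range (like the expanded Div result);
        // the other inputs carry narrower types.
        int[][] inputs = {
            {Integer.MIN_VALUE, Integer.MAX_VALUE},
            {0, 100},
            {-5, 50}
        };
        int[] r = intersect(inputs);
        System.out.println(r[0] + ".." + r[1]); // prints 0..50
    }
}
```

The first input is the value that would be picked in the end; the remaining inputs only contribute their narrower ranges.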
> > What I don't know: how does that interact with other IGVN optimizations, especially those that want to pattern match specific nodes? Inserting such special cast nodes could interrupt `Ideal` optimizations, current pattern matching would not know how to deal with it. Probably it is not a big issue, but I'm not sure. @eme64 do you want me to merge master? I guess re-running tests then wouldn't hurt either. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3654545655 From mli at openjdk.org Mon Dec 15 09:13:47 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 15 Dec 2025 09:13:47 GMT Subject: Integrated: 8373428: Refine variables with the same name in nested scopes in PhaseChaitin::gather_lrg_masks In-Reply-To: <8WWg7y_W2PGKAkwrVUfN97dBZ56I2MRvbMuxowqmnZE=.4c238198-0b07-47da-8756-1485846f044f@github.com> References: <8WWg7y_W2PGKAkwrVUfN97dBZ56I2MRvbMuxowqmnZE=.4c238198-0b07-47da-8756-1485846f044f@github.com> Message-ID: On Wed, 10 Dec 2025 14:43:04 GMT, Hamlin Li wrote: > Hi, > Can you help to review this trivial patch? > > In PhaseChaitin::gather_lrg_masks, several variables have the same name in nested scopes, it looks like following code snippet. > { > A a; > { > A a; > } > } > > This is not helpful to code readability, in particular in a long method like `gather_lrg_masks`, better to rename them. > > Thanks! This pull request has now been integrated. 
Changeset: 3559eeca Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/3559eeca0edd537c6160c6753cf6fc304afee4ca Stats: 20 lines in 1 file changed: 0 ins; 0 del; 20 mod 8373428: Refine variables with the same name in nested scopes in PhaseChaitin::gather_lrg_masks Reviewed-by: phh ------------- PR: https://git.openjdk.org/jdk/pull/28748 From epeter at openjdk.org Mon Dec 15 09:17:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Dec 2025 09:17:48 GMT Subject: RFR: 8370922: Template Framework Library: Float16 type and operations [v4] In-Reply-To: References: Message-ID: > We should test `Float16` with Template Framework Tests. For this, I'm now implementing: > > - Template Framework Library: add `Float16Type` that represents `Float16`. Extend `Operations.java` with `Float16` operations. > - `Verify.java`: add verification for `Float16`, and corresponding tests in `TestVerifyIncubatorVector.java`. We could have done this separately, but it is not much code and completes the pipeline from code generation through execution and finally result verification in the following two tests. > - Adding `Float16` to `ExpressionFuzzer.java` and `TestExpressions.java`. 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: comments for Benoit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28095/files - new: https://git.openjdk.org/jdk/pull/28095/files/78bd0cc0..8d0d7e7d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28095&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28095&range=02-03 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28095.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28095/head:pull/28095 PR: https://git.openjdk.org/jdk/pull/28095 From epeter at openjdk.org Mon Dec 15 09:17:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Dec 2025 09:17:53 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation In-Reply-To: References: Message-ID: <1XovbnGPMfTX45dlT5PFCk1Bqb3Pyc_kN8vC874lKm4=.78ec990d-1950-4fa9-8dea-065a09414a1c@github.com> On Mon, 15 Dec 2025 09:09:51 GMT, Hannes Greule wrote: >> @iwanowww I see, so we could implement something like a `CastII` with multiple inputs, which we know must all be identical at runtime. The first input is the one we will in the end pick. But during `Value`, we take the intersection of all input ranges. So if another (not the first input) has a narrower type, we can use that type. I suppose that would be feasible. Do you have a good name for such a node? >> >> What I don't know: how does that interact with other IGVN optimizations, especially those that want to pattern match specific nodes? Inserting such special cast nodes could interrupt `Ideal` optimizations, current pattern matching would not know how to deal with it. Probably it is not a big issue, but I'm not sure. > > @eme64 do you want me to merge master? I guess re-running tests then wouldn't hurt either. @SirYwell Yeah, this is now quite old. Why don't you merge with master and let the GitHub actions run for now. 
Maybe someone has more comments on this now: speak up! I can run tests once GitHub actions are passing ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3654567133 From epeter at openjdk.org Mon Dec 15 09:17:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Dec 2025 09:17:51 GMT Subject: RFR: 8370922: Template Framework Library: Float16 type and operations [v4] In-Reply-To: <7-TP-qfUc6CuV7w1vGhIdtzv7Q7L9K5HdV4_ytw9duM=.83c95e01-64b3-440b-8dbe-16a1c2b9996c@github.com> References: <7-TP-qfUc6CuV7w1vGhIdtzv7Q7L9K5HdV4_ytw9duM=.83c95e01-64b3-440b-8dbe-16a1c2b9996c@github.com> Message-ID: On Mon, 15 Dec 2025 08:29:38 GMT, Benoît Maillard wrote: >> Ah yes, and note: even if we start with a type other than Float16 as the output, we can still have Float16 components in the expression, via conversion for example. We can do `float -> Float16 -> float` for example. > > I meant to ask specifically about the inner loop, sorry :) > The explanation makes sense, thanks for explaining a little more. Perhaps we could have a comment for that, or have constants with more explicit names for 2 and 10? @benoitmaillard I added some extra comments. What do you think, is it better now? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28095#discussion_r2618586955 From bmaillard at openjdk.org Mon Dec 15 09:22:06 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Mon, 15 Dec 2025 09:22:06 GMT Subject: RFR: 8370922: Template Framework Library: Float16 type and operations [v4] In-Reply-To: References: <7-TP-qfUc6CuV7w1vGhIdtzv7Q7L9K5HdV4_ytw9duM=.83c95e01-64b3-440b-8dbe-16a1c2b9996c@github.com> Message-ID: <5efZOzhCTPTAZwiS9tZ_zqRUBUWaWsINwLGY7E-dQuA=.c8e39788-59e8-4a61-9673-1d56b9d1a066@github.com> On Mon, 15 Dec 2025 09:12:59 GMT, Emanuel Peter wrote: >> I meant to ask specifically about the inner loop, sorry :) >> The explanation makes sense, thanks for explaining a little more. 
Perhaps we could have a comment for that, or have constants with more explicit names for 2 and 10? > > @benoitmaillard I added some extra comments. What do you think, is it better now? Great, thank you! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28095#discussion_r2618608053 From hgreule at openjdk.org Mon Dec 15 09:33:19 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 15 Dec 2025 09:33:19 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation [v4] In-Reply-To: References: Message-ID: > The test cases show examples of code where `Value()` previously wasn't run because idealization took place before, resulting in less precise type analysis. > > Please let me know what you think. Hannes Greule has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge branch 'master' into delay-divmod-idealization - review - expand comments - delay integral Div/Mod Ideal() until IGVN - test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27886/files - new: https://git.openjdk.org/jdk/pull/27886/files/c32bb551..d9f8a698 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27886&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27886&range=02-03 Stats: 393663 lines in 4045 files changed: 254254 ins; 88788 del; 50621 mod Patch: https://git.openjdk.org/jdk/pull/27886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27886/head:pull/27886 PR: https://git.openjdk.org/jdk/pull/27886 From dfenacci at openjdk.org Mon Dec 15 10:04:23 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 15 Dec 2025 10:04:23 GMT Subject: RFR: 8370315: [IR-Framework] Allow scenarios to be run in parallel [v4] In-Reply-To: References: Message-ID: On Thu, 11 Dec 2025 15:20:58 GMT, Christian Hagedorn wrote: >> Damon Fenacci has 
updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8370315: fix typo > > test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 775: > >> 773: if (!output.isEmpty()) { >> 774: System.out.println(output); >> 775: } > > We probably also need to do a similar trick as for the exceptions in order to have ordered stdouts for the scenarios? @chhagedorn I guess we can drop the printing of exception at the end then. Is that part of what you were suggesting? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2618752271 From galder at openjdk.org Mon Dec 15 10:35:14 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Mon, 15 Dec 2025 10:35:14 GMT Subject: RFR: 8370922: Template Framework Library: Float16 type and operations [v4] In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 09:17:48 GMT, Emanuel Peter wrote: >> We should test `Float16` with Template Framework Tests. For this, I'm now implementing: >> >> - Template Framework Library: add `Float16Type` that represents `Float16`. Extend `Operations.java` with `Float16` operations. >> - `Verify.java`: add verification for `Float16`, and corresponding tests in `TestVerifyIncubatorVector.java`. We could have done this separately, but it is not much code and completes the pipeline from code generation through execution and finally result verification in the following two tests. >> - Adding `Float16` to `ExpressionFuzzer.java` and `TestExpressions.java`. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > comments for Benoit @eme64 I have one additional question. How did you test this? Did you have HF hardware access or did you emulate it? In a recent chat with @benoitmaillard, I think he mentioned about using QEMU to emulate and do some testing. 
------------- PR Review: https://git.openjdk.org/jdk/pull/28095#pullrequestreview-3577485057 From wenanjian at openjdk.org Mon Dec 15 10:53:15 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Mon, 15 Dec 2025 10:53:15 GMT Subject: RFR: 8373069: RISC-V: implement GHASH intrinsic [v5] In-Reply-To: References: Message-ID: > support GHASH intrinsic for crypt GCM, which need zvkg extension. > > passed the tests in > test/hotspot/jtreg/compiler/codegen/aes/ > test/jdk/com/sun/crypto Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: delete some redundant assert and modify some format ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28548/files - new: https://git.openjdk.org/jdk/pull/28548/files/3bf38390..fb0f549f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28548&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28548&range=03-04 Stats: 24 lines in 2 files changed: 10 ins; 6 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/28548.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28548/head:pull/28548 PR: https://git.openjdk.org/jdk/pull/28548 From wenanjian at openjdk.org Mon Dec 15 10:53:18 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Mon, 15 Dec 2025 10:53:18 GMT Subject: RFR: 8373069: RISC-V: implement GHASH intrinsic [v4] In-Reply-To: References: <8MCvHCHkscmoOkB_cKGP5mkhHWBw6B3PfalaBL4aVg0=.0a6e3bc9-7b6b-498d-81fb-1a276adc2a31@github.com> Message-ID: On Mon, 15 Dec 2025 01:35:51 GMT, Fei Yang wrote: >> Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: >> >> modify format > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 3021: > >> 3019: assert(UseZvkg, "need GHASH instructions (Zvkg extension) support"); >> 3020: >> 3021: __ align(CodeEntryAlignment); > > Can you move this line to immediately before L3025? 
Like: > > __ align(CodeEntryAlignment); > address start = __ pc(); > __ enter(); > > > Then it looks more obvious where we want to align the code. BTW: Seems CBC and CTR intrinsics need similar adjustment. It seems most of the intrinsics use align func before stub_id, maybe we can keep it for now and discuss all of them later? > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 3052: > >> 3050: __ vghsh_vv(partial_hash, hash_subkey, cipher_text); >> 3051: __ subi(blocks, blocks, 1); >> 3052: __ bnez(blocks, L_ghash_loop); > > Please leave a new line after the loop. Thanks, fixed! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28548#discussion_r2618904912 PR Review Comment: https://git.openjdk.org/jdk/pull/28548#discussion_r2618905299 From epeter at openjdk.org Mon Dec 15 11:20:36 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Dec 2025 11:20:36 GMT Subject: RFR: 8370922: Template Framework Library: Float16 type and operations In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 11:38:28 GMT, Emanuel Peter wrote: >> We should test `Float16` with Template Framework Tests. For this, I'm now implementing: >> >> - Template Framework Library: add `Float16Type` that represents `Float16`. Extend `Operations.java` with `Float16` operations. >> - `Verify.java`: add verification for `Float16`, and corresponding tests in `TestVerifyIncubatorVector.java`. We could have done this separately, but it is not much code and completes the pipeline from code generation through execution and finally result verification in the following two tests. >> - Adding `Float16` to `ExpressionFuzzer.java` and `TestExpressions.java`. > > Can someone please review this? > @eme64 I have one additional question. How did you test this? Did you have HF hardware access or did you emulate it? In a recent chat with @benoitmaillard, I think he mentioned about using QEMU to emulate and do some testing. I ran it through our internal testing machines. 
Some of them do have specific hardware support for Float16 operations :) I've used QEMU in the past for Float16 and SVE, but it is relatively slow, so I did not bother with that for this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28095#issuecomment-3655089262 From qamai at openjdk.org Mon Dec 15 11:51:25 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 15 Dec 2025 11:51:25 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation [v4] In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 09:33:19 GMT, Hannes Greule wrote: >> The test cases show examples of code where `Value()` previously wasn't run because idealization took place before, resulting in less precise type analysis. >> >> Please let me know what you think. > > Hannes Greule has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into delay-divmod-idealization > - review > - expand comments > - delay integral Div/Mod Ideal() until IGVN > - test I reluctantly approve this PR, then :) Note that incremental inlining happens after IGVN so you still lose there. ------------- Marked as reviewed by qamai (Committer). PR Review: https://git.openjdk.org/jdk/pull/27886#pullrequestreview-3577758443 From qamai at openjdk.org Mon Dec 15 11:54:44 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 15 Dec 2025 11:54:44 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation In-Reply-To: <1XovbnGPMfTX45dlT5PFCk1Bqb3Pyc_kN8vC874lKm4=.78ec990d-1950-4fa9-8dea-065a09414a1c@github.com> References: <1XovbnGPMfTX45dlT5PFCk1Bqb3Pyc_kN8vC874lKm4=.78ec990d-1950-4fa9-8dea-065a09414a1c@github.com> Message-ID: On Mon, 15 Dec 2025 09:14:49 GMT, Emanuel Peter wrote: >> @eme64 do you want me to merge master? I guess re-running tests then wouldn't hurt either. 
> > @SirYwell Yeah, this is now quite old. Why don't you merge with master and let the GitHub actions run for now. > > Maybe someone has more comments on this now: speak up! > > I can run tests once GitHub actions are passing ;) @eme64 It may be a little bit more in terms of LOC, but it is always simpler to reason about when we have things executing in a consequential manner rather than randomly delaying some of them. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3655211152 From epeter at openjdk.org Mon Dec 15 11:54:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Dec 2025 11:54:45 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation In-Reply-To: References: <1XovbnGPMfTX45dlT5PFCk1Bqb3Pyc_kN8vC874lKm4=.78ec990d-1950-4fa9-8dea-065a09414a1c@github.com> Message-ID: <96rrRPjmvAcbsl7wlLUYck5FTD7WkQrY52PKyiwOn2s=.03996b48-829b-4048-925d-04943736fed1@github.com> On Mon, 15 Dec 2025 11:51:11 GMT, Quan Anh Mai wrote: >> @SirYwell Yeah, this is now quite old. Why don't you merge with master and let the GitHub actions run for now. >> >> Maybe someone has more comments on this now: speak up! >> >> I can run tests once GitHub actions are passing ;) > > @eme64 It may be a little bit more in terms of LOC, but it is always simpler to reason about when we have things executing in a consequential manner rather than randomly delaying some of them. @merykitty Why not file an RFE with a reproducer for incremental inlining? And yes, I totally agree it would be nicer if things were organized better! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3655213894 From fjiang at openjdk.org Mon Dec 15 12:33:18 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Mon, 15 Dec 2025 12:33:18 GMT Subject: RFR: 8373069: RISC-V: implement GHASH intrinsic [v5] In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 10:53:15 GMT, Anjian Wen wrote: >> support GHASH intrinsic for crypt GCM, which need zvkg extension. >> >> passed the tests in >> test/hotspot/jtreg/compiler/codegen/aes/ >> test/jdk/com/sun/crypto > > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > delete some redundant assert and modify some format Thanks! ------------- Marked as reviewed by fjiang (Committer). PR Review: https://git.openjdk.org/jdk/pull/28548#pullrequestreview-3577917011 From epeter at openjdk.org Mon Dec 15 12:43:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Dec 2025 12:43:07 GMT Subject: RFR: 8373355: C2: CompileCommand PrintIdealPhase should also print nodes that are not "reachable from below" Message-ID: PrintIdealPhase seems to only traverse the graph "from below". So if there are any nodes that are not reachable "from below" but only reachable "from above", then they do not show up in the printing. That is problematic: our TestFramework relies on that output, especially if you have some IR rule with failOn or count = 0. The node would be in the graph, but the test tells you there is none. Before the fix, we used to see this: The LoadI and StoreI do NOT show up during ITER_GVN1. This is because at that point, the infinite loop has no exit. So the loop is not connected down to Root. But once we do a first loop opts round, we see the LoadI and StoreI in PHASEIDEALLOOP1. This is because we insert a NeverBranch, which serves as an artificial exit, that connects the loop down to Root. Fix: traverse both up `+` and down `-`.
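The reachability problem described above can be sketched with a toy graph. This is only an illustration: `ToyNode` and `ToyGraphWalk` are hypothetical names, not HotSpot's `Node` class or the actual printing code, and the graph is simplified (no loop backedge). It shows that a walk from Root that follows only input edges misses an exit-less loop, while a walk that follows inputs and outputs finds it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy stand-in for an IR node (assumption: not HotSpot's Node class).
final class ToyNode {
    final String name;
    final List<ToyNode> inputs = new ArrayList<>();   // edges going "up"
    final List<ToyNode> outputs = new ArrayList<>();  // edges going "down"

    ToyNode(String name, ToyNode... ins) {
        this.name = name;
        for (ToyNode in : ins) {
            inputs.add(in);
            in.outputs.add(this);
        }
    }
}

final class ToyGraphWalk {
    // Collects the names of all nodes reachable from start.
    // With followOutputs == false only input edges are followed,
    // which mimics a traversal "from below".
    static Set<String> reachable(ToyNode start, boolean followOutputs) {
        Set<ToyNode> seen = new LinkedHashSet<>();
        Deque<ToyNode> work = new ArrayDeque<>();
        work.push(start);
        while (!work.isEmpty()) {
            ToyNode n = work.pop();
            if (!seen.add(n)) {
                continue; // already visited
            }
            for (ToyNode in : n.inputs) {
                work.push(in);
            }
            if (followOutputs) {
                for (ToyNode out : n.outputs) {
                    work.push(out);
                }
            }
        }
        Set<String> names = new LinkedHashSet<>();
        for (ToyNode n : seen) {
            names.add(n.name);
        }
        return names;
    }

    // Builds Root <- Return <- Start, plus a loop (Loop, LoadI, StoreI)
    // that hangs off Start but has no path down to Root.
    static ToyNode buildRoot() {
        ToyNode start = new ToyNode("Start");
        ToyNode ret = new ToyNode("Return", start);
        ToyNode root = new ToyNode("Root", ret);
        ToyNode loop = new ToyNode("Loop", start);
        ToyNode load = new ToyNode("LoadI", loop);
        new ToyNode("StoreI", loop, load);
        return root;
    }

    public static void main(String[] args) {
        System.out.println("inputs only:        " + reachable(buildRoot(), false));
        System.out.println("inputs and outputs: " + reachable(buildRoot(), true));
    }
}
```

In this toy model the artificial exit added later by loop opts would correspond to an extra edge from the loop down to Root, after which even the inputs-only walk sees `LoadI` and `StoreI`.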
--------------------------------- Note: I had to update some tests from https://github.com/openjdk/jdk/pull/22786. This is because during IGVN the constant nodes that lose all outputs are not removed. So now they show up in the IR graph, and the IR rules failed. I adjusted the IR rules to work with the operations `ModF/ModD` rather than constants `ConF/ConD`. ------------- Commit messages: - fixed up other test - wip - wip fixing ModFNodeTests.java - JDKJDK-8373355 Changes: https://git.openjdk.org/jdk/pull/28762/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28762&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373355 Stats: 220 lines in 5 files changed: 165 ins; 10 del; 45 mod Patch: https://git.openjdk.org/jdk/pull/28762.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28762/head:pull/28762 PR: https://git.openjdk.org/jdk/pull/28762 From dfenacci at openjdk.org Mon Dec 15 12:55:48 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 15 Dec 2025 12:55:48 GMT Subject: RFR: 8370315: [IR-Framework] Allow scenarios to be run in parallel [v4] In-Reply-To: References: Message-ID: On Thu, 11 Dec 2025 15:20:58 GMT, Christian Hagedorn wrote: >> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8370315: fix typo > > test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 775: > >> 773: if (!output.isEmpty()) { >> 774: System.out.println(output); >> 775: } > > We probably also need to do a similar trick as for the exceptions in order to have ordered stdouts for the scenarios? I might have spoken too soon: JTReg seems to collect stdout and stderr and print them out at once at the end of each (JTReg) test. In this case it doesn't make much sense to print out the output of each test as soon as it finishes (it would be better to collect them and print them in order at the end). 
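The "collect and print in order" idea mentioned above could look roughly like the following. This is a minimal sketch under stated assumptions: `OrderedScenarioOutput`, `runAll` and `runScenario` are hypothetical names and do not correspond to the `TestFramework` code in the PR; the sleep only simulates scenarios finishing out of order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class OrderedScenarioOutput {
    // Runs all scenarios in parallel, then returns their outputs in
    // scenario-index order, regardless of which one finished first.
    static List<String> runAll(int scenarioCount) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (int i = 0; i < scenarioCount; i++) {
                final int index = i;
                Callable<String> task = () -> runScenario(index);
                pending.add(pool.submit(task));
            }
            List<String> outputs = new ArrayList<>();
            for (Future<String> f : pending) {
                // Futures are consumed in submission order, so the
                // collected outputs are ordered by scenario index.
                outputs.add(f.get());
            }
            return outputs;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    // Hypothetical scenario body: earlier scenarios sleep longer, so
    // completion order tends to differ from submission order.
    static String runScenario(int index) {
        try {
            Thread.sleep((3 - index % 4) * 10L);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "Scenario #" + index + " output";
    }

    public static void main(String[] args) {
        for (String output : runAll(6)) {
            System.out.println(output);
        }
    }
}
```

Waiting on the futures in submission order keeps the printed output deterministic, at the cost of nothing being shown until the slowest earlier scenario has finished.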
@chhagedorn, is there possibly a way to make JTReg print the output "on-the-fly" that you are aware of? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2619308712 From galder at openjdk.org Mon Dec 15 13:14:18 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Mon, 15 Dec 2025 13:14:18 GMT Subject: RFR: 8370922: Template Framework Library: Float16 type and operations [v4] In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 09:17:48 GMT, Emanuel Peter wrote: >> We should test `Float16` with Template Framework Tests. For this, I'm now implementing: >> >> - Template Framework Library: add `Float16Type` that represents `Float16`. Extend `Operations.java` with `Float16` operations. >> - `Verify.java`: add verification for `Float16`, and corresponding tests in `TestVerifyIncubatorVector.java`. We could have done this separately, but it is not much code and completes the pipeline from code generation through execution and finally result verification in the following two tests. >> - Adding `Float16` to `ExpressionFuzzer.java` and `TestExpressions.java`. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > comments for Benoit Marked as reviewed by galder (Author). ------------- PR Review: https://git.openjdk.org/jdk/pull/28095#pullrequestreview-3578115152 From galder at openjdk.org Mon Dec 15 13:14:20 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Mon, 15 Dec 2025 13:14:20 GMT Subject: RFR: 8370922: Template Framework Library: Float16 type and operations In-Reply-To: References: Message-ID: <-IFEp0d2KOIkA2TPCJLIDwHspovvVt-rcP0nKL5mykA=.ae6febd6-ba65-441f-bc6e-a7598f0638ef@github.com> On Mon, 15 Dec 2025 11:17:47 GMT, Emanuel Peter wrote: >> Can someone please review this? > >> @eme64 I have one additional question. How did you test this? Did you have HF hardware access or did you emulate it? 
In a recent chat with @benoitmaillard, I think he mentioned about using QEMU to emulate and do some testing. > > I ran it through out internal testing machines. Some of them do have specific hardware support for Float16 operations :) > I've used QEMU in the past for Float16 and SVE, but it is relatively slow, so I did not bother with that for this PR. @eme64 Thanks for explanation, all good :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28095#issuecomment-3655566526 From epeter at openjdk.org Mon Dec 15 13:19:25 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Dec 2025 13:19:25 GMT Subject: RFR: 8370922: Template Framework Library: Float16 type and operations [v4] In-Reply-To: References: Message-ID: <_U1F7Z0FfyFFVKdzJ3QC07XAvymMhhzpssS_NCOhefY=.c7c68e76-7212-4589-a878-bc1b3a30780a@github.com> On Fri, 12 Dec 2025 18:36:03 GMT, Beno?t Maillard wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> comments for Benoit > > Looks good to me, nice work! I only have one question. @benoitmaillard @galderz Thanks for the reviews and approvals! Now we'll have to get a reviewer rubber-stamp this :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28095#issuecomment-3655595818 From epeter at openjdk.org Mon Dec 15 13:53:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Dec 2025 13:53:00 GMT Subject: RFR: 8373355: C2: CompileCommand PrintIdealPhase should also print nodes that are not "reachable from below" [v2] In-Reply-To: References: Message-ID: > PrintIdealPhase seems to only traverse the graph "from below". So if there are any nodes that are not reachable "from below" but only reachable "from above", then they do not show up in the printing. > That is problematic: out TestFramework relies on that output, especially if you have some IR rule with failOn or count = 0 . The node would be in the graph, but the test tells you there is none. 
> > Before the fix, we used to see this: > The LoadI and StoreI do NOT show up during ITER_GVN1. > This is because at that point, the infinite loop has no exit. So the loop is not connected down to Root. > But once we do a first loop opts round, we see the LoadI and StoreI in PHASEIDEALLOOP1. > This is because we insert a NeverBranch, which serves as an artificial exit, that connects the loop down to Root. > > Fix: travese both up `+` and down `-`. > > --------------------------------- > > Note: I had to update some tests from https://github.com/openjdk/jdk/pull/22786. > This is because during IGVN the constant nodes that lose all outputs are not removed. So now they show up in the IR graph, and the IR rules failed. I adjusted the IR rules to work with the operations `ModF/ModD` rather than constants `ConF/ConD`. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestIRFindFromAbove.java Co-authored-by: Roberto Casta?eda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28762/files - new: https://git.openjdk.org/jdk/pull/28762/files/d77e683a..769640f2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28762&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28762&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28762.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28762/head:pull/28762 PR: https://git.openjdk.org/jdk/pull/28762 From rcastanedalo at openjdk.org Mon Dec 15 13:53:02 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 15 Dec 2025 13:53:02 GMT Subject: RFR: 8373355: C2: CompileCommand PrintIdealPhase should also print nodes that are not "reachable from below" [v2] In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 13:49:41 GMT, Emanuel Peter wrote: >> PrintIdealPhase seems to only 
traverse the graph "from below". So if there are any nodes that are not reachable "from below" but only reachable "from above", then they do not show up in the printing. >> That is problematic: our TestFramework relies on that output, especially if you have some IR rule with failOn or count = 0. The node would be in the graph, but the test tells you there is none. >> >> Before the fix, we used to see this: >> The LoadI and StoreI do NOT show up during ITER_GVN1. >> This is because at that point, the infinite loop has no exit. So the loop is not connected down to Root. >> But once we do a first loop opts round, we see the LoadI and StoreI in PHASEIDEALLOOP1. >> This is because we insert a NeverBranch, which serves as an artificial exit, that connects the loop down to Root. >> >> Fix: traverse both up `+` and down `-`. >> >> --------------------------------- >> >> Note: I had to update some tests from https://github.com/openjdk/jdk/pull/22786. >> This is because during IGVN the constant nodes that lose all outputs are not removed. So now they show up in the IR graph, and the IR rules failed. I adjusted the IR rules to work with the operations `ModF/ModD` rather than constants `ConF/ConD`. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestIRFindFromAbove.java > > Co-authored-by: Roberto Castañeda Lozano Looks good, thanks for fixing this! Marked as reviewed by rcastanedalo (Reviewer). test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestIRFindFromAbove.java line 57: > 55: // This loop has no exit. So it is at first not connected down to Root. > 56: while (true) { > 57: // Durint HASEIDEALLOOP1, we insert a NeverBranch here, with a fake Suggestion: // During PHASEIDEALLOOP1, we insert a NeverBranch here, with a fake ------------- Marked as reviewed by rcastanedalo (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/28762#pullrequestreview-3578277158 PR Review: https://git.openjdk.org/jdk/pull/28762#pullrequestreview-3578289831 PR Review Comment: https://git.openjdk.org/jdk/pull/28762#discussion_r2619503453 From epeter at openjdk.org Mon Dec 15 13:53:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Dec 2025 13:53:02 GMT Subject: RFR: 8373355: C2: CompileCommand PrintIdealPhase should also print nodes that are not "reachable from below" [v2] In-Reply-To: References: Message-ID: <7JkKIfrRsOhDWDg6qYoxzEWsW8jWo8kXAc4RJWXIz9c=.22b6d667-1045-46c4-88ca-72aec9d5db05@github.com> On Mon, 15 Dec 2025 13:46:02 GMT, Roberto Castañeda Lozano wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestIRFindFromAbove.java >> >> Co-authored-by: Roberto Castañeda Lozano > > Looks good, thanks for fixing this! @robcasloz Thanks for reviewing and the approval :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28762#issuecomment-3655742917 From roland at openjdk.org Mon Dec 15 14:03:25 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 15 Dec 2025 14:03:25 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 23:37:58 GMT, Dean Long wrote: >> The base input of `AddP` is expected to only be set for heap accesses >> but I noticed some inconsistencies so I added an assert in the `AddP` >> constructor and fixed issues that it caught. AFAICT, the >> inconsistencies shouldn't create issues. > > src/hotspot/share/opto/macro.cpp line 1211: > >> 1209: >> 1210: Node* PhaseMacroExpand::make_store(Node* ctl, Node* mem, Node* base, int offset, Node* value, BasicType bt) { >> 1211: Node* adr = basic_plus_adr(top(), base, offset); > > Doesn't this cause an assert if make_load or make_store is used with a heap oop?
Isn't that a problem for code like PhaseMacroExpand::initialize_object() that calls make_store() with an object? `make_load`/`make_store` happen to be only called for non oop accesses. I could rename then to `make_raw_load`/`make_raw_store` to avoid any confusion. What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28769#discussion_r2619561157 From dlunden at openjdk.org Mon Dec 15 14:08:37 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 15 Dec 2025 14:08:37 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis [v4] In-Reply-To: References: <8ZNxxtdCUYYVmqq8Lam8C9sI7xh9qqHU_5j16ef6S0Q=.d793a57b-bc4c-4ed5-acba-ebf1bd372d4b@github.com> Message-ID: On Fri, 12 Dec 2025 17:10:38 GMT, Emanuel Peter wrote: >> None was integrated with the initial change. I added one. > > And how confident are you that this one test ensures there won't be a regression? @eme64: My understanding of this issue is that it is really a case of nodes not being added properly to the IGVN worklist. What @rwestrel does is simply adding the missing entries to the worklist; he is not changing an existing optimization. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28677#discussion_r2619577897 From epeter at openjdk.org Mon Dec 15 14:16:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Dec 2025 14:16:08 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis [v4] In-Reply-To: References: <8ZNxxtdCUYYVmqq8Lam8C9sI7xh9qqHU_5j16ef6S0Q=.d793a57b-bc4c-4ed5-acba-ebf1bd372d4b@github.com> Message-ID: <2FXKWaf70eozi_j7GKFAVtph8z3oxJGddAHov8jEcfo=.268d1fb4-16d2-41cf-b427-0b5b7349ffbb@github.com> On Mon, 15 Dec 2025 14:05:32 GMT, Daniel Lund?n wrote: >> And how confident are you that this one test ensures there won't be a regression? 
> > @eme64: My understanding of this issue is that it is really a case of nodes not being added properly to the IGVN worklist. What @rwestrel does is simply adding the missing entries to the worklist; he is not changing an existing optimization. @dlunde @robcasloz @rwestrel I leave this to you all. If you are very sure that the change is trivial, and that no additional IR tests are helpful, then leave it. But I've seen it happen multiple times that seemingly "trivial" changes have suddenly disabled older optimizations, and nobody noticed in the review. That's why I'm cautious in these cases. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28677#discussion_r2619608502 From dlunden at openjdk.org Mon Dec 15 14:45:07 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 15 Dec 2025 14:45:07 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis [v4] In-Reply-To: <2FXKWaf70eozi_j7GKFAVtph8z3oxJGddAHov8jEcfo=.268d1fb4-16d2-41cf-b427-0b5b7349ffbb@github.com> References: <8ZNxxtdCUYYVmqq8Lam8C9sI7xh9qqHU_5j16ef6S0Q=.d793a57b-bc4c-4ed5-acba-ebf1bd372d4b@github.com> <2FXKWaf70eozi_j7GKFAVtph8z3oxJGddAHov8jEcfo=.268d1fb4-16d2-41cf-b427-0b5b7349ffbb@github.com> Message-ID: On Mon, 15 Dec 2025 14:13:36 GMT, Emanuel Peter wrote: >> @eme64: My understanding of this issue is that it is really a case of nodes not being added properly to the IGVN worklist. What @rwestrel does is simply adding the missing entries to the worklist; he is not changing an existing optimization. > > @dlunde @robcasloz @rwestrel I leave this to you all. If you are very sure that the change is trivial, and that no additional IR tests are helpful, then leave it. But I've seen it happen multiple times that seemingly "trivial" changes have suddenly disabled older optimizations, and nobody noticed in the review. That's why I'm cautious in these cases. 
@eme64 Right, always good to be cautious. In this case, the only thing we do is `igvn->_worklist.push(u);`, which should be harmless and really only enable further optimizations. I'll let @rwestrel confirm! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28677#discussion_r2619724100 From epeter at openjdk.org Mon Dec 15 15:04:35 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Dec 2025 15:04:35 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis [v4] In-Reply-To: References: <8ZNxxtdCUYYVmqq8Lam8C9sI7xh9qqHU_5j16ef6S0Q=.d793a57b-bc4c-4ed5-acba-ebf1bd372d4b@github.com> <2FXKWaf70eozi_j7GKFAVtph8z3oxJGddAHov8jEcfo=.268d1fb4-16d2-41cf-b427-0b5b7349ffbb@github.com> Message-ID: On Mon, 15 Dec 2025 14:42:43 GMT, Daniel Lund?n wrote: > In this case, the only thing we do is igvn->_worklist.push(u);, which should be harmless and really only enable further optimizations. If you are sure about that. I have not looked at it in depth. But what I see is also that code was moved from Identity to Ideal, and refactored. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28677#discussion_r2619791233 From roland at openjdk.org Mon Dec 15 15:17:47 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 15 Dec 2025 15:17:47 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 23:45:16 GMT, Dean Long wrote: >> The base input of `AddP` is expected to only be set for heap accesses >> but I noticed some inconsistencies so I added an assert in the `AddP` >> constructor and fixed issues that it caught. AFAFICT, the >> inconsistencies shouldn't create issues. > > src/hotspot/share/opto/memnode.cpp line 4126: > >> 4124: Node* base = dest; >> 4125: if (phase->type(dest)->isa_oopptr() == nullptr) { >> 4126: base = phase->C->top(); > > How is this possible? 
Aren't all arrays in the heap? `isa_oopptr()` is non null for all oops, array and instance. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28769#discussion_r2619838306 From roland at openjdk.org Mon Dec 15 15:19:17 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 15 Dec 2025 15:19:17 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v7] In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 15:37:58 GMT, Roberto Casta?eda Lozano wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision: >> >> - review >> - Merge branch 'master' into JDK-8351889 >> - Update src/hotspot/share/opto/phaseX.hpp >> >> Co-authored-by: Roberto Casta?eda Lozano >> - Update src/hotspot/share/opto/phaseX.cpp >> >> Co-authored-by: Roberto Casta?eda Lozano >> - review >> - more >> - review >> - Merge branch 'master' into JDK-8351889 >> - exp >> - Merge branch 'master' into JDK-8351889 >> - ... and 9 more: https://git.openjdk.org/jdk/compare/3c66d85f...100fad3d > > Looks good! @robcasloz thanks for the review. Does testing look ok? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25386#issuecomment-3656200397 From roland at openjdk.org Mon Dec 15 15:23:26 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 15 Dec 2025 15:23:26 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis [v4] In-Reply-To: References: <8ZNxxtdCUYYVmqq8Lam8C9sI7xh9qqHU_5j16ef6S0Q=.d793a57b-bc4c-4ed5-acba-ebf1bd372d4b@github.com> <2FXKWaf70eozi_j7GKFAVtph8z3oxJGddAHov8jEcfo=.268d1fb4-16d2-41cf-b427-0b5b7349ffbb@github.com> Message-ID: On Mon, 15 Dec 2025 15:01:56 GMT, Emanuel Peter wrote: >> @eme64 Right, always good to be cautious. 
In this case, the only thing we do is `igvn->_worklist.push(u);`, which should be harmless and really only enable further optimizations. I'll let @rwestrel confirm! > >> In this case, the only thing we do is igvn->_worklist.push(u);, which should be harmless and really only enable further optimizations. > > If you are sure about that. I have not looked at it in depth. But what I see is also that code was moved from Identity to Ideal, and refactored. Code was refactored but not moved. It is fairly similar to bugs fixed by adding logic to `PhaseIterGVN::add_users_of_use_to_worklist()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28677#discussion_r2619855092 From roland at openjdk.org Mon Dec 15 15:25:44 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 15 Dec 2025 15:25:44 GMT Subject: RFR: 8373591: C2: Fix the memory around some intrinsics nodes [v2] In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 14:47:34 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/graphKit.cpp line 4191: >> >>> 4189: Node* res_mem = _gvn.transform(new SCMemProjNode(_gvn.transform(str))); >>> 4190: if (adr_type == TypePtr::BOTTOM) { >>> 4191: set_all_memory(res_mem); >> >> I'm confused by this. Doesn't `StrCompressedCopyNode` only write to dst? So the only part of the memory state that it updates is the one for `TypeAryPtr::BYTES`? > > It is because if a node consumes more memory than it produces, we need to compute its anti-dependencies. And since we do not compute anti-dependencies of these nodes, it is safer to make them kill all the memory they consume. What do you think? Could this be fixed by appending a `MemBarCPUOrderNode` on the slice of src? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28789#discussion_r2619864649 From roland at openjdk.org Mon Dec 15 15:31:25 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 15 Dec 2025 15:31:25 GMT Subject: RFR: 8371716: C2: Phi node fails Value()'s verification when speculative types clash [v5] In-Reply-To: <-lKe3DkagAXc0krDm4tig5ohwSPYu-as9XtpqEDCayM=.ad81edb8-eb78-4579-b626-a78e18e3f69d@github.com> References: <-lKe3DkagAXc0krDm4tig5ohwSPYu-as9XtpqEDCayM=.ad81edb8-eb78-4579-b626-a78e18e3f69d@github.com> Message-ID: On Thu, 27 Nov 2025 12:35:24 GMT, Marc Chevalier wrote: >> This bug was originally found and reported as a Valhalla problem. It quickly became apparent it has no reason to be Valhalla-specific, while I couldn't prove so. Roland managed to make a mainline reproducer. The explanation details my Valhalla investigation, but it has nothing to do with value classes anyway. >> >> The proposed solution seems somewhat controversial. See https://github.com/openjdk/valhalla/pull/1717 for some previous discussion. Before polishing the PR, I'd like to reach an agreement on the way to go. >> >> # Analysis >> ## Obervationally >> ### IGVN >> During IGVN, in `PhiNode::Value`, a `PhiNode` has 2 inputs. 
Their types are: >> >> in(1): java/lang/Object * (speculative=TestSpeculativeTypes$C2:NotNull:exact * (inline_depth=3)) >> in(2): null >> >> We compute the join (HS' meet): >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1299-L1306 >> >> t=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) >> >> But the current `_type` (of the `PhiNode` as a `TypeNode`) is >> >> _type=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C1:exact *) >> >> We filter `t` by `_type` >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1321 >> and we get >> >> ft=java/lang/Object * >> >> which is what we return. After the end of `Value`, the returned value becomes the new `PhiNode`'s `_type`. >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/phaseX.cpp#L2150-L2164 >> and >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/node.cpp#L1117-L1123 >> >> >> ### Verification >> On verification, `in(1)`, `in(2)` have the same value, so does `t`. But this time >> >> _type=java/lang/Object * >> >> and so, after filtering `t` by the (new) `_type`, we get >> >> ft=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) >> >> which is returned. Verification gets angry because the new `ft` is not the same as the previous one. >> >> ## But why?! >> ### Details on type computation >> In short, we are doing >> >> t = typeof(in(1)) \/ typeof(in(2)) >> ft = t /\ _type (* IGVN *) >> ft' = t /\ ft (* Verification *) >> >> and observing that `ft != ft'`. It seems our lattice doesn't ensure `(a /\ b) /\ b = a /\ b` which is problematic for this kind of verification that will just "try again...
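The `ft != ft'` observation quoted above can be reproduced in a toy model. This is only a sketch: `ToyType` and its `filter` are hypothetical and deliberately simplistic, not C2's `Type` lattice. The one assumption carried over from the description is that two clashing exact speculative types cancel out, while a missing speculative part is filled in from the other operand.

```java
import java.util.Objects;

// Toy type: a base type plus an optional exact speculative type.
// Assumption: this is an illustration, not HotSpot's TypeOopPtr.
final class ToyType {
    final String base;
    final String speculative; // null means "no speculative type"

    ToyType(String base, String speculative) {
        this.base = base;
        this.speculative = speculative;
    }

    // Toy "filter" (the /\ in the mail). The base types are assumed equal.
    // Speculative parts: equal ones are kept, clashing ones are dropped,
    // and a missing one is taken from the other operand.
    static ToyType filter(ToyType a, ToyType b) {
        String spec;
        if (a.speculative == null) {
            spec = b.speculative;
        } else if (b.speculative == null) {
            spec = a.speculative;
        } else if (a.speculative.equals(b.speculative)) {
            spec = a.speculative;
        } else {
            spec = null; // clash between two exact speculative types
        }
        return new ToyType(a.base, spec);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ToyType)) {
            return false;
        }
        ToyType other = (ToyType) o;
        return base.equals(other.base) && Objects.equals(speculative, other.speculative);
    }

    @Override
    public int hashCode() {
        return Objects.hash(base, speculative);
    }

    @Override
    public String toString() {
        return speculative == null ? base : base + " (speculative=" + speculative + ")";
    }

    public static void main(String[] args) {
        ToyType t = new ToyType("Object", "C2");     // type computed from the inputs
        ToyType type = new ToyType("Object", "C1");  // current _type of the Phi
        ToyType ft = filter(t, type);                // IGVN step
        ToyType ftPrime = filter(t, ft);             // verification step
        System.out.println("ft  = " + ft);
        System.out.println("ft' = " + ftPrime);
    }
}
```

In this model the first filter drops the clashing speculative part, and the second filter then reintroduces `C2` from `t`, so re-running the same computation does not reach a fixed point after one step.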
> > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > review otherwise looks reasonable to me. src/hotspot/share/opto/cfgnode.cpp line 1372: > 1370: } > 1371: > 1372: #ifdef ASSERT I think it would be better to move the verification code below out of the way in its own method. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28331#pullrequestreview-3578819167 PR Review Comment: https://git.openjdk.org/jdk/pull/28331#discussion_r2619889895 From duke at openjdk.org Mon Dec 15 15:59:26 2025 From: duke at openjdk.org (Yi Wu) Date: Mon, 15 Dec 2025 15:59:26 GMT Subject: RFR: 8373344: Add support for FP16 min/max reduction operations Message-ID: <7qXqIBuLDFEKPNje6TALxZUATujnOY5hoODC30zJNFM=.07d8d157-1ae5-4751-befd-d6291370fb9c@github.com> This patch adds mid-end support for vectorized min/max reduction operations for half floats. It also includes backend AArch64 support for these operations. Both floating point min/max reductions don't require strict order, because they are associative. It will generate NEON fminv/fmaxv reduction instructions when max vector length is 8B or 16B. On SVE-capable machines with vector lengths > 16B, it will generate the SVE fminv/fmaxv instructions. The patch also adds support for partial min/max reductions on SVE machines using fminv/fmaxv. A throughput (ops/ms) ratio > 1 indicates that the performance with this patch is better than the mainline.
Neoverse N1 (UseSVE = 0, max vector length = 16B):

Benchmark         vectorDim  Mode   Cnt    8B    16B
ReductionMaxFP16        256  thrpt    9  3.69   6.44
ReductionMaxFP16        512  thrpt    9  3.71   7.62
ReductionMaxFP16       1024  thrpt    9  4.16   8.64
ReductionMaxFP16       2048  thrpt    9  4.44   9.12
ReductionMinFP16        256  thrpt    9  3.69   6.43
ReductionMinFP16        512  thrpt    9  3.70   7.62
ReductionMinFP16       1024  thrpt    9  4.16   8.64
ReductionMinFP16       2048  thrpt    9  4.44   9.10

Neoverse V1 (UseSVE = 1, max vector length = 32B):

Benchmark         vectorDim  Mode   Cnt    8B    16B    32B
ReductionMaxFP16        256  thrpt    9  3.96   8.62   8.02
ReductionMaxFP16        512  thrpt    9  3.54   9.25  11.71
ReductionMaxFP16       1024  thrpt    9  3.77   8.71  14.07
ReductionMaxFP16       2048  thrpt    9  3.88   8.44  14.69
ReductionMinFP16        256  thrpt    9  3.96   8.61   8.03
ReductionMinFP16        512  thrpt    9  3.54   9.28  11.69
ReductionMinFP16       1024  thrpt    9  3.76   8.70  14.12
ReductionMinFP16       2048  thrpt    9  3.87   8.45  14.70

Neoverse V2 (UseSVE = 2, max vector length = 16B):

Benchmark         vectorDim  Mode   Cnt    8B    16B
ReductionMaxFP16        256  thrpt    9  4.78  10.00
ReductionMaxFP16        512  thrpt    9  3.74  11.33
ReductionMaxFP16       1024  thrpt    9  3.86   9.59
ReductionMaxFP16       2048  thrpt    9  3.94   8.71
ReductionMinFP16        256  thrpt    9  4.78  10.00
ReductionMinFP16        512  thrpt    9  3.74  11.29
ReductionMinFP16       1024  thrpt    9  3.86   9.58
ReductionMinFP16       2048  thrpt    9  3.94   8.71

Testing: hotspot_all, jdk (tier1-3) and langtools (tier1) all pass on Neoverse N1/V1/V2.
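The claim that min/max reductions need no strict order can be illustrated with a small scalar sketch. Assumptions: it uses plain `float` and `Math.min` rather than `Float16`, and `ReductionOrder` with its per-lane helper is a hypothetical model of how a vectorized reduction regroups elements, not the code generated by this patch. The min result is the same for both orders, while a float sum over the same data is not.

```java
import java.util.Arrays;

final class ReductionOrder {
    // Sample data chosen so that float addition is order-sensitive:
    // 1e8f + 1f rounds back to 1e8f.
    static float[] sample() {
        return new float[] {1e8f, 1f, -1e8f, 1f};
    }

    // Strictly ordered (scalar) min reduction.
    static float minSequential(float[] a) {
        float r = Float.POSITIVE_INFINITY;
        for (float v : a) {
            r = Math.min(r, v);
        }
        return r;
    }

    // Reduces each lane independently, then combines the lanes,
    // roughly as a vectorized reduction would regroup the elements.
    static float minLanes(float[] a, int lanes) {
        float[] partial = new float[lanes];
        Arrays.fill(partial, Float.POSITIVE_INFINITY);
        for (int i = 0; i < a.length; i++) {
            partial[i % lanes] = Math.min(partial[i % lanes], a[i]);
        }
        float r = Float.POSITIVE_INFINITY;
        for (float p : partial) {
            r = Math.min(r, p);
        }
        return r;
    }

    // Strictly ordered sum, for contrast.
    static float sumSequential(float[] a) {
        float r = 0f;
        for (float v : a) {
            r += v;
        }
        return r;
    }

    // Per-lane sum: regrouping changes the rounding and hence the result.
    static float sumLanes(float[] a, int lanes) {
        float[] partial = new float[lanes];
        for (int i = 0; i < a.length; i++) {
            partial[i % lanes] += a[i];
        }
        float r = 0f;
        for (float p : partial) {
            r += p;
        }
        return r;
    }

    public static void main(String[] args) {
        float[] a = sample();
        System.out.println("min sequential = " + minSequential(a) + ", min lanes = " + minLanes(a, 2));
        System.out.println("sum sequential = " + sumSequential(a) + ", sum lanes = " + sumLanes(a, 2));
    }
}
```

This is why an add reduction over floats generally has to keep the original order, whereas a min or max reduction can be split across lanes freely.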
------------- Commit messages: - 8373344: Add support for FP16 min/max reduction operations Changes: https://git.openjdk.org/jdk/pull/28828/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28828&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373344 Stats: 968 lines in 13 files changed: 296 ins; 22 del; 650 mod Patch: https://git.openjdk.org/jdk/pull/28828.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28828/head:pull/28828 PR: https://git.openjdk.org/jdk/pull/28828 From epeter at openjdk.org Mon Dec 15 16:10:43 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Dec 2025 16:10:43 GMT Subject: RFR: 8373420: C2: Add true/false_proj*() methods for IfNode as a replacement for proj_out*(true/false) In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 13:13:44 GMT, Christian Hagedorn wrote: > There are a lot of places in the code where we call `proj_out*(true/false)` on an `IfNode`. In some cases, we then cast the returned `ProjNode` back to `IfProjNode` or `IfTrueNode/IfFalseNode`. I often visit such code and now decided to clean this up. > > The patch proposes new `IfNode::true/false_proj*()` methods that return `IfTrueNode/IfFalseNode` directly. I walked through all `proj_out*()` calls and replaced those that used a direct `true/false` or `1/0` as argument. > > There are still more things to clean up in this area, for example, when we return `ProjNode` even though it should be an `IfProjNode` which requires more casting. But let's do that step by step in follow-up clean ups. > > Thanks, > Christian Marked as reviewed by epeter (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/28745#pullrequestreview-3579004157 From epeter at openjdk.org Mon Dec 15 16:13:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Dec 2025 16:13:59 GMT Subject: RFR: 8373513: C2: Move ProjNode::other_if_proj() to IfProjNode In-Reply-To: References: Message-ID: <4C-ZywpvjSifXiTb6x9g96IiPGN-63fa-qY5YiZIefQ=.3ccecb19-2bd3-477a-b390-203d0ea6f7f0@github.com> On Fri, 12 Dec 2025 09:48:28 GMT, Christian Hagedorn wrote: > This is a simple clean-up patch which moves `ProjNode::other_if_proj()` to `IfProjNode` and updates its uses. It only makes sense to call `other_if_proj()` on actual `IfProjNodes`. > > It also required updating more types from `ProjNode` to `IfProjNode`, which is more type-safe and more precise. While touching the methods, I've also added some `const`/`static` where appropriate. > > Thanks, > Christian @chhagedorn Nice cleanup, thanks for working on this :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28785#pullrequestreview-3579020182 From rcastanedalo at openjdk.org Mon Dec 15 16:21:11 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 15 Dec 2025 16:21:11 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v7] In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 15:16:20 GMT, Roland Westrelin wrote: > Does testing look ok? Yes, test results look good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25386#issuecomment-3656484496 From roland at openjdk.org Mon Dec 15 16:21:11 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 15 Dec 2025 16:21:11 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v7] In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 16:15:49 GMT, Roberto Castañeda Lozano wrote: >> @robcasloz thanks for the review. Does testing look ok?
> >> Does testing look ok? > > Yes, test results look good. @robcasloz @eme64 thanks for the reviews and testing ------------- PR Comment: https://git.openjdk.org/jdk/pull/25386#issuecomment-3656491847 From roland at openjdk.org Mon Dec 15 16:21:13 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 15 Dec 2025 16:21:13 GMT Subject: Integrated: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) In-Reply-To: References: Message-ID: <6HPeY8lPjs_X0AA4EtTZhnt4OeVmp7-_WlY3FlSCGl0=.5b42364a-962c-4554-842d-ab50715ce990@github.com> On Thu, 22 May 2025 08:35:18 GMT, Roland Westrelin wrote: > The test case has an out of loop `Store` with an `AddP` address > expression that has other uses and is in the loop body. Schematically, > only showing the address subgraph and the bases for the `AddP`s: > > > Store#195 -> AddP#133 -> AddP#134 -> CastPP#110 > -> CastPP#110 > > > Both `AddP`s have the same base, a `CastPP` that's also in the loop > body. > > That loop is a counted loop and only has 3 iterations so is fully > unrolled. First, one iteration is peeled: > > > /-> CastPP#110 > Store#195 -> Phi#360 -> AddP#133 -> AddP#134 -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > The `AddP`s and `CastPP` are cloned (because in the loop body). As > part of peeling, `PhaseIdealLoop::peeled_dom_test_elim()` is > called. It finds the test that guards `CastPP#283` in the peeled > iteration dominates and replaces the test that guards `CastPP#110` > (the test in the peeled iteration is the clone of the test in the > loop). That causes `CastPP#110`'s control to be updated to that of the > test in the peeled iteration and to be yanked from the loop. So now > `CastPP#283` and `CastPP#110` have the same inputs. 
> > Next unrolling happens: > > > /-> CastPP#110 > /-> AddP#400 -> AddP#401 -> CastPP#110 > Store#195 -> Phi#360 -> Phi#477 -> AddP#133 -> AddP#134 -> CastPP#110 > \ -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > `AddP`s are cloned once more but not the `CastPP`s because they are > both in the peeled iteration now. A new `Phi` is added. > > Next igvn runs. It's going to push the `AddP`s through the `Phi`s. > > Through `Phi#477`: > > > > /-> CastPP#110 > Store#195 -> Phi#360 -> AddP#510 -> Phi#509 -> AddP#401 -> CastPP#110 > \ -> AddP#134 -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > Through `Phi#360`: > > > /-> AddP#134 -> CastPP#110 > /-> Phi#509 -> AddP#401 -> CastPP#110 > Store#195 -> AddP#516 -> Phi#515 -> AddP#278 -> CastPP#283 > -> Phi#514 -> CastPP#283 > -> CastP#110 > > > Then `Phi#514` which has 2 `CastPP`s as input with identical inputs is > transformed into anot... This pull request has now been integrated. Changeset: ad29642d Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/ad29642d8f4e8e0fb1223b14b85ab7841d7b1b51 Stats: 159 lines in 10 files changed: 148 ins; 3 del; 8 mod 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) Reviewed-by: rcastanedalo, epeter ------------- PR: https://git.openjdk.org/jdk/pull/25386 From roland at openjdk.org Mon Dec 15 16:29:46 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 15 Dec 2025 16:29:46 GMT Subject: RFR: 8373633: C2: Use interface receiver type to improve CHA decisions In-Reply-To: References: Message-ID: <4Yf-QKYfX3oICfh5VbDj7h5C3GKkExeu8llKaLWx4L8=.ccd07642-5d3b-4c02-96a4-a0495f041482@github.com> On Sat, 13 Dec 2025 02:01:10 GMT, Vladimir Ivanov wrote: > Strength-reducing an interface call to a virtual call for interfaces with > unique implementors can use receiver type information to narrow the context. 
> > C2 tracks interface types and receiver type information can be used to reveal > an interface with a unique implementor which can't be derived from the call > site itself. > > Since C2 effectively accumulates a union interface type from multiple subtype checks, iterating over individual components of a type may reveal a candidate for a strength-reduction. The only prerequisite is that a candidate has to be a subtype of the declared interface. > > Testing: hs-tier1 - hs-tier5 src/hotspot/share/opto/callGenerator.cpp line 529: > 527: allow_inline, > 528: _prof_factor, > 529: nullptr /*receiver_type*/, Is there no benefit to passing `receiver_type` here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28811#discussion_r2620062742 From liach at openjdk.org Mon Dec 15 17:10:24 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 15 Dec 2025 17:10:24 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v10] In-Reply-To: References: Message-ID: > Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 20 additional commits since the last revision: - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache - Stage - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache - Review - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache - Bugs and verify loader leak - Try to avoid loader leak - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache - Revert void special case removal due to C2 shortage causing TestZGCBarrierElision::testAtomicThenAtomicAnotherField failure - Test from Jorn - ... and 10 more: https://git.openjdk.org/jdk/compare/f5945cc1...f9d808c1 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28585/files - new: https://git.openjdk.org/jdk/pull/28585/files/d734e8a6..f9d808c1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=08-09 Stats: 19890 lines in 378 files changed: 12699 ins; 4700 del; 2491 mod Patch: https://git.openjdk.org/jdk/pull/28585.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28585/head:pull/28585 PR: https://git.openjdk.org/jdk/pull/28585 From galder at openjdk.org Mon Dec 15 17:21:17 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Mon, 15 Dec 2025 17:21:17 GMT Subject: RFR: 8373396: Min and Max Ideal missing AddNode::Ideal optimisations [v3] In-Reply-To: References: Message-ID: > `MaxI` and `MinI` are missing `AddNode::Ideal` optimizations. These optimizations include commutation, flattening, pushing constants, etc. The PR changes `MaxINode::Ideal` and `MinINode::Ideal` to call `AddNode::Ideal`. Long versions already call `AddNode::Ideal` so nothing to change there. > > The PR also includes a template framework generated test (cc @eme64) that verifies that all of `AddNode::Ideal` optimizations now apply correctly for min/max for longs and ints.
Long tests have been added to validate that both ints and longs produce the same results. > > Fixing this issue indirectly fixes `compiler/codegen/TestBooleanVect.java` when run with `-XX:VerifyIterativeGVN=1110`, which was failing due to `min` not having one of those optimisations. However, this PR does not make changes to `PhaseIterGVN::verify_Identity_for` because there are additional failures observed with min/max for integers in JDK-8373134. Therefore, changes there will be made in the PR for JDK-8373134 instead. > > **Update 15.12.25**: `PhaseIterGVN::verify_Ideal_for` exceptions for MinI/MaxI are still needed. > > ~If you look at `PhaseIterGVN::verify_Ideal_for`, it contains. This looks like it could be removed in this PR as it looks like they were quite likely disabled due to the issue here. However, it's unclear what test was failing here (@eme64 ?):~ > > > // MinINode::Ideal > // Did not investigate, but there are some patterns that might > // need more notification. > case Op_MinI: > case Op_MaxI: // preemptively removed it as well. > return false; > > > I've run tier1-3 tests on linux/x64 and they passed. Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: Revert "Remove MinI/MaxI exceptions from verify_Ideal_for" This reverts commit 1ae308155ebec12a9741eb40b1630dbde49af7ac.
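[Editorial sketch, not part of the original message.] The rewrites the description attributes to `AddNode::Ideal` (commutation, flattening, pushing constants) are only valid because min and max are commutative and associative. The Java check below demonstrates those identities at the source level. It is an illustration under that assumption, not the template-framework test from the PR, and all names in it are hypothetical.

```java
final class MinMaxIdentities {
    // Nested form as it might appear in source code.
    static int nestedConstants(int x) {
        return Math.min(Math.min(x, 10), 3);
    }

    // Equivalent form after combining the two constants: min(10, 3) == 3.
    static int foldedConstants(int x) {
        return Math.min(x, 3);
    }

    // Constant used by the inner operation...
    static int constantInside(int x, int y) {
        return Math.max(Math.max(x, 7), y);
    }

    // ...and the same value with the constant moved to the outer operation.
    static int constantOutside(int x, int y) {
        return Math.max(Math.max(x, y), 7);
    }

    public static void main(String[] args) {
        int[] samples = {Integer.MIN_VALUE, -1, 0, 3, 7, 10, Integer.MAX_VALUE};
        boolean ok = true;
        for (int x : samples) {
            ok &= nestedConstants(x) == foldedConstants(x);
            for (int y : samples) {
                ok &= Math.min(x, y) == Math.min(y, x); // commutation
                ok &= constantInside(x, y) == constantOutside(x, y);
            }
        }
        System.out.println(ok);
    }
}
```

Whether C2 actually performs each of these rewrites for a given graph shape is what the IR tests in the PR verify; the sketch only shows that the results must agree.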
------------- Changes: - all: https://git.openjdk.org/jdk/pull/28770/files - new: https://git.openjdk.org/jdk/pull/28770/files/1ae30815..630364e1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28770&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28770&range=01-02 Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28770.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28770/head:pull/28770 PR: https://git.openjdk.org/jdk/pull/28770 From galder at openjdk.org Mon Dec 15 17:21:18 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Mon, 15 Dec 2025 17:21:18 GMT Subject: RFR: 8373396: Min and Max Ideal missing AddNode::Ideal optimisations In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 17:14:31 GMT, Emanuel Peter wrote: > I don't remember. Try enabling the verification, and see if you find any test that fails. If not: great, maybe you fixed it! If it still fails, it would be nice if you added more info, but not necessary. It still fails because other optimisations are missing, so I will revert the commit that uncommented that: ----------messages:(7/385)---------- command: main -Xcomp -XX:VerifyIterativeGVN=1111 compiler.c2.TestVerifyIterativeGVN reason: User specified action: run main/othervm/timeout=300 -Xcomp -XX:VerifyIterativeGVN=1111 compiler.c2.TestVerifyIterativeGVN started: Mon Dec 15 09:07:06.937 CET 2025 Mode: othervm [/othervm specified] Process id: 133090 finished: Mon Dec 15 09:07:45.303 CET 2025 elapsed time (seconds): 38.366 ----------configuration:(0/0)---------- ----------System.out:(32/1990)---------- Missed Ideal optimization (can_reshape=false): The node was reshaped by Ideal.
The result after Ideal: dist dump --------------------------------------------- 1 1612 AddI === _ 1880 668 [[ 1258 ]] !orig=[3471] !jvms: DirectMethodHandle::makePreparedLambdaForm @ bci:612 (line 293) 1 1254 CastII === 1519 1606 [[ 881 1258 1879 1607 870 2028 ]] #int:0..maxint-1, widen: 3 !orig=[5228],[3463] !jvms: DirectMethodHandle::makePreparedLambdaForm @ bci:612 (line 293) 0 1258 MinI === _ 1254 1612 [[ 871 ]] !jvms: DirectMethodHandle::makePreparedLambdaForm @ bci:612 (line 293) ``` ------------- PR Comment: https://git.openjdk.org/jdk/pull/28770#issuecomment-3656756730 From galder at openjdk.org Mon Dec 15 17:21:20 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Mon, 15 Dec 2025 17:21:20 GMT Subject: RFR: 8373396: Min and Max Ideal missing AddNode::Ideal optimisations [v2] In-Reply-To: References: Message-ID: <-y-xKVReaDrTTgpbWyyzE5wzWn96ePrdeGSWQmnlQck=.7f8c5a65-3dd2-499a-a9e6-30dceb3d0c35@github.com> On Mon, 15 Dec 2025 08:03:47 GMT, Galder Zamarreño wrote: >> `MaxI` and `MinI` are missing `AddNode::Ideal` optimizations. These optimizations include commutation, flattening, pushing constants...etc. The PR changes `MaxINode::Ideal` and `MinINode::Ideal` to call `AddNode::Ideal`. Long versions already call `AddNode::Ideal` so nothing to change there. >> >> The PR also includes a template framework generated test (cc @eme64) that verifies that all of `AddNode::Ideal` optimizations now apply correctly for min/max for longs and ints. Long tests have been added to validate that both ints and longs produce the same results. >> >> Fixing this issue indirectly fixes `compiler/codegen/TestBooleanVect.java` when run with `-XX:VerifyIterativeGVN=1110`, which was failing due to `min` not having one of those optimisations. However, this PR does not make changes to `PhaseIterGVN::verify_Identity_for` because there are additional failures observed with min/max for integers in JDK-8373134.
Therefore, changes there will be made in the PR for JDK-8373134 instead. >> >> **Update 15.12.25**: `PhaseIterGVN::verify_Ideal_for` exceptions for MinI/MaxI are still needed. >> >> ~If you look at `PhaseIterGVN::verify_Ideal_for`, it contains. This looks like it could be removed in this PR as it looks like they were quite likely disabled due to the issue here. However, it's unclear what test was failing here (@eme64 ?):~ >> >> >> // MinINode::Ideal >> // Did not investigate, but there are some patterns that might >> // need more notification. >> case Op_MinI: >> case Op_MaxI: // preemptively removed it as well. >> return false; >> >> >> I've run tier1-3 tests on linux/x64 and they passed. > > Galder Zamarreño has updated the pull request incrementally with six additional commits since the last revision: > > - Remove MinI/MaxI exceptions from verify_Ideal_for > - Iterate over enum values instead > - Refactor to MaxNode::IdealI > - Remove variables > - Use ${test.main.class} > - Move to compiler.igvn package I've reverted the `PhaseIterGVN::verify_Ideal_for` changes suggested in the description. Tier1-3 testing looks good otherwise. @TobiHartmann @eme64 Could you please review it once more? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28770#issuecomment-3656764947 From jbhateja at openjdk.org Mon Dec 15 17:27:12 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 15 Dec 2025 17:27:12 GMT Subject: RFR: 8373480: Optimize multiplication by constant multiplier using LEA instructions Message-ID: Emulate multiplication by a constant using the LEA addressing scheme, where effective address = BASE + INDEX * SCALE + OFFSET. Refer to section "3.5.1.2 Using LEA" of Intel's optimization manual for details regarding slow vs fast LEA instructions. Given that the latency of IMUL with register operands is 3 cycles, emulating the multiplication with a combination of two fast LEAs, each with 1-cycle latency, is faster.
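[Editorial sketch, not part of the original message.] The addressing form above can be modelled directly in Java to check that chaining two LEA-style computations reproduces a constant multiplication. The helper `lea` and the class name are hypothetical, and the two multipliers (37 and 45) are picked only for illustration. This models the arithmetic, not the instruction selection done in the patch.

```java
final class LeaMultiply {
    // Models `lea dst, [base + index * scale + offset]`.
    // x86 only allows scale factors 1, 2, 4 and 8.
    static long lea(long base, long index, int scale, long offset) {
        if (scale != 1 && scale != 2 && scale != 4 && scale != 8) {
            throw new IllegalArgumentException("invalid scale: " + scale);
        }
        return base + index * scale + offset;
    }

    // x * 37: first LEA gives 9x, second LEA gives x + 9x * 4.
    static long times37(long x) {
        long nineX = lea(x, x, 8, 0);
        return lea(x, nineX, 4, 0);
    }

    // x * 45: first LEA gives 9x, second LEA gives 9x + 9x * 4.
    static long times45(long x) {
        long nineX = lea(x, x, 8, 0);
        return lea(nineX, nineX, 4, 0);
    }

    public static void main(String[] args) {
        long x = 12345L;
        System.out.println(times37(x) == x * 37L);
        System.out.println(times45(x) == x * 45L);
    }
}
```

Because Java `long` arithmetic wraps modulo 2^64, the equality also holds for inputs whose product overflows, as it would for a 64-bit register result.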
Consider X as the multiplicand. By varying the scale of the first LEA instruction we can generate 4 inputs, i.e.

  X + X * 1 = 2X
  X + X * 2 = 3X
  X + X * 4 = 5X
  X + X * 8 = 9X

The following table lists the multipliers obtained by feeding the output of the first LEA into BASE and/or INDEX and varying the scale of the second fast LEA instruction. We will only handle the cases which cannot be handled by just shift + add.

  BASE  INDEX  SCALE  MULTIPLIER
  X     X      1      2 (Terminal)
  X     X      2      3 (Terminal)
  X     X      4      5 (Terminal)
  X     X      8      9 (Terminal)
  3X    3X     1      6
  X     3X     2      7
  5X    5X     1      10
  X     5X     2      11
  X     3X     4      13
  5X    5X     2      15
  X     2X     8      17
  9X    9X     1      18
  X     9X     2      19
  X     5X     4      21
  5X    5X     4      25
  9X    9X     2      27
  X     9X     4      37
  X     5X     8      41
  9X    9X     4      45
  X     9X     8      73
  9X    9X     8      81

All the non-unity inputs tied to BASE / INDEX are derived from the terminal cases, which represent the first fast LEA. Thus, all the multipliers can be computed using just two LEA instructions.

Performance numbers for the new micro benchmark included with this patch show around **5-50% improvements** on the latest x86 servers.

System: INTEL(R) XEON(R) PLATINUM 8581C CPU @ 2.10GHz (Emerald Rapids)

Baseline:
Benchmark                                            Mode   Cnt  Score    Error  Units
ConstantMultiplierOptimization.testConstMultiplierI  thrpt  2    189.690         ops/min
ConstantMultiplierOptimization.testConstMultiplierL  thrpt  2    196.283         ops/min

With opt:
Benchmark                                            Mode   Cnt  Score    Error  Units
ConstantMultiplierOptimization.testConstMultiplierI  thrpt  2    283.827         ops/min
ConstantMultiplierOptimization.testConstMultiplierL  thrpt  2    283.578         ops/min

System: AMD EPYC 9B45 (Turin)

Baseline:
Benchmark                                            Mode   Cnt  Score    Error  Units
ConstantMultiplierOptimization.testConstMultiplierI  thrpt  2    393.299         ops/min
ConstantMultiplierOptimization.testConstMultiplierL  thrpt  2    393.764         ops/min

With opt:
Benchmark                                            Mode   Cnt  Score    Error  Units
ConstantMultiplierOptimization.testConstMultiplierI  thrpt  2    409.790         ops/min
ConstantMultiplierOptimization.testConstMultiplierL  thrpt  2    409.756         ops/min

The effect of the optimization is more pronounced on the Intel server than on AMD's. As per Agner Fog's
instruction latency manual, the IMUL instruction has a reciprocal throughput of 1, while fast LEA has a reciprocal throughput of 0.25 on Zen4 and around 0.5 on Intel Sunny Cove (Ice Lake). Going by that, more LEAs can execute in parallel per cycle, so it will need to be investigated separately why we don't see similar gains on the AMD target. Kindly review and share your feedback. Best Regards, Jatin ------------- Commit messages: - Adding IR framework tests - Adding benchmark - 8373480: Optimize constant input multiplication using LEA instructions Changes: https://git.openjdk.org/jdk/pull/28759/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28759&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373480 Stats: 535 lines in 8 files changed: 525 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/28759.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28759/head:pull/28759 PR: https://git.openjdk.org/jdk/pull/28759 From liach at openjdk.org Mon Dec 15 17:44:49 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 15 Dec 2025 17:44:49 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v11] In-Reply-To: References: Message-ID: <4UBpAWbg2PlNeJ9-Wr4AvehrdVb8C8BYdoXfptvPI1o=.b92c5a06-6da9-4fe0-ab02-47c5a2126020@github.com> > Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH.
Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) Chen Liang has updated the pull request incrementally with one additional commit since the last revision: Missed IR test review, rearrange benches ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28585/files - new: https://git.openjdk.org/jdk/pull/28585/files/f9d808c1..1d5461db Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=09-10 Stats: 109 lines in 4 files changed: 89 ins; 11 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/28585.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28585/head:pull/28585 PR: https://git.openjdk.org/jdk/pull/28585 From liach at openjdk.org Mon Dec 15 17:44:52 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 15 Dec 2025 17:44:52 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v10] In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 17:10:24 GMT, Chen Liang wrote: >> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 20 additional commits since the last revision: > > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache > - Stage > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache > - Review > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache > - Bugs and verify loader leak > - Try to avoid loader leak > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache > - Revert void special case removal due to C2 shortage causing TestZGCBarrierElision::testAtomicThenAtomicAnotherField failure > - Test from Jorn > - ... and 10 more: https://git.openjdk.org/jdk/compare/faea2335...f9d808c1 Since I just noticed the actual cause of the failure of caching is that AD is created per name+type combination, I have created a benchmark case for that instead of reusing the existing exact ones:

Benchmark                                  Mode  Cnt  Score    Error  Units
VarHandleTypeMismatch.exactInvocation      avgt   30  0.396 ± 0.009  ns/op
VarHandleTypeMismatch.genericInvocation    avgt   30  0.375 ± 0.009  ns/op
VarHandleTypeMismatch.pollutedInvocation   avgt   30  8.281 ± 0.222  ns/op

------------- PR Comment: https://git.openjdk.org/jdk/pull/28585#issuecomment-3656852491 From liach at openjdk.org Mon Dec 15 17:44:54 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 15 Dec 2025 17:44:54 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v9] In-Reply-To: References: <4YumBpbA2k8DC13H1s808_5OJx-1FMxD9CbIUfRTb8Q=.742f90c9-0d93-43b7-abe7-76422a0c8359@github.com> Message-ID: On Wed, 10 Dec 2025 22:13:23 GMT, Vladimir Ivanov wrote: >> I think that is when two different VarHandles are both invoked non-exactly in two call sites in one method, the 2nd one fails to be inlined, that the compare-and-exchange from the 2nd one is not present in the final IR.
The deoptimization reason is either "unstable-if" or "too many null checks", I think I will try look into it in another effort. > > If it's a test problem, then it's better to comment out the problematic test case instead. I have diagnosed the reason, updated the comments, and added a benchmark to showcase this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2620294345 From liach at openjdk.org Mon Dec 15 19:49:52 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 15 Dec 2025 19:49:52 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v4] In-Reply-To: References: Message-ID: > Folding identity hash as constant if the incoming argument is constant would be useful for quick map lookups, such as for the [Classifier proposal](https://openjdk.org/jeps/8357674). Currently, identity hash is not constant because it loads the object header/mark word. We can add an explicit bypass to load an existing hash eagerly instead. Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: - Move test, fix merge garbage - Merge branch 'master' of https://github.com/openjdk/jdk into fix/identity-hash-const - Merge branch 'master' of https://github.com/openjdk/jdk into fix/identity-hash-const - Typo - assert - refactorings - Typo - Merge branch 'master' of https://github.com/openjdk/jdk into fix/identity-hash-const - Cleanup - identity hash support in C2 - ... 
and 2 more: https://git.openjdk.org/jdk/compare/9543b2fa...67a3954f ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28589/files - new: https://git.openjdk.org/jdk/pull/28589/files/b1d8be39..67a3954f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28589&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28589&range=02-03 Stats: 44910 lines in 756 files changed: 29623 ins; 11706 del; 3581 mod Patch: https://git.openjdk.org/jdk/pull/28589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28589/head:pull/28589 PR: https://git.openjdk.org/jdk/pull/28589 From liach at openjdk.org Mon Dec 15 19:49:54 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 15 Dec 2025 19:49:54 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v3] In-Reply-To: References: <6ip4JrJ4WiYEe6d2FA_WQ5dDjxAk2RPaPbwth4jNeJM=.43d7879d-89a4-434c-80ea-371c92581686@github.com> <0b81mH1_Y6r905N2HmehXBbSFdzLpJIfuXHNfijpHBs=.c870b13e-a52f-4c00-b771-91cf9205cb4a@github.com> Message-ID: On Sat, 13 Dec 2025 00:57:52 GMT, Vladimir Ivanov wrote: >> You can always do more than just C2 IR verification. For example, we could also do result verification. That would give us coverage for C1 for example. I think it is just good practice not to have a restriction if it is not absolutely necessary. > > I don't argue that there's always a chance to catch a bug, but unit tests on C2 IR are mostly trivial, so the actual chance to spot a unique problem is quite low. And the price is execution time. I kept the C2 limit (note this is a build restriction instead of a flag restriction), but updated to use test.main.class. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28589#discussion_r2620621537 From liach at openjdk.org Mon Dec 15 19:49:55 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 15 Dec 2025 19:49:55 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v3] In-Reply-To: <6ip4JrJ4WiYEe6d2FA_WQ5dDjxAk2RPaPbwth4jNeJM=.43d7879d-89a4-434c-80ea-371c92581686@github.com> References: <6ip4JrJ4WiYEe6d2FA_WQ5dDjxAk2RPaPbwth4jNeJM=.43d7879d-89a4-434c-80ea-371c92581686@github.com> Message-ID: <7oLQjFLs3t_6kGwXXUNec_Kyvbm-CDu-qX7if_UDfy8=.c613c4d0-6df2-49a4-bfa1-807e10c07147@github.com> On Wed, 10 Dec 2025 17:17:13 GMT, Emanuel Peter wrote: >> I can't find a way to access the identity hash code without compilation. Would something like a method that calls System.identityHashCode but is not inlined work? > > You could compute the result in the static initializer, it should therefore be computed in the interpreter. And then add a `@Check` method to compare the `testSum` value from the compiler. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28589#discussion_r2620621850 From pchilanomate at openjdk.org Mon Dec 15 22:24:02 2025 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 15 Dec 2025 22:24:02 GMT Subject: RFR: 8373630: r18_tls should not be modified on Windows AArch64 [v2] In-Reply-To: References: Message-ID: On Sat, 13 Dec 2025 05:09:33 GMT, Saint Wesonga wrote: >> On Windows, r18_tls is used to store the pointer to the current thread's TEB. Therefore, this register should never be modified (see details in [register_aarch64.hpp](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/cpu/aarch64/register_aarch64.hpp#L118-L128)). One scenario that results in the modification of r18_tls involves virtual threads on Windows.
Frames are frozen by [Continuation::try_preempt](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/share/runtime/continuation.cpp#L131) on one carrier thread whose registers are saved. When the frame is thawed, execution can continue on a different carrier thread. When this happens, [rthread (x28) is fixed to point to the new carrier thread](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/share/runtime/continuationFreezeThaw.cpp#L2670). The continuation then results in [restore_live_registers](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp#L313) restoring all the saved registers (including the fixed rthread register). However, this also restores x18, which was the TEB pointer for the previous carrier thread, causing the new carrier thread to execute with the TLS of the previous carrier thread. This causes hangs and occasional crashes in the virtual threads jtreg tests on Windows AArch64 that are resolved by this fix. > > Saint Wesonga has updated the pull request incrementally with one additional commit since the last revision: > > Do not modify r18_tls if R18_RESERVED is defined Looks good to me. ------------- Marked as reviewed by pchilanomate (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28808#pullrequestreview-3580420537 From erfang at openjdk.org Tue Dec 16 02:24:51 2025 From: erfang at openjdk.org (Eric Fang) Date: Tue, 16 Dec 2025 02:24:51 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v6] In-Reply-To: References: Message-ID: <_Nidz7v88pMoPXhyrFT7ZzLrxBJWj5o_i75enBlU8Us=.0dc8fd05-a366-4e6e-8af7-0ccd0d5ddf50@github.com> On Mon, 15 Dec 2025 07:37:42 GMT, Jatin Bhateja wrote: >> My main concern here is that the requirement for `VectorMaskCastNode` to have the same length for both input and output might be removed in the future.
I'm not sure, but we do require the lengths to be the same here, so I added this assertion. @eme64 has a similar comment; see https://github.com/openjdk/jdk/pull/28313/changes#r2614577536. So, if you all think that the requirement for lane length in `VectorMaskCastNode` won't be removed, then we can delete this assertion and the condition below. > > I think assertion here is redundant. Yeah, I think deleting this assertion is also reasonable. After all, if the input and output lengths of `VectorMaskCastNode` are inconsistent, it will cause more problems. I will consider how to handle these two places in conjunction with @eme64's comment. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2621517632 From qamai at openjdk.org Tue Dec 16 02:31:57 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 16 Dec 2025 02:31:57 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v4] In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 19:49:52 GMT, Chen Liang wrote: >> Folding identity hash as constant if the incoming argument is constant would be useful for quick map lookups, such as for the [Classifier proposal](https://openjdk.org/jeps/8357674). Currently, identity hash is not constant because it loads the object header/mark word. We can add an explicit bypass to load an existing hash eagerly instead. > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 12 additional commits since the last revision: > > - Move test, fix merge garbage > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/identity-hash-const > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/identity-hash-const > - Typo > - assert > - refactorings > - Typo > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/identity-hash-const > - Cleanup > - identity hash support in C2 > - ... and 2 more: https://git.openjdk.org/jdk/compare/c026ba9b...67a3954f src/hotspot/share/opto/library_call.cpp line 4806: > 4804: assert(!is_virtual, "no devirtualization for constant receiver?"); > 4805: ciConstant identity_hash = t->const_oop()->identity_hash(); > 4806: if (identity_hash.is_valid()) { Is there any reason we don't calculate the identity hash right away if there is not any? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28589#discussion_r2621527376 From liach at openjdk.org Tue Dec 16 02:42:59 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 16 Dec 2025 02:42:59 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v4] In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 02:29:05 GMT, Quan Anh Mai wrote: >> Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: >> >> - Move test, fix merge garbage >> - Merge branch 'master' of https://github.com/openjdk/jdk into fix/identity-hash-const >> - Merge branch 'master' of https://github.com/openjdk/jdk into fix/identity-hash-const >> - Typo >> - assert >> - refactorings >> - Typo >> - Merge branch 'master' of https://github.com/openjdk/jdk into fix/identity-hash-const >> - Cleanup >> - identity hash support in C2 >> - ... 
and 2 more: https://git.openjdk.org/jdk/compare/cea24102...67a3954f > > src/hotspot/share/opto/library_call.cpp line 4806: > >> 4804: assert(!is_virtual, "no devirtualization for constant receiver?"); >> 4805: ciConstant identity_hash = t->const_oop()->identity_hash(); >> 4806: if (identity_hash.is_valid()) { > > Is there any reason we don't calculate the identity hash right away if there is not any? @iwanowww recommended not to so that we can save resources - ideally we should convert this to sort of macro node, so we can calculate the hash if the node is not eliminated in the end. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28589#discussion_r2621544616 From vlivanov at openjdk.org Tue Dec 16 02:53:53 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 16 Dec 2025 02:53:53 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v4] In-Reply-To: References: Message-ID: <4nezat7vgHclje5N0lTXYIW49lgU1c9Fb_PXiUhffmE=.2f83013e-994e-4cec-b1ae-43e37c0b3455@github.com> On Tue, 16 Dec 2025 02:39:47 GMT, Chen Liang wrote: >> src/hotspot/share/opto/library_call.cpp line 4806: >> >>> 4804: assert(!is_virtual, "no devirtualization for constant receiver?"); >>> 4805: ciConstant identity_hash = t->const_oop()->identity_hash(); >>> 4806: if (identity_hash.is_valid()) { >> >> Is there any reason we don't calculate the identity hash right away if there is not any? > > @iwanowww recommended not to so that we can save resources - ideally we should convert this to sort of macro node, so we can calculate the hash if the node is not eliminated in the end. My main concern is possible interference with application, not performance. One example is CDS where archive dumping is performed in a single thread with a fixed random generator seed. If identity hash computation can be triggered from JIT-compiler thread (with a different seed), it will break deterministic behavior. 
Another case to illustrate another type of issues is biased locking: the optimization was disabled for objects with identity hash code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28589#discussion_r2621562185 From qamai at openjdk.org Tue Dec 16 04:47:42 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 16 Dec 2025 04:47:42 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v10] In-Reply-To: References: Message-ID: > Hi, > > This patch is an alternative to #28764 but it does the analysis during IGVN instead. > > The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. > > This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. Implementation-wise, when walking at `find_previous_store`, if we encounter a call or memory barrier, we start looking at all nodes that make the allocation escape. If all such nodes have a control input that is not a transitive control input of the call/barrier we are at, then we can decidedly say that the allocation has not escaped at that call/barrier, and walk past that call/barrier to find a corresponding store. > > I do not see a noticeable difference in C2 runtime with and without this patch. > > Please take a look and leave your thoughts, thanks a lot. 
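As a rough illustration of the pattern described above (an object that escapes only at return), the following Java sketch places a load after a call but before the escape point. The class, field and method names are hypothetical and not taken from the PR, and whether C2 actually folds this particular load is an assumption; the sketch only shows the shape of code the description talks about.

```java
public class EscapeAtReturn {
    // Hypothetical holder type, introduced only for this illustration.
    public static final class Box {
        public int value;
        public int next;
    }

    static int counter;

    // Stand-in for a call that the compiler might not inline.
    static void opaqueCall() {
        counter++;
    }

    public static Box make(int v) {
        Box b = new Box();
        b.value = v;            // the store that the later load observes
        opaqueCall();           // b has not escaped at this call
        b.next = b.value + 1;   // load of b.value: a candidate for folding to v
        return b;               // b escapes only here
    }

    public static void main(String[] args) {
        Box b = make(41);
        System.out.println(b.value + " " + b.next);
    }
}
```

In this shape the only node that makes `b` escape is the return, which comes after the call, so a walk from the load back past `opaqueCall()` could in principle reach the store of `v`.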
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Much more comments, refactor the data into a separate class ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28812/files - new: https://git.openjdk.org/jdk/pull/28812/files/31d96537..0eb6e9fb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=08-09 Stats: 354 lines in 2 files changed: 208 ins; 138 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/28812.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28812/head:pull/28812 PR: https://git.openjdk.org/jdk/pull/28812 From qamai at openjdk.org Tue Dec 16 05:28:56 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 16 Dec 2025 05:28:56 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v7] In-Reply-To: References: Message-ID: On Sun, 7 Dec 2025 12:08:18 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantages of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains eight commits: > > - Merge branch 'master' into andorxor > - Merge branch 'master' into andorxor > - Merge branch 'master' into andorxor > - Add assertion for the helper in CTPComparator > > Co-authored-by: Emanuel Peter > - remove std::hash > - remove unordered_map, add some comments for all_instances_size > - Emanuel's reviews > - Improve Value inferences of And, Or, Xor and implement gtest for general Value inferences May I have a second review for this PR, please? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27618#issuecomment-3658904887 From qamai at openjdk.org Tue Dec 16 06:23:14 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 16 Dec 2025 06:23:14 GMT Subject: RFR: 8373591: C2: Fix the memory around some intrinsics nodes [v3] In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 15:23:23 GMT, Roland Westrelin wrote: >> It is because if a node consumes more memory than it produces, we need to compute its anti-dependencies. And since we do not compute anti-dependencies of these nodes, it is safer to make them kill all the memory they consume. What do you think? > > Could this be fixed by appending a `MemBarCPUOrderNode` on the slice of src? That's a really great idea! I have implemented it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28789#discussion_r2621940305 From qamai at openjdk.org Tue Dec 16 06:23:12 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 16 Dec 2025 06:23:12 GMT Subject: RFR: 8373591: C2: Fix the memory around some intrinsics nodes [v3] In-Reply-To: References: Message-ID: > Hi, > > This is extracted from #28570 , there are 2 issues here: > > - Some intrinsics nodes advertise incorrect `adr_type`. For example, `AryEqNode` reports `adr_type` being `TypeAryPtr::BYTES` (it inherits this from `StrIntrinsicNode`). This is incorrect, however, as it can accept `char[]` inputs, too. 
Another case is `VectorizedHashCodeNode`, which reports its `adr_type` being `TypePtr::BOTTOM`, but it actually extracts a memory slice and does not consume the whole memory. > - For nodes such as `StrInflatedCopyNode`, as they consume more than they produce, during scheduling, we need to compute anti-dependencies. This is not the case, so we should fix it by making the nodes kill all the memory they consume. This issue is often not present because these intrinsics are not exposed bare to general usage. > > Please kindly review, thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Use MemBar instead of widening the intrinsic memory ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28789/files - new: https://git.openjdk.org/jdk/pull/28789/files/1e026354..9649a2f2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28789&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28789&range=01-02 Stats: 62 lines in 3 files changed: 39 ins; 8 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/28789.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28789/head:pull/28789 PR: https://git.openjdk.org/jdk/pull/28789 From hgreule at openjdk.org Tue Dec 16 07:12:59 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Tue, 16 Dec 2025 07:12:59 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v7] In-Reply-To: References: Message-ID: On Sun, 7 Dec 2025 12:08:18 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantages of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. 
>> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - Merge branch 'master' into andorxor > - Merge branch 'master' into andorxor > - Merge branch 'master' into andorxor > - Add assertion for the helper in CTPComparator > > Co-authored-by: Emanuel Peter > - remove std::hash > - remove unordered_map, add some comments for all_instances_size > - Emanuel's reviews > - Improve Value inferences of And, Or, Xor and implement gtest for general Value inferences Not sure if my review counts but I went through the changes again and it looks like all existing inference logic is covered by the more concise and also more powerful new logic. ------------- Marked as reviewed by hgreule (Committer). PR Review: https://git.openjdk.org/jdk/pull/27618#pullrequestreview-3581649461 From epeter at openjdk.org Tue Dec 16 07:42:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Dec 2025 07:42:15 GMT Subject: RFR: 8373396: Min and Max Ideal missing AddNode::Ideal optimisations [v3] In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 17:21:17 GMT, Galder Zamarreño wrote: >> `MaxI` and `MinI` are missing `AddNode::Ideal` optimizations. These optimizations include commutation, flattening, pushing constants, etc. The PR changes `MaxINode::Ideal` and `MinINode::Ideal` to call `AddNode::Ideal`. Long versions already call `AddNode::Ideal` so nothing to change there. >> >> The PR also includes a template framework generated test (cc @eme64) that verifies that all of `AddNode::Ideal` optimizations now apply correctly for min/max for longs and ints. Long tests have been added to validate that both ints and longs produce the same results.
>> >> Fixing this issue indirectly fixes `compiler/codegen/TestBooleanVect.java` when run with `-XX:VerifyIterativeGVN=1110`, which was failing due to `min` not having one of those optimisations. However, this PR does not make changes to `PhaseIterGVN::verify_Identity_for` because there are additional failures observed with min/max for integers in JDK-8373134. Therefore, changes there will be in the PR for JDK-8373134 instead. >> >> **Update 15.12.25**: `PhaseIterGVN::verify_Ideal_for` exceptions for MinI/MaxI are still needed. >> >> ~If you look at `PhaseIterGVN::verify_Ideal_for`, it contains. This looks like it could be removed in this PR as it looks like they were quite likely disabled due to the issue here. However, it's unclear what test was failing here (@eme64 ?):~ >> >> >> // MinINode::Ideal >> // Did not investigate, but there are some patterns that might >> // need more notification. >> case Op_MinI: >> case Op_MaxI: // preemptively removed it as well. >> return false; >> >> >> I've run tier1-3 tests on linux/x64 and they passed. > > Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: > > Revert "Remove MinI/MaxI exceptions from verify_Ideal_for" > > This reverts commit 1ae308155ebec12a9741eb40b1630dbde49af7ac. Looks good now, except that little nit below. We can run some internal testing once a second reviewer has had a look :) test/hotspot/jtreg/compiler/igvn/TestMinMaxIdeal.java line 101: > 99: let("irNodeName", op.name()), > 100: let("boxedTypeName", op.type.boxedTypeName()), > 101: let("op", op.name()), Nit: you are repeating the same value `op.name()` with two hashtags `op` and `irNodeName`. Is that intentional? ------------- Marked as reviewed by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/28770#pullrequestreview-3581735166 PR Review Comment: https://git.openjdk.org/jdk/pull/28770#discussion_r2622145391 From galder at openjdk.org Tue Dec 16 07:53:53 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Tue, 16 Dec 2025 07:53:53 GMT Subject: RFR: 8373396: Min and Max Ideal missing AddNode::Ideal optimisations [v3] In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 07:37:45 GMT, Emanuel Peter wrote: >> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert "Remove MinI/MaxI exceptions from verify_Ideal_for" >> >> This reverts commit 1ae308155ebec12a9741eb40b1630dbde49af7ac. > > test/hotspot/jtreg/compiler/igvn/TestMinMaxIdeal.java line 101: > >> 99: let("irNodeName", op.name()), >> 100: let("boxedTypeName", op.type.boxedTypeName()), >> 101: let("op", op.name()), > > Nit: you are repeating the same value `op.name()` with two hashtags `op` and `irNodeName`. Is that intentional? No, I'll get it fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28770#discussion_r2622182373 From galder at openjdk.org Tue Dec 16 08:07:16 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Tue, 16 Dec 2025 08:07:16 GMT Subject: RFR: 8373396: Min and Max Ideal missing AddNode::Ideal optimisations [v4] In-Reply-To: References: Message-ID: > `MaxI` and `MinI` are missing `AddNode::Ideal` optimizations. These optimizations include commutation, flattening, pushing constants, etc. The PR changes `MaxINode::Ideal` and `MinINode::Ideal` to call `AddNode::Ideal`. Long versions already call `AddNode::Ideal` so nothing to change there. > > The PR also includes a template framework generated test (cc @eme64) that verifies that all of `AddNode::Ideal` optimizations now apply correctly for min/max for longs and ints.
Long tests have been added to validate that both ints and longs produce the same results. > > Fixing this issue indirectly fixes `compiler/codegen/TestBooleanVect.java` when run with `-XX:VerifyIterativeGVN=1110`, which was failing due to `min` not having one of those optimisations. However, this PR does not make changes to `PhaseIterGVN::verify_Identity_for` because there are additional failures observed with min/max for integers in JDK-8373134. Therefore, changes there will be in the PR for JDK-8373134 instead. > > **Update 15.12.25**: `PhaseIterGVN::verify_Ideal_for` exceptions for MinI/MaxI are still needed. > > ~If you look at `PhaseIterGVN::verify_Ideal_for`, it contains. This looks like it could be removed in this PR as it looks like they were quite likely disabled due to the issue here. However, it's unclear what test was failing here (@eme64 ?):~ > > > // MinINode::Ideal > // Did not investigate, but there are some patterns that might > // need more notification. > case Op_MinI: > case Op_MaxI: // preemptively removed it as well. > return false; > > > I've run tier1-3 tests on linux/x64 and they passed.
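To make the rewrites named in the description a little more concrete, here is a small Java sketch of source-level shapes for which commutation and constant combining preserve the result. The class and method names are hypothetical, and which exact IR transformations C2 applies to each method is an assumption; the sketch only checks that the equivalent forms agree in plain Java.

```java
public class MinMaxShapes {
    // Commutation: operand order does not change the result.
    public static int commuted(int a, int b) {
        return Math.min(b, a);
    }

    // Two nested constants: equivalent to Math.min(a, 5),
    // because min(5, 7) can be combined into a single constant.
    public static int nestedConstants(int a) {
        return Math.min(Math.min(a, 5), 7);
    }

    // Constant on the inner node: equivalent to Math.max(Math.max(a, b), 3),
    // that is, the constant can be moved to the outermost operation.
    public static int constantInside(int a, int b) {
        return Math.max(Math.max(a, 3), b);
    }

    public static void main(String[] args) {
        int[] samples = {Integer.MIN_VALUE, -1, 0, 3, 4, 5, 6, 7, 8, Integer.MAX_VALUE};
        for (int a : samples) {
            for (int b : samples) {
                if (commuted(a, b) != Math.min(a, b)) throw new AssertionError("commuted");
                if (nestedConstants(a) != Math.min(a, 5)) throw new AssertionError("nestedConstants");
                if (constantInside(a, b) != Math.max(Math.max(a, b), 3)) throw new AssertionError("constantInside");
            }
        }
        System.out.println("all identities hold");
    }
}
```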
Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: Fix redundant variable ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28770/files - new: https://git.openjdk.org/jdk/pull/28770/files/630364e1..b1ab81bb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28770&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28770&range=02-03 Stats: 5 lines in 1 file changed: 0 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28770.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28770/head:pull/28770 PR: https://git.openjdk.org/jdk/pull/28770 From epeter at openjdk.org Tue Dec 16 08:07:16 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Dec 2025 08:07:16 GMT Subject: RFR: 8373396: Min and Max Ideal missing AddNode::Ideal optimisations [v4] In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 08:03:59 GMT, Galder Zamarreño wrote: >> `MaxI` and `MinI` are missing `AddNode::Ideal` optimizations. These optimizations include commutation, flattening, pushing constants, etc. The PR changes `MaxINode::Ideal` and `MinINode::Ideal` to call `AddNode::Ideal`. Long versions already call `AddNode::Ideal` so nothing to change there. >> >> The PR also includes a template framework generated test (cc @eme64) that verifies that all of `AddNode::Ideal` optimizations now apply correctly for min/max for longs and ints. Long tests have been added to validate that both ints and longs produce the same results. >> >> Fixing this issue indirectly fixes `compiler/codegen/TestBooleanVect.java` when run with `-XX:VerifyIterativeGVN=1110`, which was failing due to `min` not having one of those optimisations. However, this PR does not make changes to `PhaseIterGVN::verify_Identity_for` because there are additional failures observed with min/max for integers in JDK-8373134. Therefore, changes there will be in the PR for JDK-8373134 instead.
>> >> **Update 15.12.25**: `PhaseIterGVN::verify_Ideal_for` exceptions for MinI/MaxI are still needed. >> >> ~If you look at `PhaseIterGVN::verify_Ideal_for`, it contains. This looks like it could be removed in this PR as it looks like they were quite likely disabled due to the issue here. However, it's unclear what test was failing here (@eme64 ?):~ >> >> >> // MinINode::Ideal >> // Did not investigate, but there are some patterns that might >> // need more notification. >> case Op_MinI: >> case Op_MaxI: // preemptively removed it as well. >> return false; >> >> >> I've run tier1-3 tests on linux/x64 and they passed. > > Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: > > Fix redundant variable Nice, looks cleaner now :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28770#pullrequestreview-3581795895 From roland at openjdk.org Tue Dec 16 08:07:17 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 16 Dec 2025 08:07:17 GMT Subject: RFR: 8373396: Min and Max Ideal missing AddNode::Ideal optimisations [v4] In-Reply-To: References: Message-ID: <5AXdowd0eyM0Pkj_HXjixrRbXn9rNK9raVK3JWBsqF0=.8cf41106-0aaf-4bd7-b789-606b1b6a0e38@github.com> On Tue, 16 Dec 2025 08:03:59 GMT, Galder Zamarreño wrote: >> `MaxI` and `MinI` are missing `AddNode::Ideal` optimizations. These optimizations include commutation, flattening, pushing constants, etc. The PR changes `MaxINode::Ideal` and `MinINode::Ideal` to call `AddNode::Ideal`. Long versions already call `AddNode::Ideal` so nothing to change there. >> >> The PR also includes a template framework generated test (cc @eme64) that verifies that all of `AddNode::Ideal` optimizations now apply correctly for min/max for longs and ints. Long tests have been added to validate that both ints and longs produce the same results.
>> >> Fixing this issue indirectly fixes `compiler/codegen/TestBooleanVect.java` when run with `-XX:VerifyIterativeGVN=1110`, which was failing due to `min` not having one of those optimisations. However, this PR does not make changes to `PhaseIterGVN::verify_Identity_for` because there are additional failures observed with min/max for integers in JDK-8373134. Therefore, changes there will be in the PR for JDK-8373134 instead. >> >> **Update 15.12.25**: `PhaseIterGVN::verify_Ideal_for` exceptions for MinI/MaxI are still needed. >> >> ~If you look at `PhaseIterGVN::verify_Ideal_for`, it contains. This looks like it could be removed in this PR as it looks like they were quite likely disabled due to the issue here. However, it's unclear what test was failing here (@eme64 ?):~ >> >> >> // MinINode::Ideal >> // Did not investigate, but there are some patterns that might >> // need more notification. >> case Op_MinI: >> case Op_MaxI: // preemptively removed it as well. >> return false; >> >> >> I've run tier1-3 tests on linux/x64 and they passed. > > Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: > > Fix redundant variable Looks good to me. ------------- Marked as reviewed by roland (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/28770#pullrequestreview-3581810553 From galder at openjdk.org Tue Dec 16 08:07:18 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Tue, 16 Dec 2025 08:07:18 GMT Subject: RFR: 8373396: Min and Max Ideal missing AddNode::Ideal optimisations [v3] In-Reply-To: References: Message-ID: <6b76KryMssQjzpYevQAp7HYdraB3iYnUa-nPLYoIPZA=.cc50a9a3-aae1-4e27-a657-2074ad058a5d@github.com> On Tue, 16 Dec 2025 07:51:12 GMT, Galder Zamarreño wrote: >> test/hotspot/jtreg/compiler/igvn/TestMinMaxIdeal.java line 101: >> >>> 99: let("irNodeName", op.name()), >>> 100: let("boxedTypeName", op.type.boxedTypeName()), >>> 101: let("op", op.name()), >> >> Nit: you are repeating the same value `op.name()` with two hashtags `op` and `irNodeName`. Is that intentional? > > No, I'll get it fixed Just pushed a fix ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28770#discussion_r2622190587 From bmaillard at openjdk.org Tue Dec 16 08:20:01 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Tue, 16 Dec 2025 08:20:01 GMT Subject: RFR: 8373355: C2: CompileCommand PrintIdealPhase should also print nodes that are not "reachable from below" [v2] In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 13:53:00 GMT, Emanuel Peter wrote: >> PrintIdealPhase seems to only traverse the graph "from below". So if there are any nodes that are not reachable "from below" but only reachable "from above", then they do not show up in the printing. >> That is problematic: our TestFramework relies on that output, especially if you have some IR rule with failOn or count = 0. The node would be in the graph, but the test tells you there is none. >> >> Before the fix, we used to see this: >> The LoadI and StoreI do NOT show up during ITER_GVN1. >> This is because at that point, the infinite loop has no exit. So the loop is not connected down to Root.
>> But once we do a first loop opts round, we see the LoadI and StoreI in PHASEIDEALLOOP1. >> This is because we insert a NeverBranch, which serves as an artificial exit, that connects the loop down to Root. >> >> Fix: traverse both up `+` and down `-`. >> >> --------------------------------- >> >> Note: I had to update some tests from https://github.com/openjdk/jdk/pull/22786. >> This is because during IGVN the constant nodes that lose all outputs are not removed. So now they show up in the IR graph, and the IR rules failed. I adjusted the IR rules to work with the operations `ModF/ModD` rather than constants `ConF/ConD`. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestIRFindFromAbove.java > > Co-authored-by: Roberto Castañeda Lozano Looks good to me, thanks for fixing :) ------------- Marked as reviewed by bmaillard (Committer). PR Review: https://git.openjdk.org/jdk/pull/28762#pullrequestreview-3581865492 From bmaillard at openjdk.org Tue Dec 16 08:27:11 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Tue, 16 Dec 2025 08:27:11 GMT Subject: [jdk26] RFR: 8373579: Problem list compiler/runtime/Test7196199.java Message-ID: Hi all, This pull request contains a backport of commit [a05d5d25](https://github.com/openjdk/jdk/commit/a05d5d2514c835f2bfeaf7a8c7df0ac241f0177f) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Benoît Maillard on 12 Dec 2025 and was reviewed by Christian Hagedorn and Emanuel Peter. Thanks!
------------- Commit messages: - Backport a05d5d2514c835f2bfeaf7a8c7df0ac241f0177f Changes: https://git.openjdk.org/jdk/pull/28798/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28798&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373579 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28798.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28798/head:pull/28798 PR: https://git.openjdk.org/jdk/pull/28798 From mchevalier at openjdk.org Tue Dec 16 08:29:57 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 16 Dec 2025 08:29:57 GMT Subject: RFR: 8373355: C2: CompileCommand PrintIdealPhase should also print nodes that are not "reachable from below" [v2] In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 13:53:00 GMT, Emanuel Peter wrote: >> PrintIdealPhase seems to only traverse the graph "from below". So if there are any nodes that are not reachable "from below" but only reachable "from above", then they do not show up in the printing. >> That is problematic: our TestFramework relies on that output, especially if you have some IR rule with failOn or count = 0. The node would be in the graph, but the test tells you there is none. >> >> Before the fix, we used to see this: >> The LoadI and StoreI do NOT show up during ITER_GVN1. >> This is because at that point, the infinite loop has no exit. So the loop is not connected down to Root. >> But once we do a first loop opts round, we see the LoadI and StoreI in PHASEIDEALLOOP1. >> This is because we insert a NeverBranch, which serves as an artificial exit, that connects the loop down to Root. >> >> Fix: traverse both up `+` and down `-`. >> >> --------------------------------- >> >> Note: I had to update some tests from https://github.com/openjdk/jdk/pull/22786. >> This is because during IGVN the constant nodes that lose all outputs are not removed. So now they show up in the IR graph, and the IR rules failed.
I adjusted the IR rules to work with the operations `ModF/ModD` rather than constants `ConF/ConD`. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestIRFindFromAbove.java > > Co-authored-by: Roberto Castañeda Lozano Very fine! test/hotspot/jtreg/compiler/c2/irTests/ModDNodeTests.java line 65: > 63: } > 64: > 65: // Note: we used to check for ConF nodes in the IR. But that is a bit brittle: `ConD` in this case. ------------- Marked as reviewed by mchevalier (Committer). PR Review: https://git.openjdk.org/jdk/pull/28762#pullrequestreview-3581895525 PR Review Comment: https://git.openjdk.org/jdk/pull/28762#discussion_r2622284669 From epeter at openjdk.org Tue Dec 16 08:52:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Dec 2025 08:52:45 GMT Subject: RFR: 8373355: C2: CompileCommand PrintIdealPhase should also print nodes that are not "reachable from below" [v3] In-Reply-To: References: Message-ID: > PrintIdealPhase seems to only traverse the graph "from below". So if there are any nodes that are not reachable "from below" but only reachable "from above", then they do not show up in the printing. > That is problematic: our TestFramework relies on that output, especially if you have some IR rule with failOn or count = 0. The node would be in the graph, but the test tells you there is none. > > Before the fix, we used to see this: > The LoadI and StoreI do NOT show up during ITER_GVN1. > This is because at that point, the infinite loop has no exit. So the loop is not connected down to Root. > But once we do a first loop opts round, we see the LoadI and StoreI in PHASEIDEALLOOP1. > This is because we insert a NeverBranch, which serves as an artificial exit, that connects the loop down to Root. > > Fix: traverse both up `+` and down `-`.
> > --------------------------------- > > Note: I had to update some tests from https://github.com/openjdk/jdk/pull/22786. > This is because during IGVN the constant nodes that lose all outputs are not removed. So now they show up in the IR graph, and the IR rules failed. I adjusted the IR rules to work with the operations `ModF/ModD` rather than constants `ConF/ConD`. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Suggestion by Marc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28762/files - new: https://git.openjdk.org/jdk/pull/28762/files/769640f2..42ce8ed0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28762&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28762&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28762.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28762/head:pull/28762 PR: https://git.openjdk.org/jdk/pull/28762 From epeter at openjdk.org Tue Dec 16 08:52:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Dec 2025 08:52:49 GMT Subject: RFR: 8373355: C2: CompileCommand PrintIdealPhase should also print nodes that are not "reachable from below" [v2] In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 08:27:20 GMT, Marc Chevalier wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestIRFindFromAbove.java >> >> Co-authored-by: Roberto Casta?eda Lozano > > Very fine! @marc-chevalier @benoitmaillard @robcasloz Thanks for the reviews! I'll need another approval because of the minor fix so I can integrate ;) > test/hotspot/jtreg/compiler/c2/irTests/ModDNodeTests.java line 65: > >> 63: } >> 64: >> 65: // Note: we used to check for ConF nodes in the IR. But that is a bit brittle: > > `ConD` in this case. Nice catch! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28762#issuecomment-3659443105 PR Review Comment: https://git.openjdk.org/jdk/pull/28762#discussion_r2622338953 From epeter at openjdk.org Tue Dec 16 08:52:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Dec 2025 08:52:51 GMT Subject: RFR: 8373355: C2: CompileCommand PrintIdealPhase should also print nodes that are not "reachable from below" [v2] In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 13:53:00 GMT, Emanuel Peter wrote: >> PrintIdealPhase seems to only traverse the graph "from below". So if there are any nodes that are not reachable "from below" but only reachable "from above", then they do not show up in the printing. >> That is problematic: our TestFramework relies on that output, especially if you have some IR rule with failOn or count = 0. The node would be in the graph, but the test tells you there is none. >> >> Before the fix, we used to see this: >> The LoadI and StoreI do NOT show up during ITER_GVN1. >> This is because at that point, the infinite loop has no exit. So the loop is not connected down to Root. >> But once we do a first loop opts round, we see the LoadI and StoreI in PHASEIDEALLOOP1. >> This is because we insert a NeverBranch, which serves as an artificial exit, that connects the loop down to Root. >> >> Fix: traverse both up `+` and down `-`. >> >> --------------------------------- >> >> Note: I had to update some tests from https://github.com/openjdk/jdk/pull/22786. >> This is because during IGVN the constant nodes that lose all outputs are not removed. So now they show up in the IR graph, and the IR rules failed. I adjusted the IR rules to work with the operations `ModF/ModD` rather than constants `ConF/ConD`.
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestIRFindFromAbove.java > > Co-authored-by: Roberto Castañeda Lozano test/hotspot/jtreg/compiler/c2/irTests/ModDNodeTests.java line 65: > 63: } > 64: > 65: // Note: we used to check for ConF nodes in the IR. But that is a bit brittle: Suggestion: // Note: we used to check for ConD nodes in the IR. But that is a bit brittle: ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28762#discussion_r2622339089 From mchevalier at openjdk.org Tue Dec 16 08:57:55 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 16 Dec 2025 08:57:55 GMT Subject: RFR: 8373355: C2: CompileCommand PrintIdealPhase should also print nodes that are not "reachable from below" [v3] In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 08:52:45 GMT, Emanuel Peter wrote: >> PrintIdealPhase seems to only traverse the graph "from below". So if there are any nodes that are not reachable "from below" but only reachable "from above", then they do not show up in the printing. >> That is problematic: our TestFramework relies on that output, especially if you have some IR rule with failOn or count = 0. The node would be in the graph, but the test tells you there is none. >> >> Before the fix, we used to see this: >> The LoadI and StoreI do NOT show up during ITER_GVN1. >> This is because at that point, the infinite loop has no exit. So the loop is not connected down to Root. >> But once we do a first loop opts round, we see the LoadI and StoreI in PHASEIDEALLOOP1. >> This is because we insert a NeverBranch, which serves as an artificial exit, that connects the loop down to Root. >> >> Fix: traverse both up `+` and down `-`. >> >> --------------------------------- >> >> Note: I had to update some tests from https://github.com/openjdk/jdk/pull/22786.
>> This is because during IGVN the constant nodes that lose all outputs are not removed. So now they show up in the IR graph, and the IR rules failed. I adjusted the IR rules to work with the operations `ModF/ModD` rather than constants `ConF/ConD`. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Suggestion by Marc My review isn't enough, right? ------------- Marked as reviewed by mchevalier (Committer). PR Review: https://git.openjdk.org/jdk/pull/28762#pullrequestreview-3582003215 From qamai at openjdk.org Tue Dec 16 09:19:06 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 16 Dec 2025 09:19:06 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v8] In-Reply-To: References: Message-ID: > Hi, > > This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantage of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains nine commits: - Merge branch 'master' into andorxor - Merge branch 'master' into andorxor - Merge branch 'master' into andorxor - Merge branch 'master' into andorxor - Add assertion for the helper in CTPComparator Co-authored-by: Emanuel Peter - remove std::hash - remove unordered_map, add some comments for all_instances_size - Emanuel's reviews - Improve Value inferences of And, Or, Xor and implement gtest for general Value inferences ------------- Changes: https://git.openjdk.org/jdk/pull/27618/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27618&range=07 Stats: 964 lines in 9 files changed: 630 ins; 313 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/27618.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27618/head:pull/27618 PR: https://git.openjdk.org/jdk/pull/27618 From qamai at openjdk.org Tue Dec 16 09:19:08 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 16 Dec 2025 09:19:08 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v8] In-Reply-To: References: Message-ID: <52_j1S_p1zU3IWYaoZ6w8eIZgbmarcvem1QSS5In75c=.af279e73-566c-4a86-9964-8731e4446e31@github.com> On Tue, 16 Dec 2025 07:10:19 GMT, Hannes Greule wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains nine commits: >> >> - Merge branch 'master' into andorxor >> - Merge branch 'master' into andorxor >> - Merge branch 'master' into andorxor >> - Merge branch 'master' into andorxor >> - Add assertion for the helper in CTPComparator >> >> Co-authored-by: Emanuel Peter >> - remove std::hash >> - remove unordered_map, add some comments for all_instances_size >> - Emanuel's reviews >> - Improve Value inferences of And, Or, Xor and implement gtest for general Value inferences > > Not sure if my review counts but I went through the changes again and it looks like all existing inference logic is covered by the more concise and also more powerful new logic. @SirYwell Thanks very much for your review! @eme64 I have just merged with master, could you test the patch again, please? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27618#issuecomment-3659549915 From rcastanedalo at openjdk.org Tue Dec 16 09:33:54 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 16 Dec 2025 09:33:54 GMT Subject: RFR: 8373355: C2: CompileCommand PrintIdealPhase should also print nodes that are not "reachable from below" [v3] In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 08:52:45 GMT, Emanuel Peter wrote: >> PrintIdealPhase seems to only traverse the graph "from below". So if there are any nodes that are not reachable "from below" but only reachable "from above", then they do not show up in the printing. >> That is problematic: our TestFramework relies on that output, especially if you have some IR rule with failOn or count = 0. The node would be in the graph, but the test tells you there is none. >> >> Before the fix, we used to see this: >> The LoadI and StoreI do NOT show up during ITER_GVN1. >> This is because at that point, the infinite loop has no exit. So the loop is not connected down to Root. >> But once we do a first loop opts round, we see the LoadI and StoreI in PHASEIDEALLOOP1.
>> This is because we insert a NeverBranch, which serves as an artificial exit, that connects the loop down to Root. >> >> Fix: traverse both up `+` and down `-`. >> >> --------------------------------- >> >> Note: I had to update some tests from https://github.com/openjdk/jdk/pull/22786. >> This is because during IGVN the constant nodes that lose all outputs are not removed. So now they show up in the IR graph, and the IR rules failed. I adjusted the IR rules to work with the operations `ModF/ModD` rather than constants `ConF/ConD`. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Suggestion by Marc Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28762#pullrequestreview-3582158347 From epeter at openjdk.org Tue Dec 16 09:37:13 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Dec 2025 09:37:13 GMT Subject: RFR: 8373355: C2: CompileCommand PrintIdealPhase should also print nodes that are not "reachable from below" [v3] In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 09:31:34 GMT, Roberto Castañeda Lozano wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Suggestion by Marc > > Marked as reviewed by rcastanedalo (Reviewer). @robcasloz @benoitmaillard @marc-chevalier Thanks for the reviews and approvals :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28762#issuecomment-3659618800 From epeter at openjdk.org Tue Dec 16 09:37:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Dec 2025 09:37:15 GMT Subject: Integrated: 8373355: C2: CompileCommand PrintIdealPhase should also print nodes that are not "reachable from below" In-Reply-To: References: Message-ID: On Thu, 11 Dec 2025 08:40:42 GMT, Emanuel Peter wrote: > PrintIdealPhase seems to only traverse the graph "from below".
So if there are any nodes that are not reachable "from below" but only reachable "from above", then they do not show up in the printing. > That is problematic: our TestFramework relies on that output, especially if you have some IR rule with failOn or count = 0. The node would be in the graph, but the test tells you there is none. > > Before the fix, we used to see this: > The LoadI and StoreI do NOT show up during ITER_GVN1. > This is because at that point, the infinite loop has no exit. So the loop is not connected down to Root. > But once we do a first loop opts round, we see the LoadI and StoreI in PHASEIDEALLOOP1. > This is because we insert a NeverBranch, which serves as an artificial exit, that connects the loop down to Root. > > Fix: traverse both up `+` and down `-`. > > --------------------------------- > > Note: I had to update some tests from https://github.com/openjdk/jdk/pull/22786. > This is because during IGVN the constant nodes that lose all outputs are not removed. So now they show up in the IR graph, and the IR rules failed. I adjusted the IR rules to work with the operations `ModF/ModD` rather than constants `ConF/ConD`. This pull request has now been integrated.
Changeset: 84028918 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/8402891889c29894555eca6449ba63f7b7458124 Stats: 220 lines in 5 files changed: 165 ins; 10 del; 45 mod 8373355: C2: CompileCommand PrintIdealPhase should also print nodes that are not "reachable from below" Reviewed-by: rcastanedalo, mchevalier, bmaillard ------------- PR: https://git.openjdk.org/jdk/pull/28762 From epeter at openjdk.org Tue Dec 16 09:37:18 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Dec 2025 09:37:18 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v8] In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 09:19:06 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantage of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: > > - Merge branch 'master' into andorxor > - Merge branch 'master' into andorxor > - Merge branch 'master' into andorxor > - Merge branch 'master' into andorxor > - Add assertion for the helper in CTPComparator > > Co-authored-by: Emanuel Peter > - remove std::hash > - remove unordered_map, add some comments for all_instances_size > - Emanuel's reviews > - Improve Value inferences of And, Or, Xor and implement gtest for general Value inferences Testing launched.
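As background for the known-bits inference discussed in this thread, here is a minimal Java sketch of the textbook transfer functions for bitwise AND, OR, and XOR. It is an illustration under stated assumptions, not the HotSpot implementation: the `KnownBits` class below, its `zeros`/`ones` fields, and the helper names are hypothetical, and the sketch does not model the unsigned bounds or the separate handling of negative and positive ranges mentioned in the PR description.

```java
public class KnownBitsSketch {
    /** Hypothetical known-bits value: a set bit in 'zeros' is known 0, in 'ones' known 1. */
    static final class KnownBits {
        final int zeros;
        final int ones;

        KnownBits(int zeros, int ones) {
            if ((zeros & ones) != 0) {
                throw new IllegalArgumentException("a bit cannot be known 0 and known 1");
            }
            this.zeros = zeros;
            this.ones = ones;
        }

        /** Every bit of a constant is known. */
        static KnownBits constant(int c) {
            return new KnownBits(~c, c);
        }

        /** True if the concrete value v is consistent with the known bits. */
        boolean contains(int v) {
            return (v & zeros) == 0 && (~v & ones) == 0;
        }

        @Override
        public String toString() {
            return "zeros=" + Integer.toBinaryString(zeros) + " ones=" + Integer.toBinaryString(ones);
        }
    }

    /** AND: a result bit is 0 if either input bit is 0, and 1 only if both are 1. */
    static KnownBits and(KnownBits a, KnownBits b) {
        return new KnownBits(a.zeros | b.zeros, a.ones & b.ones);
    }

    /** OR: a result bit is 1 if either input bit is 1, and 0 only if both are 0. */
    static KnownBits or(KnownBits a, KnownBits b) {
        return new KnownBits(a.zeros & b.zeros, a.ones | b.ones);
    }

    /** XOR: a result bit is known only when both input bits are known. */
    static KnownBits xor(KnownBits a, KnownBits b) {
        int zeros = (a.zeros & b.zeros) | (a.ones & b.ones);
        int ones = (a.zeros & b.ones) | (a.ones & b.zeros);
        return new KnownBits(zeros, ones);
    }

    public static void main(String[] args) {
        KnownBits x = new KnownBits(0b11, 0);  // some multiple of 4
        KnownBits y = KnownBits.constant(6);
        System.out.println("x & 6: " + and(x, y));
        System.out.println("x | 6: " + or(x, y));
        System.out.println("x ^ 6: " + xor(x, y));
    }
}
```

A soundness check in the spirit of the gtest mentioned in the PR would confirm that, for every concrete `a` in `x` and `b` in `y`, `and(x, y).contains(a & b)` holds, and likewise for the other two operations.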
------------- PR Comment: https://git.openjdk.org/jdk/pull/27618#issuecomment-3659620884 From roland at openjdk.org Tue Dec 16 10:10:52 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 16 Dec 2025 10:10:52 GMT Subject: RFR: 8373524: C2: no reachable node should have no use Message-ID: The failure occurs because `PhiNode::Ideal` uses `set_req` to update an input of a `Phi`. That causes the previous input to be disconnected but because of the use of `set_req`, the previous input that has no use is not enqueued for `igvn` to be reclaimed. The fix is to use `set_req_X` instead. I replaced uses of `set_req` with `set_req_X` in `PhiNode::Ideal` where I thought it made sense. ------------- Commit messages: - more - test - more - more - fix Changes: https://git.openjdk.org/jdk/pull/28841/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28841&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373524 Stats: 91 lines in 2 files changed: 88 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28841.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28841/head:pull/28841 PR: https://git.openjdk.org/jdk/pull/28841 From mchevalier at openjdk.org Tue Dec 16 10:37:04 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 16 Dec 2025 10:37:04 GMT Subject: RFR: 8371716: C2: Phi node fails Value()'s verification when speculative types clash [v6] In-Reply-To: References: Message-ID: <2iMJZ4bba63AGd49ERXLb3-VWyC6G8Tu4duP7VwApB8=.0ac8c558-f804-43f8-98f3-862bc82f23f5@github.com> > This bug was originally found and reported as a Valhalla problem. It quickly became apparent it has no reason to be Valhalla-specific, while I couldn't prove so. Roland managed to make a mainline reproducer. The explanation details my Valhalla investigation, but it has nothing to do with value classes anyway. > > The proposed solution seems somewhat controversial. See https://github.com/openjdk/valhalla/pull/1717 for some previous discussion. 
Before polishing the PR, I'd like to reach an agreement on the way to go. > > # Analysis > ## Observationally > ### IGVN > During IGVN, in `PhiNode::Value`, a `PhiNode` has 2 inputs. Their types are: > > in(1): java/lang/Object * (speculative=TestSpeculativeTypes$C2:NotNull:exact * (inline_depth=3)) > in(2): null > > We compute the join (HS' meet): > https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1299-L1306 > > t=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) > > But the current `_type` (of the `PhiNode` as a `TypeNode`) is > > _type=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C1:exact *) > > We filter `t` by `_type` > https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1321 > and we get > > ft=java/lang/Object * > > which is what we return. After the end of `Value`, the returned type becomes the new `PhiNode`'s `_type`. > https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/phaseX.cpp#L2150-L2164 > and > https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/node.cpp#L1117-L1123 > > > ### Verification > On verification, `in(1)`, `in(2)` have the same value, so does `t`. But this time > > _type=java/lang/Object * > > and so after filtering `t` by the (new) `_type` we get > > ft=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) > > which is returned. Verification gets angry because the new `ft` is not the same as the previous one. > > ## But why?! > ### Details on type computation > In short, we are doing > > t = typeof(in(1)) \/ typeof(in(2)) > ft = t /\ _type (* IGVN *) > ft' = t /\ ft (* Verification *) > > and observing that `ft != ft'`.
It seems our lattice doesn't ensure `(a /\ b) /\ b = a /\ b` which is problematic for this kind of verification that will just "try again and see if something changes". > > To me, the surprising fact was that the intersection > > java/lang/Object * (... Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: - Merge remote-tracking branch 'origin/master' into JDK-8371716 - Extract stability check - review - test - Merge branch 'master' into JDK-8371716 - More test - IgnoreUnrecognizedVMOptions - Fix bug number - Filter twice ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28331/files - new: https://git.openjdk.org/jdk/pull/28331/files/1c28403a..53b1828b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28331&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28331&range=04-05 Stats: 62643 lines in 1125 files changed: 39433 ins; 17616 del; 5594 mod Patch: https://git.openjdk.org/jdk/pull/28331.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28331/head:pull/28331 PR: https://git.openjdk.org/jdk/pull/28331 From roland at openjdk.org Tue Dec 16 10:37:04 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 16 Dec 2025 10:37:04 GMT Subject: RFR: 8371716: C2: Phi node fails Value()'s verification when speculative types clash [v6] In-Reply-To: <2iMJZ4bba63AGd49ERXLb3-VWyC6G8Tu4duP7VwApB8=.0ac8c558-f804-43f8-98f3-862bc82f23f5@github.com> References: <2iMJZ4bba63AGd49ERXLb3-VWyC6G8Tu4duP7VwApB8=.0ac8c558-f804-43f8-98f3-862bc82f23f5@github.com> Message-ID: On Tue, 16 Dec 2025 10:33:39 GMT, Marc Chevalier wrote: >> This bug was originally found and reported as a Valhalla problem. It quickly became apparent it has no reason to be Valhalla-specific, while I couldn't prove so. Roland managed to make a mainline reproducer.
The explanation details my Valhalla investigation, but it has nothing to do with value classes anyway. >> >> The proposed solution seems somewhat controversial. See https://github.com/openjdk/valhalla/pull/1717 for some previous discussion. Before polishing the PR, I'd like to reach an agreement on the way to go. >> >> # Analysis >> ## Observationally >> ### IGVN >> During IGVN, in `PhiNode::Value`, a `PhiNode` has 2 inputs. Their types are: >> >> in(1): java/lang/Object * (speculative=TestSpeculativeTypes$C2:NotNull:exact * (inline_depth=3)) >> in(2): null >> >> We compute the join (HS' meet): >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1299-L1306 >> >> t=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) >> >> But the current `_type` (of the `PhiNode` as a `TypeNode`) is >> >> _type=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C1:exact *) >> >> We filter `t` by `_type` >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1321 >> and we get >> >> ft=java/lang/Object * >> >> which is what we return. After the end of `Value`, the returned type becomes the new `PhiNode`'s `_type`. >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/phaseX.cpp#L2150-L2164 >> and >> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/node.cpp#L1117-L1123 >> >> >> ### Verification >> On verification, `in(1)`, `in(2)` have the same value, so does `t`. But this time >> >> _type=java/lang/Object * >> >> and so after filtering `t` by the (new) `_type` we get >> >> ft=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *) >> >> which is returned. Verification gets angry because the new `ft` is not the same as the previous one. >> >> ## But why?!
>> ### Details on type computation >> In short, we are doing >> >> t = typeof(in(1)) \/ typeof(in(2)) >> ft = t /\ _type (* IGVN *) >> ft' = t /\ ft (* Verification *) >> >> and observing that `ft != ft'`. It seems our lattice doesn't ensure `(a /\ b) /\ b = a /\ b` which is problematic for this kind of verification that will just "try again... > > Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: > > - Merge remote-tracking branch 'origin/master' into JDK-8371716 > - Extract stability check > - review > - test > - Merge branch 'master' into JDK-8371716 > - More test > - IgnoreUnrecognizedVMOptions > - Fix bug number > - Filter twice Thanks for making the change. Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28331#pullrequestreview-3582404073 From thartmann at openjdk.org Tue Dec 16 10:37:34 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 16 Dec 2025 10:37:34 GMT Subject: RFR: 8373502: C2 SuperWord: speculative check uses VPointer variable was pinned after speculative check, leading to bad graph In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 08:46:28 GMT, Emanuel Peter wrote: > Thanks to @chhagedorn and @rwestrel for triaging / doing some first investigation. > > This is a regression of [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751) / https://github.com/openjdk/jdk/pull/24278. > > This is almost the same as https://github.com/openjdk/jdk/pull/28449, so have a quick look at it. > It was also an issue with some nodes being pinned too low, and not available at the speculative check. > There, it was the `pre_init` values of the `iv`. Now it is the variables of the `VPointer`. > The fix is pretty similar as well.
> > ------------------------------------------ > > **Analysis** > > The reproducer gets a `bad graph` assert because of this cycle: > image > Note: `921 CountedLoop` is the pre-loop, the main-loop is further down from it. > And `607 ParsePredicate` is the `#Auto_Vectorization_Check`, and `1403` is the aliasing check inserted for the VPointer named below. > > This is the relevant VPointer: > `VPointer[size: 4, object, base(920 CastPP) + con( 20) + iv_scale( 0) * iv + invar(0)]` > The base `920 CastPP` is the problematic variable. > > In `VPointer::init_are_non_iv_summands_pre_loop_invariant`, we check that: > `_vloop.is_pre_loop_invariant(variable)` > And that holds for `920 CastPP`. So far so good. > > This used to be enough when we only adjusted the pre-loop limit for alignment. > But now that we need the variables for the aliasing runtime check further up, this is not sufficient any more. Analogous to https://github.com/openjdk/jdk/pull/28449, we would now need: > `this->_vloop.is_available_for_speculative_check(variable)` > And that is false for `920 CastPP`, since it is pinned after the speculative check. > > **Solution** > We should not insert the aliasing runtime check, and hence we probably cannot vectorize this case. > > For now, this makes all tests pass. I think just like with https://github.com/openjdk/jdk/pull/28449 these cases are edge cases we don't have to worry too much about. But if they ever do become important, we could try to uncast the variables. But I don't know if that is without issues, we would certainly lose some info that we get from the casts. Looks good to me! test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingCheckVPointerVariablesNotAvailable.java line 31: > 29: * aliasing check, to avoid a bad (circular) graph. > 30: * @run main/othervm > 31: * -XX:+IgnoreUnrecognizedVMOptions This is not needed, right? ------------- Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/28783#pullrequestreview-3582411070 PR Review Comment: https://git.openjdk.org/jdk/pull/28783#discussion_r2622717013 From thartmann at openjdk.org Tue Dec 16 10:43:12 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 16 Dec 2025 10:43:12 GMT Subject: RFR: 8370922: Template Framework Library: Float16 type and operations [v4] In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 09:17:48 GMT, Emanuel Peter wrote: >> We should test `Float16` with Template Framework Tests. For this, I'm now implementing: >> >> - Template Framework Library: add `Float16Type` that represents `Float16`. Extend `Operations.java` with `Float16` operations. >> - `Verify.java`: add verification for `Float16`, and corresponding tests in `TestVerifyIncubatorVector.java`. We could have done this separately, but it is not much code and completes the pipeline from code generation through execution and finally result verification in the following two tests. >> - Adding `Float16` to `ExpressionFuzzer.java` and `TestExpressions.java`. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > comments for Benoit Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28095#pullrequestreview-3582432645 From roland at openjdk.org Tue Dec 16 11:15:35 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 16 Dec 2025 11:15:35 GMT Subject: RFR: 8373508: C2: sinking CreateEx out of loop breaks the graph Message-ID: A `CreateEx` gets sunk out of loop by `PhaseIdealLoop::try_sink_out_of_loop()` and, as a consequence, the following logic: return (in(0)->is_CatchProj() && in(0)->in(0)->is_Catch() && in(0)->in(0)->in(1) == in(1)) ? this : call->in(TypeFunc::Parms); in `CreateExNode::Identity()` triggers which leads to the crash because `call->in(TypeFunc::Parms)` is not even an object in this particular case. 
It's actually not clear to me what that logic in `CreateExNode::Identity()` is expected to do and I wonder if it's still needed. Anyway, the fix I propose is to skip `CreateEx` in `PhaseIdealLoop::try_sink_out_of_loop()`. ------------- Commit messages: - whitespaces - tests - more - fix Changes: https://git.openjdk.org/jdk/pull/28842/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28842&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373508 Stats: 160 lines in 3 files changed: 160 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28842.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28842/head:pull/28842 PR: https://git.openjdk.org/jdk/pull/28842 From jbhateja at openjdk.org Tue Dec 16 11:42:30 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 16 Dec 2025 11:42:30 GMT Subject: RFR: 8373480: Optimize multiplication by constant multiplier using LEA instructions [v2] In-Reply-To: References: Message-ID: > Emulate multiplier using LEA addressing scheme, where effective address = BASE + INDEX * SCALE + OFFSET > Refer to section "3.5.1.2 Using LEA" of Intel's optimization manual for details regarding slow vs fast lea instructions. > Given that latency of IMUL with register operands is 3 cycles, a combination of two fast LEAs each with 1 cycle latency to emulate multiplier is performant. > > Consider X as the multiplicand, by varying the scale of first LEA instruction we can generate 4 inputs i.e. > > > X + X * 1 = 2X > X + X * 2 = 3X > X + X * 4 = 5X > X + X * 8 = 9X > > > Following table lists various multiplier combinations for output of first LEA at BASE and/or INDEX by varying the > scale of second fast LEA instruction. We will only handle the cases which cannot be handled by just shift + add.
>
> BASE   INDEX   SCALE   MULTIPLIER
> X      X       1       2 (Terminal)
> X      X       2       3 (Terminal)
> X      X       4       5 (Terminal)
> X      X       8       9 (Terminal)
> 3X     3X      1       6
> X      3X      2       7
> 5X     5X      1       10
> X      5X      2       11
> X      3X      4       13
> 5X     5X      2       15
> X      2X      8       17
> 9X     9X      1       18
> X      9X      2       19
> X      5X      4       21
> 5X     5X      4       25
> 9X     9X      2       27
> X      9X      4       37
> X      5X      8       41
> 9X     9X      4       45
> X      9X      8       73
> 9X     9X      8       81
>
> All the non-unity inputs tied to BASE / INDEX are derived from the terminal cases, which represent the first fast LEA. Thus, all the multipliers can be computed using just two LEA instructions.
>
> Performance numbers for the new micro benchmark included with this patch show around **5-50% improvements** on the latest x86 servers.
>
> System: INTEL(R) XEON(R) PLATINUM 8581C CPU @ 2.10GHz Emerald Rapids:-
> Baseline:-
> Benchmark                                             Mode  Cnt    Score  Error    Units
> ConstantMultiplierOptimization.testConstMultiplierI  thrpt    2  189.690         ops/min
> ConstantMultiplierOptimization.testConstMultiplierL  thrpt    2  196.283         ops/min
>
> Withopt:-
> Benchmark                                             Mode  Cnt    Score  Error    Units
> ConstantMultiplierOptimization.testConstMultiplierI  thrpt    2  283.827         ops/min
> ConstantMultiplierOptimization...
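The two-LEA decomposition in the quoted table can be checked with a short Java sketch. This is only an arithmetic model under stated assumptions, not the code from the patch: `lea`, `firstLea`, `multiply`, and `TABLE` are hypothetical names, `lea` simply computes BASE + INDEX * SCALE with OFFSET taken as 0, and the rows are transcribed from the table in the message.

```java
public class LeaMultiplySketch {
    /** Models one LEA: base + index * scale, with scale restricted to 1, 2, 4 or 8. */
    static long lea(long base, long index, int scale) {
        if (scale != 1 && scale != 2 && scale != 4 && scale != 8) {
            throw new IllegalArgumentException("invalid LEA scale: " + scale);
        }
        return base + index * scale;
    }

    /** Rows of {base factor, index factor, scale, multiplier}, transcribed from the table. */
    static final int[][] TABLE = {
        {1, 1, 1, 2}, {1, 1, 2, 3}, {1, 1, 4, 5}, {1, 1, 8, 9},   // terminal: a single LEA
        {3, 3, 1, 6}, {1, 3, 2, 7}, {5, 5, 1, 10}, {1, 5, 2, 11},
        {1, 3, 4, 13}, {5, 5, 2, 15}, {1, 2, 8, 17}, {9, 9, 1, 18},
        {1, 9, 2, 19}, {1, 5, 4, 21}, {5, 5, 4, 25}, {9, 9, 2, 27},
        {1, 9, 4, 37}, {1, 5, 8, 41}, {9, 9, 4, 45}, {1, 9, 8, 73},
        {9, 9, 8, 81},
    };

    /** Value of factor * x, where a non-unity factor (2, 3, 5, 9) comes from the first LEA. */
    static long firstLea(long x, int factor) {
        return factor == 1 ? x : lea(x, x, factor - 1);
    }

    /**
     * Multiplies x by a constant from the table. Each row has at most one distinct
     * non-unity factor, so real code would need one first LEA plus the final LEA;
     * this model just recomputes the value for BASE and INDEX separately.
     */
    static long multiply(long x, int multiplier) {
        for (int[] row : TABLE) {
            if (row[3] == multiplier) {
                return lea(firstLea(x, row[0]), firstLea(x, row[1]), row[2]);
            }
        }
        throw new IllegalArgumentException("multiplier not in table: " + multiplier);
    }

    public static void main(String[] args) {
        for (int[] row : TABLE) {
            long x = 7;
            System.out.println(x + " * " + row[3] + " = " + multiply(x, row[3]));
        }
    }
}
```

For example, the row `X 9X 8 73` corresponds to computing `9X = X + X * 8` first and then `X + 9X * 8`.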
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Using test-framework for JTREG test generation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28759/files - new: https://git.openjdk.org/jdk/pull/28759/files/b82aa90f..7489c7fe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28759&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28759&range=00-01 Stats: 344 lines in 1 file changed: 75 ins; 250 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/28759.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28759/head:pull/28759 PR: https://git.openjdk.org/jdk/pull/28759 From jbhateja at openjdk.org Tue Dec 16 11:47:33 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 16 Dec 2025 11:47:33 GMT Subject: RFR: 8373480: Optimize multiplication by constant multiplier using LEA instructions [v3] In-Reply-To: References: Message-ID: > Emulate multiplier using LEA addressing scheme, where effective address = BASE + INDEX * SCALE + OFFSET > Refer to section "3.5.1.2 Using LEA" of Intel's optimization manual for details regarding slow vs fast lea instructions. > Given that latency of IMUL with register operands is 3 cycles, a combination of two fast LEAs each with 1 cycle latency to emulate multiplier is performant. > > Consider X as the multiplicand, by varying the scale of first LEA instruction we can generate 4 inputs i.e. > > > X + X * 1 = 2X > X + X * 2 = 3X > X + X * 4 = 5X > X + X * 8 = 9X > > > Following table lists various multiplier combinations for output of first LEA at BASE and/or INDEX by varying the > scale of second fast LEA instruction. We will only handle the cases which cannot be handled by just shift + add.
>
> BASE   INDEX   SCALE   MULTIPLIER
> X      X       1       2 (Terminal)
> X      X       2       3 (Terminal)
> X      X       4       5 (Terminal)
> X      X       8       9 (Terminal)
> 3X     3X      1       6
> X      3X      2       7
> 5X     5X      1       10
> X      5X      2       11
> X      3X      4       13
> 5X     5X      2       15
> X      2X      8       17
> 9X     9X      1       18
> X      9X      2       19
> X      5X      4       21
> 5X     5X      4       25
> 9X     9X      2       27
> X      9X      4       37
> X      5X      8       41
> 9X     9X      4       45
> X      9X      8       73
> 9X     9X      8       81
>
> All the non-unity inputs tied to BASE / INDEX are derived from the terminal cases, which represent the first fast LEA. Thus, all the multipliers can be computed using just two LEA instructions.
>
> Performance numbers for the new micro benchmark included with this patch show around **5-50% improvements** on the latest x86 servers.
>
> System: INTEL(R) XEON(R) PLATINUM 8581C CPU @ 2.10GHz Emerald Rapids:-
> Baseline:-
> Benchmark                                             Mode  Cnt    Score  Error    Units
> ConstantMultiplierOptimization.testConstMultiplierI  thrpt    2  189.690         ops/min
> ConstantMultiplierOptimization.testConstMultiplierL  thrpt    2  196.283         ops/min
>
> Withopt:-
> Benchmark                                             Mode  Cnt    Score  Error    Units
> ConstantMultiplierOptimization.testConstMultiplierI  thrpt    2  283.827         ops/min
> ConstantMultiplierOptimization...
Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
The pull request contains one new commit since the last revision:

  Using template-framework for JTREG test generation

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28759/files
  - new: https://git.openjdk.org/jdk/pull/28759/files/7489c7fe..d792c49b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28759&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28759&range=01-02

  Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/28759.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28759/head:pull/28759

PR: https://git.openjdk.org/jdk/pull/28759

From jbhateja at openjdk.org  Tue Dec 16 11:56:40 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 16 Dec 2025 11:56:40 GMT
Subject: RFR: 8373480: Optimize multiplication by constant multiplier using LEA instructions [v4]
In-Reply-To:
References:
Message-ID:

> Emulate the multiplier using the LEA addressing scheme, where effective address = BASE + INDEX * SCALE + OFFSET
> Refer to section "3.5.1.2 Using LEA" of Intel's optimization manual for details regarding slow vs. fast LEA instructions.
> Given that the latency of IMUL with register operands is 3 cycles, a combination of two fast LEAs, each with 1-cycle latency, to emulate the multiplier is performant.
>
> Consider X as the multiplicand; by varying the scale of the first LEA instruction we can generate 4 inputs, i.e.
>
>
> X + X * 1 = 2X
> X + X * 2 = 3X
> X + X * 4 = 5X
> X + X * 8 = 9X
>
>
> The following table lists the multiplier combinations obtained by feeding the output of the first LEA into BASE and/or INDEX and varying the
> scale of the second fast LEA instruction. We will only handle the cases which cannot be handled by just shift + add.
>
>
> BASE   INDEX   SCALE   MULTIPLIER
> X      X       1       2  (Terminal)
> X      X       2       3  (Terminal)
> X      X       4       5  (Terminal)
> X      X       8       9  (Terminal)
> 3X     3X      1       6
> X      3X      2       7
> 5X     5X      1       10
> X      5X      2       11
> X      3X      4       13
> 5X     5X      2       15
> X      2X      8       17
> 9X     9X      1       18
> X      9X      2       19
> X      5X      4       21
> 5X     5X      4       25
> 9X     9X      2       27
> X      9X      4       37
> X      5X      8       41
> 9X     9X      4       45
> X      9X      8       73
> 9X     9X      8       81
>
>
> All the non-unity inputs tied to BASE / INDEX are derived from the terminal cases, which represent the first fast LEA. Thus, all the multipliers can be computed using just two LEA instructions.
>
> Performance numbers for the new microbenchmark included with this patch show around **5-50% improvements** on the latest x86 servers.
>
>
> System: INTEL(R) XEON(R) PLATINUM 8581C CPU @ 2.10GHz Emerald Rapids:-
> Baseline:-
> Benchmark                                             Mode  Cnt    Score   Error    Units
> ConstantMultiplierOptimization.testConstMultiplierI  thrpt    2  189.690          ops/min
> ConstantMultiplierOptimization.testConstMultiplierL  thrpt    2  196.283          ops/min
>
>
> Withopt:-
> Benchmark                                             Mode  Cnt    Score   Error    Units
> ConstantMultiplierOptimization.testConstMultiplierI  thrpt    2  283.827          ops/min
> ConstantMultiplierOptimization...
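
[Editor's sketch] The two-LEA scheme quoted above can be checked with plain integer arithmetic. The Java below is only an illustration, not code from the patch: `lea`, `viaTwoLeas` and the class name are hypothetical helpers that model `lea dst, [base + index * scale]` without a displacement, and the boolean `baseIsX` is an assumed encoding of whether the second LEA's BASE is X or the first LEA's result.

```java
class LeaMultiplierSketch {
    /** Models `lea dst, [base + index * scale]` with no displacement (hypothetical helper). */
    static long lea(long base, long index, int scale) {
        if (scale != 1 && scale != 2 && scale != 4 && scale != 8) {
            throw new IllegalArgumentException("x86 scale must be 1, 2, 4 or 8");
        }
        return base + index * scale;
    }

    /**
     * First LEA computes t = x + x * firstScale (2x, 3x, 5x or 9x).
     * Second LEA uses t as INDEX and either x or t as BASE.
     */
    static long viaTwoLeas(long x, int firstScale, boolean baseIsX, int secondScale) {
        long t = lea(x, x, firstScale);
        long base = baseIsX ? x : t;
        return lea(base, t, secondScale);
    }

    public static void main(String[] args) {
        long x = 7;
        // Table row "X 3X 2 7": t = 3x, result = x + 3x * 2
        System.out.println(viaTwoLeas(x, 2, true, 2) == 7 * x);
        // Table row "X 5X 8 41": t = 5x, result = x + 5x * 8
        System.out.println(viaTwoLeas(x, 4, true, 8) == 41 * x);
        // Table row "9X 9X 8 81": t = 9x, result = 9x + 9x * 8
        System.out.println(viaTwoLeas(x, 8, false, 8) == 81 * x);
    }
}
```

Java `long` arithmetic wraps on overflow in the same way as the 64-bit register arithmetic it stands in for, so the identities hold for any `x`; whether the actual patch selects operands this way is not shown in the quoted description.
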
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  Minor cleanup in Template-Framework test

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28759/files
  - new: https://git.openjdk.org/jdk/pull/28759/files/d792c49b..66a28502

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28759&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28759&range=02-03

  Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/28759.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28759/head:pull/28759

PR: https://git.openjdk.org/jdk/pull/28759

From galder at openjdk.org  Tue Dec 16 11:57:57 2025
From: galder at openjdk.org (Galder Zamarreño)
Date: Tue, 16 Dec 2025 11:57:57 GMT
Subject: RFR: 8373396: Min and Max Ideal missing AddNode::Ideal optimisations
In-Reply-To:
References:
Message-ID: <7bPqvzbZYcwOmSBGzbzjNAX6EceUNp-sDwSEowK_HbY=.f616c377-eabb-4879-80e2-5d183468d326@github.com>

On Fri, 12 Dec 2025 17:14:31 GMT, Emanuel Peter wrote:

>> @TobiHartmann Thanks for the review!
>>
>> @eme64 Thanks also for the review. Can you please also clarify what I said about potentially changing `PhaseIterGVN::verify_Ideal_for` in the description?
>
>> @TobiHartmann Thanks for the review!
>>
>> @eme64 Thanks also for the review. Can you please also clarify what I said about potentially changing `PhaseIterGVN::verify_Ideal_for` in the description?
>
> I don't remember. Try enabling the verification, and see if you find any test that fails. If not: great, maybe you fixed it! If it still fails, it would be nice if you added more info, but not necessary.
>
> I don't remember because there were eventually too many cases and I stopped reporting which had failed.

@eme64 Roland provided the 2nd review so you can start your internal testing.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28770#issuecomment-3660156041 From epeter at openjdk.org Tue Dec 16 13:17:24 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Dec 2025 13:17:24 GMT Subject: RFR: 8373502: C2 SuperWord: speculative check uses VPointer variable was pinned after speculative check, leading to bad graph [v2] In-Reply-To: References: Message-ID: > Thanks for @chhagedorn and @rwestrel for triaging / doing some first investigation. > > This is a regression of [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751) / https://github.com/openjdk/jdk/pull/24278. > > This is almost the same as https://github.com/openjdk/jdk/pull/28449, so have a quick look at it. > It was also an issue with some nodes being pinned too low, and not available at the speculative check. > There, it was the `pre_init` values of the `iv`. Now it is the variables of the `VPointer`. > The fix is pretty similar as well. > > ------------------------------------------ > > **Analysis** > > The reproducer gets a `bad graph` assert because of this cycle: > image > Note: `921 CountedLoop` is the pre-loop, the main-loop is further down from it. > And `607 ParsePredicate` is the `#Auto_Vectorization_Check`, and `1403` is the aliasing check inserted for the VPointer named below. > > This is the relevant VPointer: > `VPointer[size: 4, object, base(920 CastPP) + con( 20) + iv_scale( 0) * iv + invar(0)]` > The base `920 CastPP` is the problematic variable. > > In `VPointer::init_are_non_iv_summands_pre_loop_invariant`, we check that: > `_vloop.is_pre_loop_invariant(variable)` > And that holds for `920 CastPP`. So far so good. > > This used to be enough when we only adjusted the pre-loop limit for alignment. > But now that we need the variables for the aliasing runtime check further up, this is not sufficient any more. 
Analogue to https://github.com/openjdk/jdk/pull/28449, we would now need: > `this->_vloop.is_available_for_speculative_check(variable)` > And that is false for `920 CastPP`, since it is pinned after the speculative check. > > **Solution** > We should not insert the aliasing runtime check, and hence we probably cannot vectorize this case. > > For now, this makes all tests pass. I think just like with https://github.com/openjdk/jdk/pull/28449 these cases are edge cases we don't have to worry too much about. But if they ever do become important, we could try to uncast the variables. But I don't know if that is without issues, we would certainly lose some info that we get from the casts. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingCheckVPointerVariablesNotAvailable.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28783/files - new: https://git.openjdk.org/jdk/pull/28783/files/bb9b9399..e6448b66 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28783&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28783&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28783.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28783/head:pull/28783 PR: https://git.openjdk.org/jdk/pull/28783 From epeter at openjdk.org Tue Dec 16 13:17:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Dec 2025 13:17:27 GMT Subject: RFR: 8373502: C2 SuperWord: speculative check uses VPointer variable was pinned after speculative check, leading to bad graph [v2] In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 10:34:42 GMT, Tobias Hartmann wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Update 
test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingCheckVPointerVariablesNotAvailable.java > > test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingCheckVPointerVariablesNotAvailable.java line 31: > >> 29: * aliasing check, to avoid a bad (circular) graph. >> 30: * @run main/othervm >> 31: * -XX:+IgnoreUnrecognizedVMOptions > > This is not needed, right? Hmm right, might not be needed. Will try to remove it. Suggestion: ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28783#discussion_r2623235185 From epeter at openjdk.org Tue Dec 16 13:23:40 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Dec 2025 13:23:40 GMT Subject: RFR: 8373502: C2 SuperWord: speculative check uses VPointer variable was pinned after speculative check, leading to bad graph [v3] In-Reply-To: References: Message-ID: > Thanks for @chhagedorn and @rwestrel for triaging / doing some first investigation. > > This is a regression of [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751) / https://github.com/openjdk/jdk/pull/24278. > > This is almost the same as https://github.com/openjdk/jdk/pull/28449, so have a quick look at it. > It was also an issue with some nodes being pinned too low, and not available at the speculative check. > There, it was the `pre_init` values of the `iv`. Now it is the variables of the `VPointer`. > The fix is pretty similar as well. > > ------------------------------------------ > > **Analysis** > > The reproducer gets a `bad graph` assert because of this cycle: > image > Note: `921 CountedLoop` is the pre-loop, the main-loop is further down from it. > And `607 ParsePredicate` is the `#Auto_Vectorization_Check`, and `1403` is the aliasing check inserted for the VPointer named below. > > This is the relevant VPointer: > `VPointer[size: 4, object, base(920 CastPP) + con( 20) + iv_scale( 0) * iv + invar(0)]` > The base `920 CastPP` is the problematic variable. 
> > In `VPointer::init_are_non_iv_summands_pre_loop_invariant`, we check that: > `_vloop.is_pre_loop_invariant(variable)` > And that holds for `920 CastPP`. So far so good. > > This used to be enough when we only adjusted the pre-loop limit for alignment. > But now that we need the variables for the aliasing runtime check further up, this is not sufficient any more. Analogue to https://github.com/openjdk/jdk/pull/28449, we would now need: > `this->_vloop.is_available_for_speculative_check(variable)` > And that is false for `920 CastPP`, since it is pinned after the speculative check. > > **Solution** > We should not insert the aliasing runtime check, and hence we probably cannot vectorize this case. > > For now, this makes all tests pass. I think just like with https://github.com/openjdk/jdk/pull/28449 these cases are edge cases we don't have to worry too much about. But if they ever do become important, we could try to uncast the variables. But I don't know if that is without issues, we would certainly lose some info that we get from the casts. Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision:

 - Merge branch 'master' into JDK-8373502-SW-VPointer-variables-at-speculative-check
 - Update test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingCheckVPointerVariablesNotAvailable.java
 - fix up detail
 - JDK-8373502

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28783/files
  - new: https://git.openjdk.org/jdk/pull/28783/files/e6448b66..55634a44

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28783&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28783&range=01-02

  Stats: 8465 lines in 285 files changed: 5243 ins; 1122 del; 2100 mod
  Patch: https://git.openjdk.org/jdk/pull/28783.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28783/head:pull/28783

PR: https://git.openjdk.org/jdk/pull/28783

From epeter at openjdk.org  Tue Dec 16 13:35:48 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Dec 2025 13:35:48 GMT
Subject: RFR: 8370922: Template Framework Library: Float16 type and operations [v4]
In-Reply-To:
References:
Message-ID:

On Fri, 12 Dec 2025 18:36:03 GMT, Benoît Maillard wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   comments for Benoit
>
> Looks good to me, nice work! I only have one question.

@benoitmaillard @galderz @TobiHartmann
Thanks for the reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28095#issuecomment-3660539604

From epeter at openjdk.org  Tue Dec 16 13:35:50 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Dec 2025 13:35:50 GMT
Subject: Integrated: 8370922: Template Framework Library: Float16 type and operations
In-Reply-To:
References:
Message-ID:

On Fri, 31 Oct 2025 22:23:28 GMT, Emanuel Peter wrote:

> We should test `Float16` with Template Framework Tests. For this, I'm now implementing:
>
> - Template Framework Library: add `Float16Type` that represents `Float16`.
Extend `Operations.java` with `Float16` operations.
> - `Verify.java`: add verification for `Float16`, and corresponding tests in `TestVerifyIncubatorVector.java`. We could have done this separately, but it is not much code and completes the pipeline from code generation through execution and finally result verification in the following two tests.
> - Adding `Float16` to `ExpressionFuzzer.java` and `TestExpressions.java`.

This pull request has now been integrated.

Changeset: 89e77512
Author:    Emanuel Peter
URL:       https://git.openjdk.org/jdk/commit/89e77512fd44b6a0299ab36db15142e7544899f3
Stats:     380 lines in 9 files changed: 352 ins; 4 del; 24 mod

8370922: Template Framework Library: Float16 type and operations

Reviewed-by: galder, thartmann, bmaillard

-------------

PR: https://git.openjdk.org/jdk/pull/28095

From bkilambi at openjdk.org  Tue Dec 16 13:49:03 2025
From: bkilambi at openjdk.org (Bhavana Kilambi)
Date: Tue, 16 Dec 2025 13:49:03 GMT
Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v7]
In-Reply-To:
References:
Message-ID:

On Thu, 11 Dec 2025 06:08:58 GMT, Jatin Bhateja wrote:

> I suggest creating a separate PR for this. You can either create a smaller standalone reproducer test case or mention the tests that are part of this PR.

Hi @jbhateja, thanks for the suggestion. Based on the comments here - https://bugs.openjdk.org/browse/JDK-8373574, is it OK if my fix (along with a regression test as suggested) is part of this PR?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3660602956

From epeter at openjdk.org  Tue Dec 16 13:51:01 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 16 Dec 2025 13:51:01 GMT
Subject: RFR: 8373396: Min and Max Ideal missing AddNode::Ideal optimisations [v4]
In-Reply-To:
References:
Message-ID:

On Tue, 16 Dec 2025 08:07:16 GMT, Galder Zamarreño wrote:

>> `MaxI` and `MinI` are missing `AddNode::Ideal` optimizations.
These optimizations include commutation, flattening, pushing constants, etc. The PR changes `MaxINode::Ideal` and `MinINode::Ideal` to call `AddNode::Ideal`. Long versions already call `AddNode::Ideal` so nothing to change there.
>>
>> The PR also includes a template framework generated test (cc @eme64) that verifies that all of the `AddNode::Ideal` optimizations now apply correctly for min/max for longs and ints. Long tests have been added to validate that both ints and longs produce the same results.
>>
>> Fixing this issue indirectly fixes `compiler/codegen/TestBooleanVect.java` when run with `-XX:VerifyIterativeGVN=1110`, which was failing due to `min` not having one of those optimisations. However, this PR does not make changes to `PhaseIterGVN::verify_Identity_for` because there are additional failures observed with min/max for integers in JDK-8373134. Therefore, changes there will be in the PR for JDK-8373134 instead.
>>
>> **Update 15.12.25**: `PhaseIterGVN::verify_Ideal_for` exceptions for MinI/MaxI are still needed.
>>
>> ~If you look at `PhaseIterGVN::verify_Ideal_for`, it contains. This looks like it could be removed in this PR as it looks like they were quite likely disabled due to the issue here. However, it's unclear what test was failing here (@eme64 ?):~
>>
>>
>>     // MinINode::Ideal
>>     // Did not investigate, but there are some patterns that might
>>     // need more notification.
>>     case Op_MinI:
>>     case Op_MaxI: // preemptively removed it as well.
>>       return false;
>>
>>
>> I've run tier1-3 tests on linux/x64 and they passed.
>
> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix redundant variable

Testing launched.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28770#issuecomment-3660611100

From mchevalier at openjdk.org  Tue Dec 16 14:35:24 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Tue, 16 Dec 2025 14:35:24 GMT
Subject: RFR: 8371716: C2: Phi node fails Value()'s verification when speculative types clash [v6]
In-Reply-To: <2iMJZ4bba63AGd49ERXLb3-VWyC6G8Tu4duP7VwApB8=.0ac8c558-f804-43f8-98f3-862bc82f23f5@github.com>
References: <2iMJZ4bba63AGd49ERXLb3-VWyC6G8Tu4duP7VwApB8=.0ac8c558-f804-43f8-98f3-862bc82f23f5@github.com>
Message-ID:

On Tue, 16 Dec 2025 10:37:04 GMT, Marc Chevalier wrote:

>> This bug was originally found and reported as a Valhalla problem. It quickly became apparent it has no reason to be Valhalla-specific, while I couldn't prove so. Roland managed to make a mainline reproducer. The explanation details my Valhalla investigation, but it has nothing to do with value classes anyway.
>>
>> The proposed solution seems somewhat controversial. See https://github.com/openjdk/valhalla/pull/1717 for some previous discussion. Before polishing the PR, I'd like to reach an agreement on the way to go.
>>
>> # Analysis
>> ## Observationally
>> ### IGVN
>> During IGVN, in `PhiNode::Value`, a `PhiNode` has 2 inputs.
Their types are:
>>
>> in(1): java/lang/Object * (speculative=TestSpeculativeTypes$C2:NotNull:exact * (inline_depth=3))
>> in(2): null
>>
>> We compute the join (HS' meet):
>> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1299-L1306
>>
>> t=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *)
>>
>> But the current `_type` (of the `PhiNode` as a `TypeNode`) is
>>
>> _type=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C1:exact *)
>>
>> We filter `t` by `_type`
>> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1321
>> and we get
>>
>> ft=java/lang/Object *
>>
>> which is what we return. After the end of `Value`, the returned type becomes the new `PhiNode`'s `_type`.
>> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/phaseX.cpp#L2150-L2164
>> and
>> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/node.cpp#L1117-L1123
>>
>>
>> ### Verification
>> On verification, `in(1)`, `in(2)` have the same value, so does `t`. But this time
>>
>> _type=java/lang/Object *
>>
>> and so, after filtering `t` by the (new) `_type`, we get
>>
>> ft=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *)
>>
>> which is returned. Verification gets angry because the new `ft` is not the same as the previous one.
>>
>> ## But why?!
>> ### Details on type computation
>> In short, we are doing
>>
>> t = typeof(in(1)) \/ typeof(in(2))
>> ft = t /\ _type (* IGVN *)
>> ft' = t /\ ft (* Verification *)
>>
>> and observing that `ft != ft'`. It seems our lattice doesn't ensure `(a /\ b) /\ b = a /\ b` which is problematic for this kind of verification that will just "try again...
>
> Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision:
>
>  - Merge remote-tracking branch 'origin/master' into JDK-8371716
>  - Extract stability check
>  - review
>  - test
>  - Merge branch 'master' into JDK-8371716
>  - More test
>  - IgnoreUnrecognizedVMOptions
>  - Fix bug number
>  - Filter twice

Thanks everyone for the reviews.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28331#issuecomment-3660853338

From mchevalier at openjdk.org  Tue Dec 16 14:35:25 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Tue, 16 Dec 2025 14:35:25 GMT
Subject: Integrated: 8371716: C2: Phi node fails Value()'s verification when speculative types clash
In-Reply-To:
References:
Message-ID:

On Fri, 14 Nov 2025 19:56:14 GMT, Marc Chevalier wrote:

> This bug was originally found and reported as a Valhalla problem. It quickly became apparent it has no reason to be Valhalla-specific, while I couldn't prove so. Roland managed to make a mainline reproducer. The explanation details my Valhalla investigation, but it has nothing to do with value classes anyway.
>
> The proposed solution seems somewhat controversial. See https://github.com/openjdk/valhalla/pull/1717 for some previous discussion. Before polishing the PR, I'd like to reach an agreement on the way to go.
>
> # Analysis
> ## Observationally
> ### IGVN
> During IGVN, in `PhiNode::Value`, a `PhiNode` has 2 inputs.
Their types are:
>
> in(1): java/lang/Object * (speculative=TestSpeculativeTypes$C2:NotNull:exact * (inline_depth=3))
> in(2): null
>
> We compute the join (HS' meet):
> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1299-L1306
>
> t=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *)
>
> But the current `_type` (of the `PhiNode` as a `TypeNode`) is
>
> _type=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C1:exact *)
>
> We filter `t` by `_type`
> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/cfgnode.cpp#L1321
> and we get
>
> ft=java/lang/Object *
>
> which is what we return. After the end of `Value`, the returned type becomes the new `PhiNode`'s `_type`.
> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/phaseX.cpp#L2150-L2164
> and
> https://github.com/openjdk/jdk/blob/09b25cd0a24a4eaddce49917d958adc667ab5465/src/hotspot/share/opto/node.cpp#L1117-L1123
>
>
> ### Verification
> On verification, `in(1)`, `in(2)` have the same value, so does `t`. But this time
>
> _type=java/lang/Object *
>
> and so, after filtering `t` by the (new) `_type`, we get
>
> ft=java/lang/Object * (speculative=compiler/igvn/ClashingSpeculativeTypePhiNode$C2:exact *)
>
> which is returned. Verification gets angry because the new `ft` is not the same as the previous one.
>
> ## But why?!
> ### Details on type computation
> In short, we are doing
>
> t = typeof(in(1)) \/ typeof(in(2))
> ft = t /\ _type (* IGVN *)
> ft' = t /\ ft (* Verification *)
>
> and observing that `ft != ft'`. It seems our lattice doesn't ensure `(a /\ b) /\ b = a /\ b` which is problematic for this kind of verification that will just "try again and see if something changes".
>
> To me, the surprising fact was that the intersection
>
> java/lang/Object * (...
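
[Editor's sketch] The instability described in the quoted analysis can be reproduced on a deliberately tiny toy model. Everything below is an assumption made for illustration: `ToyType`, `filter` and the rule "clashing speculative parts are dropped" are hypothetical and only mimic the observed outcomes (`C2` filtered by `C1` gives no speculative part, `C2` filtered by "no speculative part" gives `C2`); this is not HotSpot's `Type::filter` implementation.

```java
import java.util.Optional;

class SpeculativeFilterSketch {
    /** Toy type: the base is always Object*, only the speculative part varies. */
    record ToyType(Optional<String> speculative) {
        static ToyType of(String klass) { return new ToyType(Optional.of(klass)); }
        static ToyType plain() { return new ToyType(Optional.empty()); }
    }

    /**
     * Assumed filtering rule for the toy model: a missing speculative part
     * imposes no constraint, equal parts are kept, clashing parts are dropped.
     */
    static ToyType filter(ToyType a, ToyType b) {
        if (a.speculative().isEmpty()) return b;
        if (b.speculative().isEmpty()) return a;
        return a.speculative().equals(b.speculative()) ? a : ToyType.plain();
    }

    public static void main(String[] args) {
        ToyType t = ToyType.of("C2");         // join of the Phi inputs
        ToyType type = ToyType.of("C1");      // Phi's current _type
        ToyType ft = filter(t, type);         // IGVN: clash, speculative part dropped
        ToyType ftPrime = filter(t, ft);      // verification: filter again by the new _type
        System.out.println("ft  = " + ft);
        System.out.println("ft' = " + ftPrime);
        System.out.println("stable: " + ft.equals(ftPrime));
    }
}
```

In this model the second filtering step recovers the `C2` speculative part that the first step discarded, so `ft` and `ft'` differ, which is the shape of the mismatch the verification complains about.
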

This pull request has now been integrated.

Changeset: 76e79dbb
Author:    Marc Chevalier
URL:       https://git.openjdk.org/jdk/commit/76e79dbb3eca5589aae6852c8f55adf0759c714e
Stats:     218 lines in 3 files changed: 218 ins; 0 del; 0 mod

8371716: C2: Phi node fails Value()'s verification when speculative types clash

Co-authored-by: Roland Westrelin
Reviewed-by: roland, epeter

-------------

PR: https://git.openjdk.org/jdk/pull/28331

From duke at openjdk.org  Tue Dec 16 15:39:07 2025
From: duke at openjdk.org (duke)
Date: Tue, 16 Dec 2025 15:39:07 GMT
Subject: RFR: 8373630: r18_tls should not be modified on Windows AArch64 [v2]
In-Reply-To:
References:
Message-ID:

On Sat, 13 Dec 2025 05:09:33 GMT, Saint Wesonga wrote:

>> On Windows, r18_tls is used to store the pointer to the current thread's TEB. Therefore, this register should never be modified (see details in [register_aarch64.hpp](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/cpu/aarch64/register_aarch64.hpp#L118-L128)). One scenario that results in the modification of r18_tls involves virtual threads on Windows. Frames are frozen by [Continuation::try_preempt](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/share/runtime/continuation.cpp#L131) on one carrier thread whose registers are saved. When the frame is thawed, execution can continue on a different carrier thread. When this happens, [rthread (x28) is fixed to point to the new carrier thread](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/share/runtime/continuationFreezeThaw.cpp#L2670). The continuation then results in [restore_live_registers](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp#L313) restoring all the saved registers (including the fixed rthread register).
However, this also restores x18, which was the TEB pointer for the previous carrier thread, causing the new carrier thread to execute with the TLS of the previous carrier thread. This causes hangs and occasional crashes in the virtual threads jtreg tests on Windows AArch64 that are resolved by this fix.
>
> Saint Wesonga has updated the pull request incrementally with one additional commit since the last revision:
>
>   Do not modify r18_tls if R18_RESERVED is defined

@swesonga
Your change (at version e5a9ef0ef28947361cd9d680a55eb8d4b1fec73c) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28808#issuecomment-3661162010

From jbhateja at openjdk.org  Tue Dec 16 15:48:06 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 16 Dec 2025 15:48:06 GMT
Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v7]
In-Reply-To:
References:
Message-ID:

On Tue, 16 Dec 2025 13:45:58 GMT, Bhavana Kilambi wrote:

> > I suggest creating a separate PR for this. You can either create a smaller standalone reproducer test case or mention the tests that are part of this PR.
>
> Hi @jbhateja, thanks for the suggestion. Based on the comments here - https://bugs.openjdk.org/browse/JDK-8373574, is it OK if my fix (along with a regression test as suggested) is part of this PR?

Hi @Bhavana-Kilambi,
As @TobiHartmann suggested, we can include your patch with the PR.
Best Regards ------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3661203155 From roland at openjdk.org Tue Dec 16 16:54:02 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 16 Dec 2025 16:54:02 GMT Subject: RFR: 8373591: C2: Fix the memory around some intrinsics nodes [v3] In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 06:23:12 GMT, Quan Anh Mai wrote: >> Hi, >> >> This is extracted from #28570 , there are 2 issues here: >> >> - Some intrinsics nodes advertise incorrect `adr_type`. For example, `AryEqNode` reports `adr_type` being `TypeAryPtr::BYTES` (it inherits this from `StrIntrinsicNode`). This is incorrect, however, as it can accept `char[]` inputs, too. Another case is `VectorizedHashCodeNode`, which reports its `adr_type` being `TypePtr::BOTTOM`, but it actually extracts a memory slice and does not consume the whole memory. >> - For nodes such as `StrInflatedCopyNode`, as they consume more than they produce, during scheduling, we need to compute anti-dependencies. This is not the case, so we should fix it by making the nodes kill all the memory they consume. This issue is often not present because these intrinsics are not exposed bare to general usage. >> >> Please kindly review, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > Use MemBar instead of widening the intrinsic memory src/hotspot/share/opto/graphKit.cpp line 4210: > 4208: // StoreC -> MemBar -> MergeMem -> compress_string -> MergeMem -> CharMem > 4209: // --------------------------------> > 4210: Node* all_mem = reset_memory(); This code sequence is used several times. Would it make sense to factor it out in its own method? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28789#discussion_r2624030618 From qamai at openjdk.org Tue Dec 16 17:36:18 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 16 Dec 2025 17:36:18 GMT Subject: RFR: 8373591: C2: Fix the memory around some intrinsics nodes [v4] In-Reply-To: References: Message-ID: > Hi, > > This is extracted from #28570 , there are 2 issues here: > > - Some intrinsics nodes advertise incorrect `adr_type`. For example, `AryEqNode` reports `adr_type` being `TypeAryPtr::BYTES` (it inherits this from `StrIntrinsicNode`). This is incorrect, however, as it can accept `char[]` inputs, too. Another case is `VectorizedHashCodeNode`, which reports its `adr_type` being `TypePtr::BOTTOM`, but it actually extracts a memory slice and does not consume the whole memory. > - For nodes such as `StrInflatedCopyNode`, as they consume more than they produce, during scheduling, we need to compute anti-dependencies. This is not the case, so we should fix it by making the nodes kill all the memory they consume. This issue is often not present because these intrinsics are not exposed bare to general usage. > > Please kindly review, thanks a lot. 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: consolidate the memory effect into a function ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28789/files - new: https://git.openjdk.org/jdk/pull/28789/files/9649a2f2..c3503ed9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28789&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28789&range=02-03 Stats: 85 lines in 3 files changed: 24 ins; 41 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/28789.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28789/head:pull/28789 PR: https://git.openjdk.org/jdk/pull/28789 From qamai at openjdk.org Tue Dec 16 17:36:21 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 16 Dec 2025 17:36:21 GMT Subject: RFR: 8373591: C2: Fix the memory around some intrinsics nodes [v3] In-Reply-To: References: Message-ID: <8bRTuUyerdPx5Uy7498tn3oLOoifKOAXHSciQnVheqY=.a4a35fd6-8245-4d2c-af82-68c737ce9902@github.com> On Tue, 16 Dec 2025 16:51:27 GMT, Roland Westrelin wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> Use MemBar instead of widening the intrinsic memory > > src/hotspot/share/opto/graphKit.cpp line 4210: > >> 4208: // StoreC -> MemBar -> MergeMem -> compress_string -> MergeMem -> CharMem >> 4209: // --------------------------------> >> 4210: Node* all_mem = reset_memory(); > > This code sequence is used several times. Would it make sense to factor it out in its own method? Done! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28789#discussion_r2624148395 From epeter at openjdk.org Tue Dec 16 18:29:41 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Dec 2025 18:29:41 GMT Subject: RFR: 8373682: Test compiler/loopopts/superword/TestReinterpretAndCast.java fails on x86_64 with AVX but without f16c Message-ID: <-xeq-bSWOMsPZUqDhZRy1QewsTrzwizlFQDcaK9zjY8=.8798035a-fdc5-4665-908c-22e1a8e552c2@github.com> The IR rules of the test failed because we expected that `VectorCastF2HF` and `VectorCastHF2F` are available on `AVX`, but actually we need `f16c`. On Sandy Bridge we have `AVX` but not `f16c`, so the IR rules fail on those machines. Solution: require `f16c` feature. ------------- Commit messages: - JDK-8373682 Changes: https://git.openjdk.org/jdk/pull/28852/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28852&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373682 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28852.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28852/head:pull/28852 PR: https://git.openjdk.org/jdk/pull/28852 From epeter at openjdk.org Tue Dec 16 18:29:42 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Dec 2025 18:29:42 GMT Subject: RFR: 8373682: Test compiler/loopopts/superword/TestReinterpretAndCast.java fails on x86_64 with AVX but without f16c In-Reply-To: <-xeq-bSWOMsPZUqDhZRy1QewsTrzwizlFQDcaK9zjY8=.8798035a-fdc5-4665-908c-22e1a8e552c2@github.com> References: <-xeq-bSWOMsPZUqDhZRy1QewsTrzwizlFQDcaK9zjY8=.8798035a-fdc5-4665-908c-22e1a8e552c2@github.com> Message-ID: On Tue, 16 Dec 2025 17:45:43 GMT, Emanuel Peter wrote: > The IR rules of the test failed because we expected that `VectorCastF2HF` and `VectorCastHF2F` are available on `AVX`, but actually we need `f16c`. On Sandy Bridge we have `AVX` but not `f16c`, so the IR rules fail on those machines. > > Solution: require `f16c` feature. 
@jsikstro You reported this issue and seem to be running on a Sandy Bridge. Can you confirm that this fixed the issue for you? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28852#issuecomment-3661853987 From aph at openjdk.org Tue Dec 16 18:39:09 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 16 Dec 2025 18:39:09 GMT Subject: RFR: 8373630: r18_tls should not be modified on Windows AArch64 [v2] In-Reply-To: References: Message-ID: On Sat, 13 Dec 2025 05:09:33 GMT, Saint Wesonga wrote: >> On Windows, r18_tls is used to store the pointer to the current thread's TEB. Therefore, this register should never be modified (see details in [register_aarch64.hpp](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/cpu/aarch64/register_aarch64.hpp#L118-L128)). One scenario that results in the modification of r18_tls involves virtual threads on Windows. Frames are frozen by [Continuation::try_preempt](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/share/runtime/continuation.cpp#L131) on one carrier thread whose registers are saved. When the frame is thawed, execution can continue on a different carrier thread. When this happens, [rthread (x28) is fixed to point to the new carrier thread](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/share/runtime/continuationFreezeThaw.cpp#L2670). The continuation then results in [restore_live_registers](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp#L313) restoring all the saved registers (including the fixed rthread register). However, this also restores x18, which was the TEB pointer for the previous carrier thread, causing the new carrier thread to execute with the TLS of the previous carrier thread.
This causes hangs and occasional crashes in the virtual threads jtreg tests on Windows AArch64 that are resolved by this fix. > > Saint Wesonga has updated the pull request incrementally with one additional commit since the last revision: > > Do not modify r18_tls if R18_RESERVED is defined Marked as reviewed by aph (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28808#pullrequestreview-3584470868 From duke at openjdk.org Tue Dec 16 18:39:11 2025 From: duke at openjdk.org (Saint Wesonga) Date: Tue, 16 Dec 2025 18:39:11 GMT Subject: Integrated: 8373630: r18_tls should not be modified on Windows AArch64 In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 22:54:45 GMT, Saint Wesonga wrote: > On Windows, r18_tls is used to store the pointer to the current thread's TEB. Therefore, this register should never be modified (see details in [register_aarch64.hpp](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/cpu/aarch64/register_aarch64.hpp#L118-L128)). One scenario that results in the modification of r18_tls involves virtual threads on Windows. Frames are frozen by [Continuation::try_preempt](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/share/runtime/continuation.cpp#L131) on one carrier thread whose registers are saved. When the frame is thawed, execution can continue on a different carrier thread. When this happens, [rthread (x28) is fixed to point to the new carrier thread](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/share/runtime/continuationFreezeThaw.cpp#L2670). The continuation then results in [restore_live_registers](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp#L313) restoring all the saved registers (including the fixed rthread register).
However, this also restores x18, which was the TEB pointer for the previous carrier thread, causing the new carrier thread to execute with the TLS of the previous carrier thread. This causes hangs and occasional crashes in the virtual threads jtreg tests on Windows AArch64 that are resolved by this fix. This pull request has now been integrated. Changeset: a0dd66f9 Author: Saint Wesonga Committer: Andrew Haley URL: https://git.openjdk.org/jdk/commit/a0dd66f92d7f8400b9800847e36d036315628afb Stats: 23 lines in 1 file changed: 23 ins; 0 del; 0 mod 8373630: r18_tls should not be modified on Windows AArch64 Reviewed-by: pchilanomate, aph ------------- PR: https://git.openjdk.org/jdk/pull/28808 From kvn at openjdk.org Tue Dec 16 18:49:14 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 16 Dec 2025 18:49:14 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v10] In-Reply-To: References: Message-ID: <7X6oguXGTGGVEZUevfO0N_-MeE-F3gRemOiLYFU2Zkc=.819a74f1-8129-4a9e-a86b-6d3408ea9169@github.com> On Tue, 16 Dec 2025 04:47:42 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch is an alternative to #28764 but it does the analysis during IGVN instead. >> >> The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. >> >> This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. Implementation-wise, when walking at `find_previous_store`, if we encounter a call or memory barrier, we start looking at all nodes that make the allocation escape. 
If all such nodes have a control input that is not a transitive control input of the call/barrier we are at, then we can decidedly say that the allocation has not escaped at that call/barrier, and walk past that call/barrier to find a corresponding store. >> >> I do not see a noticeable difference in C2 runtime with and without this patch. >> >> Please take a look and leave your thoughts, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > Much more comments, refactor the data into a separate class Looks reasonable at first glance but I need more time to go through. ------------- PR Review: https://git.openjdk.org/jdk/pull/28812#pullrequestreview-3584508583 From kvn at openjdk.org Tue Dec 16 18:51:16 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 16 Dec 2025 18:51:16 GMT Subject: RFR: 8373682: Test compiler/loopopts/superword/TestReinterpretAndCast.java fails on x86_64 with AVX but without f16c In-Reply-To: <-xeq-bSWOMsPZUqDhZRy1QewsTrzwizlFQDcaK9zjY8=.8798035a-fdc5-4665-908c-22e1a8e552c2@github.com> References: <-xeq-bSWOMsPZUqDhZRy1QewsTrzwizlFQDcaK9zjY8=.8798035a-fdc5-4665-908c-22e1a8e552c2@github.com> Message-ID: On Tue, 16 Dec 2025 17:45:43 GMT, Emanuel Peter wrote: > The IR rules of the test failed because we expected that `VectorCastF2HF` and `VectorCastHF2F` are available on `AVX`, but actually we need `f16c`. On Sandy Bridge we have `AVX` but not `f16c`, so the IR rules fail on those machines. > > Solution: require `f16c` feature. Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28852#pullrequestreview-3584512808 From liach at openjdk.org Tue Dec 16 18:54:34 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 16 Dec 2025 18:54:34 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v6] In-Reply-To: References: Message-ID: > Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. > > They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. > > We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. > > Paging @minborg who requested Optional folding for review. > > I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 14 additional commits since the last revision: - Recommended test tweaks - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-final-trusting - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-final-trusting - Jorn review - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-final-trusting - bracket styles - Doc tweaks - Essay - Spurious change - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-final-trusting - ... and 4 more: https://git.openjdk.org/jdk/compare/57f00286...567e8925 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28540/files - new: https://git.openjdk.org/jdk/pull/28540/files/b20b7f5b..567e8925 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28540&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28540&range=04-05 Stats: 21935 lines in 436 files changed: 14170 ins; 4911 del; 2854 mod Patch: https://git.openjdk.org/jdk/pull/28540.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28540/head:pull/28540 PR: https://git.openjdk.org/jdk/pull/28540 From jsikstro at openjdk.org Tue Dec 16 20:05:12 2025 From: jsikstro at openjdk.org (Joel Sikström) Date: Tue, 16 Dec 2025 20:05:12 GMT Subject: RFR: 8373682: Test compiler/loopopts/superword/TestReinterpretAndCast.java fails on x86_64 with AVX but without f16c In-Reply-To: <-xeq-bSWOMsPZUqDhZRy1QewsTrzwizlFQDcaK9zjY8=.8798035a-fdc5-4665-908c-22e1a8e552c2@github.com> References: <-xeq-bSWOMsPZUqDhZRy1QewsTrzwizlFQDcaK9zjY8=.8798035a-fdc5-4665-908c-22e1a8e552c2@github.com> Message-ID: On Tue, 16 Dec 2025 17:45:43 GMT, Emanuel Peter wrote: > The IR rules of the test failed because we expected that `VectorCastF2HF` and `VectorCastHF2F` are available on `AVX`, but actually we need `f16c`. On Sandy Bridge we have `AVX` but not `f16c`, so the IR rules fail on those machines. > > Solution: require `f16c` feature.
Although this is not my area, the change looks good and does fix the test failure on my Sandy Bridge machine. Thank you! ------------- Marked as reviewed by jsikstro (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28852#pullrequestreview-3584741801 From jrose at openjdk.org Tue Dec 16 20:35:17 2025 From: jrose at openjdk.org (John R Rose) Date: Tue, 16 Dec 2025 20:35:17 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v6] In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 18:54:34 GMT, Chen Liang wrote: >> Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. >> >> They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. >> >> We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. >> >> Paging @minborg who requested Optional folding for review. >> >> I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 14 additional commits since the last revision: > > - Recommended test tweaks > - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-final-trusting > - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-final-trusting > - Jorn review > - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-final-trusting > - bracket styles > - Doc tweaks > - Essay > - Spurious change > - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-final-trusting > - ... and 4 more: https://git.openjdk.org/jdk/compare/384c62c9...567e8925 Excellent. It does not pay down every bit of technical debt in this area, but it is a significant installment payment. I think `constantFold` is OK as a directory name, although maybe a bit vague (following Jorn's observations). Perhaps `constantFields` or `fieldFolding`. Or (most specifically) `trustedFinalFields`, but I think that is unnecessarily narrow. I suggest `fieldFolding` or `foldFields` or the like. The trusting logic is one way to fold fields, and also the forthcoming `ACC_STRICT_INIT`. There are also proposals to infer strictness in the absence of `ACC_STRICT_INIT`. All of those might have test cases associated with the tests for this feature. We could expect several tests in here eventually, validating the folding properties of various flavors of constant fields. ------------- Marked as reviewed by jrose (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28540#pullrequestreview-3584835561 From fyang at openjdk.org Wed Dec 17 01:22:51 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 17 Dec 2025 01:22:51 GMT Subject: RFR: 8373069: RISC-V: implement GHASH intrinsic [v5] In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 10:53:15 GMT, Anjian Wen wrote: >> support GHASH intrinsic for crypt GCM, which need zvkg extension.
>> >> passed the tests in >> test/hotspot/jtreg/compiler/codegen/aes/ >> test/jdk/com/sun/crypto > > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > delete some redundant assert and modify some format Thanks! ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28548#pullrequestreview-3585593305 From duke at openjdk.org Wed Dec 17 02:40:02 2025 From: duke at openjdk.org (duke) Date: Wed, 17 Dec 2025 02:40:02 GMT Subject: RFR: 8373069: RISC-V: implement GHASH intrinsic [v5] In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 10:53:15 GMT, Anjian Wen wrote: >> support GHASH intrinsic for crypt GCM, which need zvkg extension. >> >> passed the tests in >> test/hotspot/jtreg/compiler/codegen/aes/ >> test/jdk/com/sun/crypto > > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > delete some redundant assert and modify some format @Anjian-Wen Your change (at version fb0f549fa52739938d5a2607fba4860049e0d74f) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28548#issuecomment-3663376150 From wenanjian at openjdk.org Wed Dec 17 02:44:05 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Wed, 17 Dec 2025 02:44:05 GMT Subject: Integrated: 8373069: RISC-V: implement GHASH intrinsic In-Reply-To: References: Message-ID: <9HJCSkd5XPFJ8-xXhcf7f8FD2py5_jJjB4RuEzgRSZk=.952128e6-3c8e-4c7d-8bc0-566cb0253ee2@github.com> On Fri, 28 Nov 2025 03:54:29 GMT, Anjian Wen wrote: > support GHASH intrinsic for crypt GCM, which need zvkg extension. > > passed the tests in > test/hotspot/jtreg/compiler/codegen/aes/ > test/jdk/com/sun/crypto This pull request has now been integrated. 
Changeset: e635330a Author: Anjian Wen Committer: Feilong Jiang URL: https://git.openjdk.org/jdk/commit/e635330ae17fd2ce653ec75fd57fdd72d2512bba Stats: 94 lines in 6 files changed: 86 ins; 4 del; 4 mod 8373069: RISC-V: implement GHASH intrinsic Reviewed-by: fjiang, fyang ------------- PR: https://git.openjdk.org/jdk/pull/28548 From dlong at openjdk.org Wed Dec 17 02:51:51 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 17 Dec 2025 02:51:51 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses In-Reply-To: References: Message-ID: <7_q6yB9Do9z_7IYbSwX1Dv_cR0lhCw8gX-L-wAx0tnc=.04318287-18fc-44f2-87eb-94cef265fa3a@github.com> On Mon, 15 Dec 2025 14:00:38 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/macro.cpp line 1211: >> >>> 1209: >>> 1210: Node* PhaseMacroExpand::make_store(Node* ctl, Node* mem, Node* base, int offset, Node* value, BasicType bt) { >>> 1211: Node* adr = basic_plus_adr(top(), base, offset); >> >> Doesn't this cause an assert if make_load or make_store is used with a heap oop? Isn't that a problem for code like PhaseMacroExpand::initialize_object() that calls make_store() with an object? > > `make_load`/`make_store` happen to be only called for non oop accesses. I could rename then to `make_raw_load`/`make_raw_store` to avoid any confusion. What do you think? Yes, I like make_raw_load/make_raw_store or even make_load_raw/make_store_raw. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28769#discussion_r2625404927 From dlong at openjdk.org Wed Dec 17 03:02:57 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 17 Dec 2025 03:02:57 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 15:14:57 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/memnode.cpp line 4126: >> >>> 4124: Node* base = dest; >>> 4125: if (phase->type(dest)->isa_oopptr() == nullptr) { >>> 4126: base = phase->C->top(); >> >> How is this possible? Aren't all arrays in the heap? > > `isa_oopptr()` is non null for all oops, array and instance. It seems like "dest" is always an oop here and we don't need to check isa_oopptr(). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28769#discussion_r2625421249 From shade at openjdk.org Wed Dec 17 07:06:24 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 17 Dec 2025 07:06:24 GMT Subject: RFR: 8373820: C2: Robust Node::uncast_helper infinite loop check Message-ID: Current check in `Node::uncast_helper` checks for "infinite loop", but really checks for the depth of 1K nodes when searching through the graph: assert(depth_count++ < K, "infinite loop in Node::uncast_helper"); I suppose it is plausible to have a legit chain of 1K nodes with very deep inlining and/or optimization, which is _not_ an infinite loop. This might be the cause for some CTW failures in deeper stress modes: I have been running CTW stress tests for 12+ hours without ever hitting this check, and I usually hit it within that timeframe in current mainline. Given how we basically walk through `in(1)`, i.e. moving as if through the linked list of nodes, I think we can check against the number of nodes we have. If we walk more nodes than we have, that would mean we visited some node twice, which necessarily means there is a infinite loop in this walk. This makes the check more robust. 
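For illustration only, the bounded walk described above could be sketched roughly as follows. This is a toy model and not the HotSpot change itself: `ToyNode`, `in1`, and `walk_to_end` are hypothetical names, the caller is assumed to know the total node count, and the sketch returns `nullptr` where the real code would presumably assert.

```cpp
#include <cstddef>

// Hypothetical stand-in for a C2 node: it only models the single input
// that the walk follows (the role played by in(1) in the description).
struct ToyNode {
    ToyNode* in1 = nullptr;
};

// Follows in1 links from `start` and returns the last node of the chain.
// `live_nodes` is the total number of nodes in the graph (an assumption of
// this sketch). A chain of distinct nodes has at most live_nodes - 1 links,
// so taking live_nodes or more steps means some node was visited twice,
// i.e. the walk is cyclic. In that case nullptr is returned.
const ToyNode* walk_to_end(const ToyNode* start, std::size_t live_nodes) {
    std::size_t steps = 0;
    const ToyNode* n = start;
    while (n != nullptr && n->in1 != nullptr) {
        if (++steps >= live_nodes) {
            return nullptr;  // more steps than nodes: must be a cycle
        }
        n = n->in1;
    }
    return n;
}
```

Unlike a fixed depth limit such as 1K, a bound tied to the node count never rejects a long but acyclic chain, which is the property the description argues for.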
Additional testing: - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` in different stress modes, 10x passes - [ ] Linux x86_64 server fastdebug, `all` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/28861/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28861&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373820 Stats: 8 lines in 1 file changed: 5 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28861.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28861/head:pull/28861 PR: https://git.openjdk.org/jdk/pull/28861 From chagedorn at openjdk.org Wed Dec 17 08:37:02 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 17 Dec 2025 08:37:02 GMT Subject: RFR: 8373508: C2: sinking CreateEx out of loop breaks the graph In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 11:04:52 GMT, Roland Westrelin wrote: > A `CreateEx` gets sunk out of loop by > `PhaseIdealLoop::try_sink_out_of_loop()` and, as a consequence, the > following logic: > > > return (in(0)->is_CatchProj() && in(0)->in(0)->is_Catch() && > in(0)->in(0)->in(1) == in(1)) ? this : call->in(TypeFunc::Parms); > > > in `CreateExNode::Identity()` triggers which leads to the crash > because `call->in(TypeFunc::Parms)` is not even an object in this > particular case. > > It's actually not clear to me what that logic in > `CreateExNode::Identity()` is expected to do and I wonder if it's > still needed. > > Anyway, the fix I propose is to skip `CreateEx` in > `PhaseIdealLoop::try_sink_out_of_loop()`. That looks reasonable to me. test/hotspot/jtreg/compiler/loopopts/TestCreateExSunkOutOfLoop.java line 30: > 28: * @library /test/lib > 29: * @run main/othervm -Xbatch ${test.main.class} > 30: * @run main ${test.main.class} Since this test runs for 4s at least, I'm not sure if it's worth to have an Xbatch and non-Xbatch version. Does it trigger with both? 
test/hotspot/jtreg/compiler/loopopts/TestCreateExSunkOutOfLoop.java line 73: > 71: Thread.sleep(Utils.adjustTimeout(4000)); > 72: } > 73: Suggestion: ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28842#pullrequestreview-3586539850 PR Review Comment: https://git.openjdk.org/jdk/pull/28842#discussion_r2626052301 PR Review Comment: https://git.openjdk.org/jdk/pull/28842#discussion_r2626067883 From chagedorn at openjdk.org Wed Dec 17 08:40:59 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 17 Dec 2025 08:40:59 GMT Subject: RFR: 8373524: C2: no reachable node should have no use In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 10:02:46 GMT, Roland Westrelin wrote: > The failure occurs because `PhiNode::Ideal` uses `set_req` to update > an input of a `Phi`. That causes the previous input to be disconnected > but because of the use of `set_req`, the previous input that has no > use is not enqueued for `igvn` to be reclaimed. The fix is to use > `set_req_X` instead. I replaced uses of `set_req` with `set_req_X` in > `PhiNode::Ideal` where I thought it made sense. Otherwise, looks good! src/hotspot/share/opto/cfgnode.cpp line 2599: > 2597: Node* phi = mms.memory(); > 2598: for (uint i = 1; i < req(); ++i) { > 2599: if (phi->in(i) == this) phi->set_req_X(i, phi, phase); While at it, we should add braces: Suggestion: if (phi->in(i) == this) { phi->set_req_X(i, phi, phase); } test/hotspot/jtreg/compiler/c2/TestNodeWithNoUseAfterPhiIdeal.java line 1: > 1: /* Since this is about IGVN, you could move the test to `compiler/igvn`. test/hotspot/jtreg/compiler/c2/TestNodeWithNoUseAfterPhiIdeal.java line 33: > 31: > 32: package compiler.c2; > 33: public class TestNodeWithNoUseAfterPhiIdeal { Suggestion: public class TestNodeWithNoUseAfterPhiIdeal { ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28841#pullrequestreview-3586583247 PR Review Comment: https://git.openjdk.org/jdk/pull/28841#discussion_r2626088702 PR Review Comment: https://git.openjdk.org/jdk/pull/28841#discussion_r2626096607 PR Review Comment: https://git.openjdk.org/jdk/pull/28841#discussion_r2626089810 From qamai at openjdk.org Wed Dec 17 08:44:07 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 17 Dec 2025 08:44:07 GMT Subject: RFR: 8373820: C2: Robust Node::uncast_helper infinite loop check In-Reply-To: References: Message-ID: On Wed, 17 Dec 2025 06:56:32 GMT, Aleksey Shipilev wrote: > Current check in `Node::uncast_helper` checks for "infinite loop", but really checks for the depth of 1K nodes when searching through the graph: > > > assert(depth_count++ < K, "infinite loop in Node::uncast_helper"); > > > I suppose it is plausible to have a legit chain of 1K nodes with very deep inlining and/or optimization, which is _not_ an infinite loop. This might be the cause for some CTW failures in deeper stress modes: I have been running CTW stress tests for 12+ hours without ever hitting this check, and I usually hit it within that timeframe in current mainline. > > Given how we basically walk through `in(1)`, i.e. moving as if through the linked list of nodes, I think we can check against the number of nodes we have. If we walk more nodes than we have, that would mean we visited some node twice, which necessarily means there is a infinite loop in this walk. This makes the check more robust. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` in different stress modes, 10x passes > - [ ] Linux x86_64 server fastdebug, `hotspot_compiler` > - [ ] Linux x86_64 server fastdebug, `tier1` Marked as reviewed by qamai (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/28861#pullrequestreview-3586603093 From chagedorn at openjdk.org Wed Dec 17 08:55:51 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 17 Dec 2025 08:55:51 GMT Subject: RFR: 8373682: Test compiler/loopopts/superword/TestReinterpretAndCast.java fails on x86_64 with AVX but without f16c In-Reply-To: <-xeq-bSWOMsPZUqDhZRy1QewsTrzwizlFQDcaK9zjY8=.8798035a-fdc5-4665-908c-22e1a8e552c2@github.com> References: <-xeq-bSWOMsPZUqDhZRy1QewsTrzwizlFQDcaK9zjY8=.8798035a-fdc5-4665-908c-22e1a8e552c2@github.com> Message-ID: On Tue, 16 Dec 2025 17:45:43 GMT, Emanuel Peter wrote: > The IR rules of the test failed because we expected that `VectorCastF2HF` and `VectorCastHF2F` are available on `AVX`, but actually we need `f16c`. On Sandy Bridge we have `AVX` but not `f16c`, so the IR rules fail on those machines. > > Solution: require `f16c` feature. Looks good, thanks! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28852#pullrequestreview-3586651256 From chagedorn at openjdk.org Wed Dec 17 09:11:43 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 17 Dec 2025 09:11:43 GMT Subject: RFR: 8370315: [IR-Framework] Allow scenarios to be run in parallel [v4] In-Reply-To: References: Message-ID: <442MuZmxk0dZkXkwbi04kk8GCJfgXciKob6ZDF779Dg=.45613e8f-ff91-4009-8eed-1f83a1766e7e@github.com> On Mon, 15 Dec 2025 12:52:54 GMT, Damon Fenacci wrote: >> test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java line 775: >> >>> 773: if (!output.isEmpty()) { >>> 774: System.out.println(output); >>> 775: } >> >> We probably also need to do a similar trick as for the exceptions in order to have ordered stdouts for the scenarios? > > I might have spoken too soon: JTReg seems to collect stdout and stderr and print them out at once at the end of each (JTReg) test. 
In this case it doesn't make much sense to print out the output of each test as soon as it finishes (it would be better to collect them and print them in order at the end). @chhagedorn, is there possibly a way to make JTReg print the output "on-the-fly" that you are aware of? Yes, I agree. Let's just collect everything, stdout and exceptions, and then print them in scenario index order at the end. That probably also simplifies the logic. > is there possibly a way to make JTReg print the output "on-the-fly" that you are aware of? I'm not aware of such an option but thought it would be useful in the past when having a long running test and I'm actually only interested in some printed messages at the very start. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28065#discussion_r2626198846 From chagedorn at openjdk.org Wed Dec 17 09:43:07 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 17 Dec 2025 09:43:07 GMT Subject: RFR: 8373513: C2: Move ProjNode::other_if_proj() to IfProjNode [v2] In-Reply-To: References: Message-ID: <4S82m73_eWkTc9D_fBtIwG-mABmXgX7Ebx9vayVzPaY=.49f6d87a-dbd3-4c68-8484-30c6ff6436f1@github.com> > This is a simple clean-up patch which moves `ProjNode::other_if_proj()` to `IfProjNode` and update its uses. It only makes sense to call `other_if_proj()` on actual `IfProjNodes`. > > It also required to update more types from `ProjNode` to `IfProjNode` which is more type-safe and preciser. While touching the methods, I've also added some `const`/`static` where appropriate. > > Thanks, > Christian Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Fix nullptr - Merge branch 'master' into JDK-8373513 - 8373513: C2: Move ProjNode::other_if_proj() to IfProjNode ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28785/files - new: https://git.openjdk.org/jdk/pull/28785/files/914be8ce..16192544 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28785&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28785&range=00-01 Stats: 14019 lines in 551 files changed: 9174 ins; 1583 del; 3262 mod Patch: https://git.openjdk.org/jdk/pull/28785.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28785/head:pull/28785 PR: https://git.openjdk.org/jdk/pull/28785 From epeter at openjdk.org Wed Dec 17 09:43:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 17 Dec 2025 09:43:07 GMT Subject: RFR: 8373513: C2: Move ProjNode::other_if_proj() to IfProjNode [v2] In-Reply-To: <4S82m73_eWkTc9D_fBtIwG-mABmXgX7Ebx9vayVzPaY=.49f6d87a-dbd3-4c68-8484-30c6ff6436f1@github.com> References: <4S82m73_eWkTc9D_fBtIwG-mABmXgX7Ebx9vayVzPaY=.49f6d87a-dbd3-4c68-8484-30c6ff6436f1@github.com> Message-ID: <8UmUVGgEVtIM_0GrL-3gvyx4qTBj7BGlCKojdm9HeI8=.019d0e66-4dc7-4277-b76f-8dba929728d4@github.com> On Wed, 17 Dec 2025 09:40:07 GMT, Christian Hagedorn wrote: >> This is a simple clean-up patch which moves `ProjNode::other_if_proj()` to `IfProjNode` and update its uses. It only makes sense to call `other_if_proj()` on actual `IfProjNodes`. >> >> It also required to update more types from `ProjNode` to `IfProjNode` which is more type-safe and preciser. While touching the methods, I've also added some `const`/`static` where appropriate. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: > > - Fix nullptr > - Merge branch 'master' into JDK-8373513 > - 8373513: C2: Move ProjNode::other_if_proj() to IfProjNode Marked as reviewed by epeter (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28785#pullrequestreview-3586840594 From chagedorn at openjdk.org Wed Dec 17 09:43:09 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 17 Dec 2025 09:43:09 GMT Subject: RFR: 8373513: C2: Move ProjNode::other_if_proj() to IfProjNode [v2] In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 13:55:02 GMT, Roland Westrelin wrote: >> Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Fix nullptr >> - Merge branch 'master' into JDK-8373513 >> - 8373513: C2: Move ProjNode::other_if_proj() to IfProjNode > > Looks good to me. Thanks @rwestrel and @eme64 for your reviews! I pushed another update due to wrongly calling `as_IfProj()` on an actual nullptr. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28785#issuecomment-3664502187 From chagedorn at openjdk.org Wed Dec 17 09:47:52 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 17 Dec 2025 09:47:52 GMT Subject: RFR: 8373513: C2: Move ProjNode::other_if_proj() to IfProjNode [v2] In-Reply-To: <8UmUVGgEVtIM_0GrL-3gvyx4qTBj7BGlCKojdm9HeI8=.019d0e66-4dc7-4277-b76f-8dba929728d4@github.com> References: <4S82m73_eWkTc9D_fBtIwG-mABmXgX7Ebx9vayVzPaY=.49f6d87a-dbd3-4c68-8484-30c6ff6436f1@github.com> <8UmUVGgEVtIM_0GrL-3gvyx4qTBj7BGlCKojdm9HeI8=.019d0e66-4dc7-4277-b76f-8dba929728d4@github.com> Message-ID: On Wed, 17 Dec 2025 09:39:13 GMT, Emanuel Peter wrote: >> Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Fix nullptr >> - Merge branch 'master' into JDK-8373513 >> - 8373513: C2: Move ProjNode::other_if_proj() to IfProjNode > > Marked as reviewed by epeter (Reviewer). Thanks for the quick re-review @eme64! I'll run some sanity testing again before integration. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28785#issuecomment-3664535981 From shade at openjdk.org Wed Dec 17 10:19:16 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 17 Dec 2025 10:19:16 GMT Subject: RFR: 8373820: C2: Robust Node::uncast_helper infinite loop check [v2] In-Reply-To: References: Message-ID: > Current check in `Node::uncast_helper` checks for "infinite loop", but really checks for the depth of 1K nodes when searching through the graph: > > > assert(depth_count++ < K, "infinite loop in Node::uncast_helper"); > > > I suppose it is plausible to have a legit chain of 1K nodes with very deep inlining and/or optimization, which is _not_ an infinite loop. This might be the cause for some CTW failures in deeper stress modes: I have been running CTW stress tests for 12+ hours without ever hitting the new check, and I usually hit the old one within that timeframe in current mainline. > > Given how we basically walk through `in(1)`, i.e. moving as if through the linked list of nodes, I think we can check against the number of nodes we have. If we walk more nodes than we have, that would mean we visited some node twice, which necessarily means there is a infinite loop in this walk. This makes the check more robust. 
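A minimal sketch of the bounded walk described above, assuming a hypothetical `Node` struct whose `next` pointer stands in for `in(1)`. It illustrates the idea only and is not the HotSpot code: the walk gives up once it has taken more steps than there are nodes, which by the pigeonhole principle means some node was visited twice.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a C2 node: `next` plays the role of in(1),
// `is_cast` marks nodes that the walk should look through.
struct Node {
  Node* next = nullptr;
  bool is_cast = false;
};

// Follow the chain of cast nodes starting at `n`.
// `node_count` is the total number of nodes that exist (an assumption of
// this sketch; the caller has to supply it).
// Returns the first non-cast node, or nullptr if the walk took more steps
// than there are nodes, i.e. the chain must contain a cycle.
Node* uncast(Node* n, std::size_t node_count) {
  std::size_t steps = 0;
  while (n != nullptr && n->is_cast) {
    if (steps++ >= node_count) {
      return nullptr;  // more steps than nodes: some node was seen twice
    }
    n = n->next;
  }
  return n;
}

// Self-check: a long but finite chain is accepted, a two-node cycle is not.
void uncast_self_check() {
  std::vector<Node> chain(2000);  // deeper than a fixed 1K limit
  for (std::size_t i = 0; i + 1 < chain.size(); ++i) {
    chain[i].is_cast = true;
    chain[i].next = &chain[i + 1];
  }
  assert(uncast(&chain[0], chain.size()) == &chain.back());

  Node a, b;
  a.is_cast = b.is_cast = true;
  a.next = &b;
  b.next = &a;
  assert(uncast(&a, 2) == nullptr);
}
```

With a fixed depth limit of 1K the 2000-node chain above would be reported as an infinite loop; bounding by the node count avoids that false positive while still catching the real cycle.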
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` in different stress modes, 10x passes > - [x] Linux x86_64 server fastdebug, `hotspot_compiler` > - [x] Linux x86_64 server fastdebug, `tier1` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'master' into JDK-8373820-c2-robust-uncast-helper-check - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28861/files - new: https://git.openjdk.org/jdk/pull/28861/files/6553c66c..ed2ac311 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28861&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28861&range=00-01 Stats: 5180 lines in 217 files changed: 3503 ins; 447 del; 1230 mod Patch: https://git.openjdk.org/jdk/pull/28861.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28861/head:pull/28861 PR: https://git.openjdk.org/jdk/pull/28861 From bkilambi at openjdk.org Wed Dec 17 10:45:46 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 17 Dec 2025 10:45:46 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v2] In-Reply-To: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: > This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. > > Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. 
The following is how each of these reductions is implemented for different aarch64 targets - > > **For AddReduction :** > On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. > > On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order. > > **For MulReduction :** > Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported. > > Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` - > > Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. > Ratio > 1 indicates the performance with this patch is better than the master branch. 
> > **N1 (UseSVE = 0, max vector length = 16B):** > > Benchmark vectorDim Mode Cnt 8B 16B > ReductionAddFP16 256 thrpt 9 1.41 1.40 > ReductionAddFP16 512 thrpt 9 1.41 1.41 > ReductionAddFP16 1024 thrpt 9 1.43 1.40 > ReductionAddFP16 2048 thrpt 9 1.43 1.40 > ReductionMulFP16 256 thrpt 9 1.22 1.22 > ReductionMulFP16 512 thrpt 9 1.21 1.23 > ReductionMulFP16 1024 thrpt 9 1.21 1.22 > ReductionMulFP16 2048 thrpt 9 1.20 1.22 > > > On N1, the scalarized sequence of `fadd/fmul` are generated for both `MaxVectorSize` of 8B and 16B for add reduction ... Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Address review comments - Merge 'master' - 8366444: Add support for add/mul reduction operations for Float16 This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. The following is how each of these reductions is implemented for different aarch64 targets - For AddReduction : On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar "fadd" instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. On SVE targets (UseSVE > 0): Generates the "fadda" instruction which computes add reduction for floating point in strict order. For MulReduction : Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. 
For vector lengths of 8B and 16B, a scalarized sequence of scalar "fmul" instructions is generated and multiply reduction for vector lengths > 16B is not supported. Below is the performance of the two newly added microbenchmarks in Float16OperationsBenchmark.java tested on three different aarch64 machines and with varying MaxVectorSize - Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads ("ldrsh") to load the FP16 value into an FPR and a scalar "fadd/fmul" to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. Ratio > 1 indicates the performance with this patch is better than the master branch. N1 (UseSVE = 0, max vector length = 16B): Benchmark vectorDim Mode Cnt 8B 16B ReductionAddFP16 256 thrpt 9 1.41 1.40 ReductionAddFP16 512 thrpt 9 1.41 1.41 ReductionAddFP16 1024 thrpt 9 1.43 1.40 ReductionAddFP16 2048 thrpt 9 1.43 1.40 ReductionMulFP16 256 thrpt 9 1.22 1.22 ReductionMulFP16 512 thrpt 9 1.21 1.23 ReductionMulFP16 1024 thrpt 9 1.21 1.22 ReductionMulFP16 2048 thrpt 9 1.20 1.22 On N1, the scalarized sequence of fadd/fmul are generated for both MaxVectorSize of 8B and 16B for add reduction and mul reduction respectively. V1 (UseSVE = 1, max vector length = 32B): Benchmark vectorDim Mode Cnt 8B 16B 32B ReductionAddFP16 256 thrpt 9 1.11 1.75 2.02 ReductionAddFP16 512 thrpt 9 1.02 1.64 1.93 ReductionAddFP16 1024 thrpt 9 1.02 1.59 1.85 ReductionAddFP16 2048 thrpt 9 1.02 1.56 1.80 ReductionMulFP16 256 thrpt 9 1.12 0.99 1.09 ReductionMulFP16 512 thrpt 9 1.04 1.01 1.04 ReductionMulFP16 1024 thrpt 9 1.02 1.02 1.00 ReductionMulFP16 2048 thrpt 9 1.01 1.01 1.00 On V1, for MaxVectorSize = 8: scalarized fadd/fmul sequence will be generated for AddReductionVHF/MulReductionVHF as UseSVE defaults to 0 [2]. 
For MaxVectorSize = 16: a scalarized "fmul" sequence is generated for MulReductionVHF and "fadda" is generated for AddReductionVHF, which yields significant gains. For MaxVectorSize = 32: autovectorization of MulReductionVHF is disabled for MaxVectorSize > 16B, so the autovectorizer falls back to the maximal implemented size [1], which is 16B, and generates the scalarized "fmul" sequence for 16B in this case. For AddReductionVHF, it generates the "fadda" instruction.

V2 (UseSVE = 2, max vector length = 16B):

Benchmark         vectorDim  Mode   Cnt  8B    16B
ReductionAddFP16  256        thrpt  9    1.16  1.70
ReductionAddFP16  512        thrpt  9    1.02  1.61
ReductionAddFP16  1024       thrpt  9    1.01  1.53
ReductionAddFP16  2048       thrpt  9    1.00  1.49
ReductionMulFP16  256        thrpt  9    1.18  0.99
ReductionMulFP16  512        thrpt  9    1.04  1.01
ReductionMulFP16  1024       thrpt  9    1.02  1.02
ReductionMulFP16  2048       thrpt  9    1.01  1.01

On V2, for MaxVectorSize = 8: a scalarized fadd/fmul sequence will be generated as UseSVE defaults to 0 [2]. For MaxVectorSize = 16: the "fadda" instruction is generated for AddReductionVHF, which results in significant performance gains. For MulReductionVHF, the scalarized "fmul" sequence will be generated.

Testing: hotspot_all, jdk(tiers1-3) and langtools(tier1) all pass on N1/V1/V2.
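As an aside on why the reductions above have to be strictly ordered: floating-point addition is not associative, so a lane-wise or tree-shaped reduction can round differently from the sequential loop the Java source expresses. The sketch below is only an illustration under stated assumptions: it uses 32-bit `float` instead of Float16, assumes float expressions are evaluated in float precision (as on typical x86-64 and aarch64 builds without fast-math options), and the function names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Strictly ordered reduction: ((((0 + v0) + v1) + v2) + ...), the order a
// scalar loop (or an ordered instruction such as "fadda") preserves.
float reduce_add_ordered(const std::vector<float>& v) {
  float acc = 0.0f;
  for (float x : v) {
    acc = acc + x;
  }
  return acc;
}

// Pairwise (tree) reduction: adds neighbours first, then the partial sums.
// This is the shape an unordered SIMD reduction would typically have.
float reduce_add_pairwise(std::vector<float> v) {
  if (v.empty()) {
    return 0.0f;
  }
  while (v.size() > 1) {
    std::vector<float> next;
    for (std::size_t i = 0; i + 1 < v.size(); i += 2) {
      next.push_back(v[i] + v[i + 1]);
    }
    if (v.size() % 2 != 0) {
      next.push_back(v.back());  // odd element carries over unchanged
    }
    v.swap(next);
  }
  return v[0];
}

// Self-check with values chosen so that rounding differs between the two
// orders: near 1e8 the spacing between adjacent floats is 8, so adding 1.0f
// to +/-1e8f is absorbed by rounding.
void reduction_order_self_check() {
  const std::vector<float> v = {1e8f, 1.0f, -1e8f, 1.0f};
  // Ordered:  ((1e8 + 1) + -1e8) + 1  ->  (1e8 + -1e8) + 1  ->  1
  assert(reduce_add_ordered(v) == 1.0f);
  // Pairwise: (1e8 + 1) + (-1e8 + 1)  ->  1e8 + -1e8        ->  0
  assert(reduce_add_pairwise(v) == 0.0f);
}
```

The two results differ for the same input, which is why a backend without an ordered reduction instruction falls back to a scalarized sequence rather than a tree of vector adds.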
[1] https://github.com/openjdk/jdk/blob/a272696813f2e5e896ac9de9985246aaeb9d476c/src/hotspot/share/opto/superword.cpp#L1677 [2] https://github.com/openjdk/jdk/blob/a272696813f2e5e896ac9de9985246aaeb9d476c/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp#L479 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27526/files - new: https://git.openjdk.org/jdk/pull/27526/files/b8eb35ba..e8e3989d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27526&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27526&range=00-01 Stats: 432095 lines in 4952 files changed: 278406 ins; 97133 del; 56556 mod Patch: https://git.openjdk.org/jdk/pull/27526.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27526/head:pull/27526 PR: https://git.openjdk.org/jdk/pull/27526 From bkilambi at openjdk.org Wed Dec 17 10:45:48 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 17 Dec 2025 10:45:48 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 In-Reply-To: <_QEYCQm138PWv2vGjMFvEJ6kfMjGEn_vsuEZ_EPaRxQ=.b42967e5-cc22-4c98-a454-6698ce0a70cf@github.com> References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> <_QEYCQm138PWv2vGjMFvEJ6kfMjGEn_vsuEZ_EPaRxQ=.b42967e5-cc22-4c98-a454-6698ce0a70cf@github.com> Message-ID: On Thu, 11 Dec 2025 12:06:49 GMT, Marc Chevalier wrote: >> This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. >> >> Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. 
The following is how each of these reductions is implemented for different aarch64 targets - >> >> **For AddReduction :** >> On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. >> >> On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order. >> >> **For MulReduction :** >> Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported. >> >> Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` - >> >> Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. >> Ratio > 1 indicates the performance with this patch is better than the master branch. >> >> **N1 (UseSVE = 0, max vector length = 16B):** >> >> Benchmark vectorDim Mode Cnt 8B 16B >> ReductionAddFP16 256 thrpt 9 1.41 1.40 >> ReductionAddFP16 512 thrpt 9 1.41 1.41 >> ReductionAddFP16 1024 thrpt 9 1.43 1.40 >> ReductionAddFP16 2048 thrpt 9 1.43 1.40 >> ReductionMulFP16 256 thrpt 9 1.22 1.22 >> ReductionMulFP16 512 thrpt 9 1.21 1.23 >> ReductionMulFP16 1024 thrpt 9 1.21 1.22 >> ReductionMulFP16 2048 thrpt 9 1.20 1.22 >> >> >> On N1, the scalarized sequence of `fadd/fmul` are gener... 
> > As for the IR verification failure, I've looked a bit and couldn't find such an issue already. Since it reproduces on master, I suggest you file a ticket, indeed. Thanks! Hi @marc-chevalier @XiaohongGong I have addressed your review comments. Apologies for the delay in responding. Please take a look at the new patchset. Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27526#issuecomment-3664751991 From dlong at openjdk.org Wed Dec 17 10:59:09 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 17 Dec 2025 10:59:09 GMT Subject: RFR: 8373508: C2: sinking CreateEx out of loop breaks the graph In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 11:04:52 GMT, Roland Westrelin wrote: > It's actually not clear to me what that logic in CreateExNode::Identity() is expected to do and I wonder if it's still needed. I looked in the old SCCS history, and this code used to check if the call was to OptoRuntime::rethrow_stub(), so it appears to be trying to get the oop from arg0 to the rethrow runtime call. This code looks obsolete and broken to me. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28842#issuecomment-3664809085 From chagedorn at openjdk.org Wed Dec 17 11:17:32 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 17 Dec 2025 11:17:32 GMT Subject: RFR: 8373420: C2: Add true/false_proj*() methods for IfNode as a replacement for proj_out*(true/false) In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 16:07:49 GMT, Emanuel Peter wrote: >> There are a lot of places in the code where we call `proj_out*(true/false)` on an `IfNode`. In some cases, we then cast the returned `ProjNode` back to `IfProjNode` or `IfTrueNode/IfFalseNode`. I often visit such code and now decided to clean this up. >> >> The patch proposes new `IfNode::true/false_proj*()` methods that return `IfTrueNode/IfFalseNode` directly. 
I walked through all `proj_out*()` calls and replaced those that used a direct `true/false` or `1/0` as argument. >> >> There are still more things to clean up in this area, for example, when we return `ProjNode` even though it should be an `IfProjNode` which requires more casting. But let's do that step by step in follow-up clean ups. >> >> Thanks, >> Christian > > Marked as reviewed by epeter (Reviewer). Thanks for your reviews @eme64, @rwestrel and @dafedafe! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28745#issuecomment-3664876821 From chagedorn at openjdk.org Wed Dec 17 11:20:53 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 17 Dec 2025 11:20:53 GMT Subject: RFR: 8373820: C2: Robust Node::uncast_helper infinite loop check [v2] In-Reply-To: References: Message-ID: <2HQN3VTi9GtFxK2ylsrQhUAPpoKNy82geHPywrifK9U=.e5e6bf97-7dab-4c92-a171-cb32c6eeebda@github.com> On Wed, 17 Dec 2025 10:19:16 GMT, Aleksey Shipilev wrote: >> Current check in `Node::uncast_helper` checks for "infinite loop", but really checks for the depth of 1K nodes when searching through the graph: >> >> >> assert(depth_count++ < K, "infinite loop in Node::uncast_helper"); >> >> >> I suppose it is plausible to have a legit chain of 1K nodes with very deep inlining and/or optimization, which is _not_ an infinite loop. This might be the cause for some CTW failures in deeper stress modes: I have been running CTW stress tests for 12+ hours without ever hitting the new check, and I usually hit the old one within that timeframe in current mainline. >> >> Given how we basically walk through `in(1)`, i.e. moving as if through the linked list of nodes, I think we can check against the number of nodes we have. If we walk more nodes than we have, that would mean we visited some node twice, which necessarily means there is a infinite loop in this walk. This makes the check more robust. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` in different stress modes, 10x passes >> - [x] Linux x86_64 server fastdebug, `hotspot_compiler` >> - [x] Linux x86_64 server fastdebug, `tier1` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into JDK-8373820-c2-robust-uncast-helper-check > - Fix Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28861#pullrequestreview-3587252749 From chagedorn at openjdk.org Wed Dec 17 11:21:09 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 17 Dec 2025 11:21:09 GMT Subject: Integrated: 8373420: C2: Add true/false_proj*() methods for IfNode as a replacement for proj_out*(true/false) In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 13:13:44 GMT, Christian Hagedorn wrote: > There are a lot of places in the code where we call `proj_out*(true/false)` on an `IfNode`. In some cases, we then cast the returned `ProjNode` back to `IfProjNode` or `IfTrueNode/IfFalseNode`. I often visit such code and now decided to clean this up. > > The patch proposes new `IfNode::true/false_proj*()` methods that return `IfTrueNode/IfFalseNode` directly. I walked through all `proj_out*()` calls and replaced those that used a direct `true/false` or `1/0` as argument. > > There are still more things to clean up in this area, for example, when we return `ProjNode` even though it should be an `IfProjNode` which requires more casting. But let's do that step by step in follow-up clean ups. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: e4636d69
Author: Christian Hagedorn
URL: https://git.openjdk.org/jdk/commit/e4636d69e7e41477619a163e97fd3af2e5942dde
Stats: 66 lines in 11 files changed: 20 ins; 4 del; 42 mod

8373420: C2: Add true/false_proj*() methods for IfNode as a replacement for proj_out*(true/false)

Reviewed-by: dfenacci, roland, epeter

-------------

PR: https://git.openjdk.org/jdk/pull/28745

From epeter at openjdk.org Wed Dec 17 11:34:19 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 17 Dec 2025 11:34:19 GMT
Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark
Message-ID:

This is exploratory work. I wanted to use auto vectorization and the Vector API to implement some SIMD algorithms. We don't have too many IR tests and benchmarks, so I'm proposing an initial set of them, to be extended in the future.

Note: for now they are all `int` based. And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;)

**Discussion**

Observations:
- If the loop can be auto vectorized, that is the fastest. If we cannot vectorize, we at least get reasonable scalar performance.
- If the Vector API code can be fully intrinsified, we get fast code. But sometimes, the Vector API is horribly slow, much slower than scalar loop performance.
  - `linux_aarch64_server`: `filterI`, `scanAddI`, `reduceAddIFieldsX4` are very slow
  - `macosx_aarch64`: `filterI`, `scanAddI`, `reduceAddIFieldsX4`, `findMinIndex` are very slow
  - `linux_x64_oci_server`: Vector API leads to really nice speedups
  - `windows_x64_oci_server`: the only one that gets good/better performance on all benchmarks
  - `macosx_x64_sandybridge`: `scanAddI`!, `reduceAddIFieldsX4` are very slow. Other benchmarks benefit.
- Compact Object Headers has some negative effect on some loop benchmarks.
  - `linux_aarch64_server`: `reduceAddI`, `copyI`
  - `macosx_aarch64`: `mapI`, `reduceAddI`, `copyI`
  - `linux_x64_oci_server`: `reduceAddI`, `copyI`, `findI`?
  - `windows_x64_oci_server`: `reduceAddI` and some others a little bit
  - `macosx_x64_sandybridge`: `fillI`, `iotaI`, `mapI`, `reduceAddI`, `copyI`
- Intrinsics can be much faster than auto vectorized or Vector API code.
  - `linux_aarch64_server`: `copyI`
  - `macosx_x64_sandybridge`: actually, `Arrays.fill` seems to suffer with Compact Object Headers as well.
- `rearrange` often needs to do the `mask load` and `and` operation inside the loop. That has a slight performance impact, I filed [JDK-8373240](https://bugs.openjdk.org/browse/JDK-8373240).

**Benchmark Plots**

Units: nanoseconds per algorithm invocation.

`linux_x64_oci`: [plot: algo_linux_x64_oci_server]
`windows_x64_oci`: [plot: algo_windows_x64_oci_server]
`macosx_x64_sandybridge`: [plot: algo_macosx_x64_sandybridge]
`linux_aarch64`: [plot: algo_linux_aarch64_server]
`macosx_aarch64`: [plot: algo_macosx_aarch64]

-------------

Commit messages:
 - Merge branch 'master' into JDK-8373026-vector-algorithms
 - another IR rule fix
 - more small fixes and comments
 - more IR rules
 - wip more IR rules
 - improve IR rules
 - gather benchmark
 - gather test
 - filterI
 - findI benchmark
 - ...
and 19 more: https://git.openjdk.org/jdk/compare/78c2d572...40c51e8f Changes: https://git.openjdk.org/jdk/pull/28639/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28639&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373026 Stats: 1576 lines in 4 files changed: 1576 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28639.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28639/head:pull/28639 PR: https://git.openjdk.org/jdk/pull/28639 From thartmann at openjdk.org Wed Dec 17 11:46:08 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 17 Dec 2025 11:46:08 GMT Subject: RFR: 8373502: C2 SuperWord: speculative check uses VPointer variable was pinned after speculative check, leading to bad graph [v3] In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 13:23:40 GMT, Emanuel Peter wrote: >> Thanks for @chhagedorn and @rwestrel for triaging / doing some first investigation. >> >> This is a regression of [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751) / https://github.com/openjdk/jdk/pull/24278. >> >> This is almost the same as https://github.com/openjdk/jdk/pull/28449, so have a quick look at it. >> It was also an issue with some nodes being pinned too low, and not available at the speculative check. >> There, it was the `pre_init` values of the `iv`. Now it is the variables of the `VPointer`. >> The fix is pretty similar as well. >> >> ------------------------------------------ >> >> **Analysis** >> >> The reproducer gets a `bad graph` assert because of this cycle: >> image >> Note: `921 CountedLoop` is the pre-loop, the main-loop is further down from it. >> And `607 ParsePredicate` is the `#Auto_Vectorization_Check`, and `1403` is the aliasing check inserted for the VPointer named below. >> >> This is the relevant VPointer: >> `VPointer[size: 4, object, base(920 CastPP) + con( 20) + iv_scale( 0) * iv + invar(0)]` >> The base `920 CastPP` is the problematic variable. 
>> >> In `VPointer::init_are_non_iv_summands_pre_loop_invariant`, we check that: >> `_vloop.is_pre_loop_invariant(variable)` >> And that holds for `920 CastPP`. So far so good. >> >> This used to be enough when we only adjusted the pre-loop limit for alignment. >> But now that we need the variables for the aliasing runtime check further up, this is not sufficient any more. Analogue to https://github.com/openjdk/jdk/pull/28449, we would now need: >> `this->_vloop.is_available_for_speculative_check(variable)` >> And that is false for `920 CastPP`, since it is pinned after the speculative check. >> >> **Solution** >> We should not insert the aliasing runtime check, and hence we probably cannot vectorize this case. >> >> For now, this makes all tests pass. I think just like with https://github.com/openjdk/jdk/pull/28449 these cases are edge cases we don't have to worry too much about. But if they ever do become important, we could try to uncast the variables. But I don't know if that is without issues, we would certainly lose some info that we get from the casts. > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into JDK-8373502-SW-VPointer-variables-at-speculative-check > - Update test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingCheckVPointerVariablesNotAvailable.java > - fix up detail > - JDK-8373502 Marked as reviewed by thartmann (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/28783#pullrequestreview-3587343969 From alanb at openjdk.org Wed Dec 17 12:43:11 2025 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 17 Dec 2025 12:43:11 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v6] In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 18:54:34 GMT, Chen Liang wrote: >> Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. >> >> They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. >> >> We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. >> >> Paging @minborg who requested Optional folding for review. >> >> I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 14 additional commits since the last revision: > > - Recommended test tweaks > - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-final-trusting > - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-final-trusting > - Jorn review > - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-final-trusting > - bracket styles > - Doc tweaks > - Essay > - Spurious change > - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-final-trusting > - ... and 4 more: https://git.openjdk.org/jdk/compare/c6348e62...567e8925 Marked as reviewed by alanb (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28540#pullrequestreview-3587543908 From mchevalier at openjdk.org Wed Dec 17 12:43:16 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 17 Dec 2025 12:43:16 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v2] In-Reply-To: References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: On Wed, 17 Dec 2025 10:45:46 GMT, Bhavana Kilambi wrote: >> This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. >> >> Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. The following is how each of these reductions is implemented for different aarch64 targets - >> >> **For AddReduction :** >> On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. 
>> >> On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order. >> >> **For MulReduction :** >> Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported. >> >> Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` - >> >> Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. >> Ratio > 1 indicates the performance with this patch is better than the master branch. >> >> **N1 (UseSVE = 0, max vector length = 16B):** >> >> Benchmark vectorDim Mode Cnt 8B 16B >> ReductionAddFP16 256 thrpt 9 1.41 1.40 >> ReductionAddFP16 512 thrpt 9 1.41 1.41 >> ReductionAddFP16 1024 thrpt 9 1.43 1.40 >> ReductionAddFP16 2048 thrpt 9 1.43 1.40 >> ReductionMulFP16 256 thrpt 9 1.22 1.22 >> ReductionMulFP16 512 thrpt 9 1.21 1.23 >> ReductionMulFP16 1024 thrpt 9 1.21 1.22 >> ReductionMulFP16 2048 thrpt 9 1.20 1.22 >> >> >> On N1, the scalarized sequence of `fadd/fmul` are gener... > > Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: > > - Address review comments > - Merge 'master' > - 8366444: Add support for add/mul reduction operations for Float16 > > This patch adds mid-end support for vectorized add/mul reduction > operations for half floats. It also includes backend aarch64 support for > these operations. Only vectorization support through autovectorization > is added as VectorAPI currently does not support Float16 vector species. > > Both add and mul reduction vectorized through autovectorization mandate > the implementation to be strictly ordered. The following is how each of > these reductions is implemented for different aarch64 targets - > > For AddReduction : > On Neon only targets (UseSVE = 0): Generates scalarized additions > using the scalar "fadd" instruction for both 8B and 16B vector lengths. > This is because Neon does not provide a direct instruction for computing > strictly ordered floating point add reduction. > > On SVE targets (UseSVE > 0): Generates the "fadda" instruction which > computes add reduction for floating point in strict order. > > For MulReduction : > Both Neon and SVE do not provide a direct instruction for computing > strictly ordered floating point multiply reduction. For vector lengths > of 8B and 16B, a scalarized sequence of scalar "fmul" instructions is > generated and multiply reduction for vector lengths > 16B is not > supported. > > Below is the performance of the two newly added microbenchmarks in > Float16OperationsBenchmark.java tested on three different aarch64 > machines and with varying MaxVectorSize - > > Note: On all machines, the score (ops/ms) is compared with the master > branch without this patch which generates a sequence of loads ("ldrsh") > to load the FP16 value into an FPR and a scalar "fadd/fmul" to > add/multiply the loaded value to the running sum/product. 
The ratios > given below are the ratios between the throughput with this patch and > the throughput without this patch. > Ratio > 1 indicates the performance with this patch is better than the > master branch. > > N1 (UseSVE = 0, max vector length = 16B): > Benchmark vectorDim Mode Cnt 8B 16B > ReductionAddFP16 256 thrpt 9 1.41 1.40 > Redu... Changes requested by mchevalier (Committer). src/hotspot/share/opto/vectornode.hpp line 328: > 326: ReductionNode(ctrl, in1, in2), _requires_strict_order(requires_strict_order) {} > 327: > 328: virtual int Opcode() const; Build is failing on Mac because of `-Winconsistent-missing-override`: since you specified `override` on `bottom_type` and `ideal_reg`, you need to put `override` everywhere it applies. That means `Opcode` `requires_strict_order`, `hash`, `cmp` and `size_of`. And same in `MulReductionVHFNode`. ------------- PR Review: https://git.openjdk.org/jdk/pull/27526#pullrequestreview-3587543752 PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2626909207 From roland at openjdk.org Wed Dec 17 12:44:19 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 17 Dec 2025 12:44:19 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v6] In-Reply-To: References: Message-ID: On Thu, 11 Dec 2025 15:42:42 GMT, Roland Westrelin wrote: >> For this failure memory stats are: >> >> >> Total Usage: 1095525816 >> --- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 --- >> Phase Total ra node comp type states reglive regsplit regmask superword cienv ha other >> none 5976032 331560 5402064 197512 33712 10200 0 0 984 0 0 0 0 >> parse 2716464 65456 1145480 196408 1112752 0 0 0 0 0 196368 0 0 >> optimizer 98184 0 32728 0 65456 0 0 0 0 0 0 0 0 >> connectionGraph 32728 0 0 32728 0 0 0 0 0 0 0 0 0 >> iterGVN 32728 0 32728 0 0 0 0 0 0 0 0 0 0 >> idealLoop 918189632 0 38687056 872824784 392776 0 0 0 0 0 6285016 0 0 >> idealLoopVerify 2228144 0 0 
2228144 0 0 0 0 0 0 0 0 0 >> macroExpand 32728 0 32728 0 0 0 0 0 0 0 0 0 0 >> graphReshape 32728 0 32728 0 0 0 0 0 0 0 0 0 0 >> matcher 20135944 3369848 9033208 7536400 65456 131032 0 0 0 0 0 0 0 >> postselect_cleanup 294872 294872 0 0 0 0 0 0 0 0 0 0 0 >> scheduler 752944 196488 556456 0 0 0 0 0 0 0 0 0 0 >> regalloc 388736 388736 0 0 0 0 0 0 0 0 0 0 0 >> ... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > package declaration Anyone else for a review of this change? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28581#issuecomment-3665173297 From roland at openjdk.org Wed Dec 17 12:46:23 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 17 Dec 2025 12:46:23 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis [v7] In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 16:01:17 GMT, Roland Westrelin wrote: >> The crash occurs because verification code expects the inner and outer >> loop of a loop strip mining nest to have the same number of phis but, >> in this case, the inner loop has one more memory phi than the outer >> loop. >> >> 1) After `OuterStripMinedLoopNode::adjust_strip_mined_loop`, inner and >> outer loops have the same number of phis, as expected.
>> >> >> 309 MergeMem === _ 1 306 1 1 284 [[ 429 ]] { - - N284:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=205 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> >> 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 ]] >> 429 Phi === 248 309 205 [[ 93 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> >> 249 CountedLoop === 249 248 197 [[ 249 119 96 93 94 ]] inner stride: 1 strip mined !orig=[223],[91] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> 93 Phi === 249 429 205 [[ 117 97 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> 94 Phi === 249 430 121 [[ 97 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> >> >> 2) Then `PhiNode::Ideal` runs for 429 and pushed the `MergeMem` 309 >> through the outer loop phi: >> >> >> 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 444 446 ]] >> 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> 444 Phi === 248 306 121 [[ 445 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> 446 Phi === 248 284 170 [[ 445 ]] #memory Memory: @instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow], name=detailMessage, idx=5; !orig=444,429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> >> 445 MergeMem === _ 1 444 1 1 446 [[ 93 ]] { - - 
N446:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > IR test case What should I do with this change? Should I go ahead and integrate? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28677#issuecomment-3665180874 From mhaessig at openjdk.org Wed Dec 17 12:49:21 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 17 Dec 2025 12:49:21 GMT Subject: RFR: 8373524: C2: no reachable node should have no use In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 10:02:46 GMT, Roland Westrelin wrote: > The failure occurs because `PhiNode::Ideal` uses `set_req` to update > an input of a `Phi`. That causes the previous input to be disconnected > but because of the use of `set_req`, the previous input that has no > use is not enqueued for `igvn` to be reclaimed. The fix is to use > `set_req_X` instead. I replaced uses of `set_req` with `set_req_X` in > `PhiNode::Ideal` where I thought it made sense. Thank you for fixing this, @rwestrel. The changes look good. I also kicked off some testing. ------------- PR Review: https://git.openjdk.org/jdk/pull/28841#pullrequestreview-3587565866 From jbhateja at openjdk.org Wed Dec 17 12:56:01 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 17 Dec 2025 12:56:01 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v9] In-Reply-To: References: Message-ID: > Add a new Float16Vector type and corresponding concrete vector classes, in addition to existing primitive vector types, maintaining operation parity with the FloatVector type. > - Add necessary inline expander support. > - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA. > - Use existing Float16 vector IR and backend support.
> - Extended the existing VectorAPI JTREG test suite for the newly added Float16Vector operations. > > The idea here is to first be at par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reduction, etc). > > The following are the performance numbers for some of the selected Float16Vector benchmarking kernels compared to equivalent auto-vectorized Float16OperationsBenchmark kernels. > > image > > Initial RFP[1] was floated on the panama-dev mailing list. > > Kindly review the draft PR and share your feedback. > > Best Regards, > Jatin > > [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: - Fix from Bhavana Kilambi for failing JTREG regressions on AARCH64 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 - Including test changes from Bhavana Kilambi (ARM) - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 - Optimizing tail handling - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 - Cleanups - Fix failing jtreg test in CI - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8370691 - Cleanups - ... 
and 13 more: https://git.openjdk.org/jdk/compare/5e7ae281...703f313d ------------- Changes: https://git.openjdk.org/jdk/pull/28002/files Webrev: Webrev is not available because diff is too large Stats: 515408 lines in 232 files changed: 284464 ins; 229217 del; 1727 mod Patch: https://git.openjdk.org/jdk/pull/28002.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28002/head:pull/28002 PR: https://git.openjdk.org/jdk/pull/28002 From chagedorn at openjdk.org Wed Dec 17 13:41:25 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 17 Dec 2025 13:41:25 GMT Subject: Integrated: 8373513: C2: Move ProjNode::other_if_proj() to IfProjNode In-Reply-To: References: Message-ID: <1lDAQZq-JVt4DGRWv8SH-PvoXKpqMWm_k4sO24vHL7E=.33c7996e-001d-48f0-b3ee-62861747e5c5@github.com> On Fri, 12 Dec 2025 09:48:28 GMT, Christian Hagedorn wrote: > This is a simple clean-up patch which moves `ProjNode::other_if_proj()` to `IfProjNode` and updates its uses. It only makes sense to call `other_if_proj()` on actual `IfProjNodes`. > > It also required updating more types from `ProjNode` to `IfProjNode`, which is more type-safe and more precise. While touching the methods, I've also added some `const`/`static` where appropriate. > > Thanks, > Christian This pull request has now been integrated.
Changeset: 9862f8f0 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/9862f8f0d351448803f8930333d5a7286e6c3565 Stats: 57 lines in 8 files changed: 6 ins; 8 del; 43 mod 8373513: C2: Move ProjNode::other_if_proj() to IfProjNode Reviewed-by: epeter, roland ------------- PR: https://git.openjdk.org/jdk/pull/28785 From rcastanedalo at openjdk.org Wed Dec 17 13:46:59 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 17 Dec 2025 13:46:59 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis [v7] In-Reply-To: References: Message-ID: <3xMKIq0p28NOY6wDBgaEwOosEiXpe1kaSmCTvL2Q0OI=.46ff7dc2-2705-4b57-a118-367a63649932@github.com> On Wed, 17 Dec 2025 12:43:11 GMT, Roland Westrelin wrote: > What should I do with this change? Should I go ahead and integrate? I'm happy with the current version of this changeset, please give me one or two days to re-run it through our CI test system.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28677#issuecomment-3665420332 From epeter at openjdk.org Wed Dec 17 13:48:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 17 Dec 2025 13:48:17 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v6] In-Reply-To: References: Message-ID: On Thu, 11 Dec 2025 15:42:42 GMT, Roland Westrelin wrote: >> For this failure memory stats are: >> >> >> Total Usage: 1095525816 >> --- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 --- >> Phase Total ra node comp type states reglive regsplit regmask superword cienv ha other >> none 5976032 331560 5402064 197512 33712 10200 0 0 984 0 0 0 0 >> parse 2716464 65456 1145480 196408 1112752 0 0 0 0 0 196368 0 0 >> optimizer 98184 0 32728 0 65456 0 0 0 0 0 0 0 0 >> connectionGraph 32728 0 0 32728 0 0 0 0 0 0 0 0 0 >> iterGVN 32728 0 32728 0 0 0 0 0 0 0 0 0 0 >> idealLoop 918189632 0 38687056 872824784 392776 0 0 0 0 0 6285016 0 0 >> idealLoopVerify 2228144 0 0 2228144 0 0 0 0 0 0 0 0 0 >> macroExpand 32728 0 32728 0 0 0 0 0 0 0 0 0 0 >> graphReshape 32728 0 32728 0 0 0 0 0 0 0 0 0 0 >> matcher 20135944 3369848 9033208 7536400 65456 131032 0 0 0 0 0 0 0 >> postselect_cleanup 294872 294872 0 0 0 0 0 0 0 0 0 0 0 >> scheduler 752944 196488 556456 0 0 0 0 0 0 0 0 0 0 >> regalloc 388736 388736 0 0 0 0 0 0 0 0 0 0 0 >> ... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > package declaration Looks good to me. We should have done this a while ago anyway. The only question I have is about reducing the test. @benoitmaillard did you make any progress with that? ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28581#pullrequestreview-3587834222 From roland at openjdk.org Wed Dec 17 14:04:04 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 17 Dec 2025 14:04:04 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses [v2] In-Reply-To: References: Message-ID: > The base input of `AddP` is expected to only be set for heap accesses > but I noticed some inconsistencies so I added an assert in the `AddP` > constructor and fixed issues that it caught. AFAICT, the > inconsistencies shouldn't create issues. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: - review - review - merge - more - more - more - undo - exps ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28769/files - new: https://git.openjdk.org/jdk/pull/28769/files/541a5d2b..38eb3b3f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28769&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28769&range=00-01 Stats: 8147 lines in 310 files changed: 5175 ins; 1079 del; 1893 mod Patch: https://git.openjdk.org/jdk/pull/28769.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28769/head:pull/28769 PR: https://git.openjdk.org/jdk/pull/28769 From dlunden at openjdk.org Wed Dec 17 14:05:41 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Wed, 17 Dec 2025 14:05:41 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v6] In-Reply-To: References: Message-ID: On Sat, 13 Dec 2025 15:10:37 GMT, Quan Anh Mai wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> be even more rigorous > > I have made further changes that I believe have made the change pretty rigorous, I don't think I
can see any flaw in the reasoning that allows mis-analysis now. Looks interesting @merykitty! I will also review this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28812#issuecomment-3665506841 From roland at openjdk.org Wed Dec 17 14:06:48 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 17 Dec 2025 14:06:48 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses [v2] In-Reply-To: References: Message-ID: On Wed, 17 Dec 2025 03:00:34 GMT, Dean Long wrote: >> `isa_oopptr()` is non-null for all oops, array and instance. > > It seems like "dest" is always an oop here and we don't need to check isa_oopptr(). It is always a heap address. But when used at object creation time, it's not yet an oop. It only becomes one once it is initialized. For those `Store`s to the not yet initialized object, I don't think the base edge is needed (there can't be any safepoint until the object becomes an actual oop) and code elsewhere (`InitializeNode::capture_store()`) doesn't set the base input either. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28769#discussion_r2627200158 From roland at openjdk.org Wed Dec 17 14:11:00 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 17 Dec 2025 14:11:00 GMT Subject: RFR: 8373524: C2: no reachable node should have no use [v2] In-Reply-To: References: Message-ID: > The failure occurs because `PhiNode::Ideal` uses `set_req` to update > an input of a `Phi`. That causes the previous input to be disconnected > but because of the use of `set_req`, the previous input that has no > use is not enqueued for `igvn` to be reclaimed. The fix is to use > `set_req_X` instead. I replaced uses of `set_req` with `set_req_X` in > `PhiNode::Ideal` where I thought it made sense.
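The difference described above between `set_req` and `set_req_X` can be sketched with a toy model. This is a minimal sketch, not HotSpot code: it is written in Java rather than C++, and the class `SetReqSketch`, its `Node` type, the `useCount` field and the methods `setReq`/`setReqX` are hypothetical names chosen only to mirror the discussion. It assumes a node counts as "dead" once nothing uses it, and that an iterative pass only reclaims nodes that were put on its worklist.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model only: not HotSpot code. Names mirror the discussion for readability. */
public class SetReqSketch {
    static final class Node {
        final String name;
        final List<Node> inputs = new ArrayList<>();
        int useCount = 0; // number of input edges from other nodes that point at this node

        Node(String name, Node... ins) {
            this.name = name;
            for (Node in : ins) {
                inputs.add(in);
                in.useCount++;
            }
        }

        /** Plain replacement: the old input may be left with no use, and nobody is told. */
        void setReq(int i, Node n) {
            Node old = inputs.set(i, n);
            n.useCount++;
            old.useCount--;
        }

        /** Replacement that also enqueues the old input if it just lost its last use. */
        void setReqX(int i, Node n, Deque<Node> worklist) {
            Node old = inputs.get(i);
            setReq(i, n);
            if (old.useCount == 0) {
                worklist.add(old);
            }
        }
    }

    /** Returns how many use-less nodes are waiting on the worklist to be reclaimed. */
    static int pendingAfterUpdate(boolean useSetReqX) {
        Deque<Node> worklist = new ArrayDeque<>();
        Node a = new Node("a");
        Node b = new Node("b");
        Node phi = new Node("phi", a); // phi initially uses a
        if (useSetReqX) {
            phi.setReqX(0, b, worklist);
        } else {
            phi.setReq(0, b);
        }
        return worklist.size();
    }

    public static void main(String[] args) {
        System.out.println("setReq  pending: " + pendingAfterUpdate(false));
        System.out.println("setReqX pending: " + pendingAfterUpdate(true));
    }
}
```

In this model the plain setter leaves `a` with no use and nothing queued, which is the shape of the failure the description reports; the second setter hands `a` to the worklist so a later pass can remove it.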
Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: - Update test/hotspot/jtreg/compiler/c2/TestNodeWithNoUseAfterPhiIdeal.java Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/cfgnode.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28841/files - new: https://git.openjdk.org/jdk/pull/28841/files/ab262397..9f51aa5b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28841&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28841&range=00-01 Stats: 4 lines in 2 files changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28841.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28841/head:pull/28841 PR: https://git.openjdk.org/jdk/pull/28841 From roland at openjdk.org Wed Dec 17 14:16:29 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 17 Dec 2025 14:16:29 GMT Subject: RFR: 8373524: C2: no reachable node should have no use [v3] In-Reply-To: References: Message-ID: <8xp47-406XvQpBbyImEKvgKJGoAR_GLX9OorBcSEXJU=.95d37fce-8278-4408-90fc-0cc3997068a3@github.com> > The failure occurs because `PhiNode::Ideal` uses `set_req` to update > an input of a `Phi`. That causes the previous input to be disconnected > but because of the use of `set_req`, the previous input that has no > use is not enqueued for `igvn` to be reclaimed. The fix is to use > `set_req_X` instead. I replaced uses of `set_req` with `set_req_X` in > `PhiNode::Ideal` where I thought it made sense. 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28841/files - new: https://git.openjdk.org/jdk/pull/28841/files/9f51aa5b..ac789382 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28841&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28841&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28841.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28841/head:pull/28841 PR: https://git.openjdk.org/jdk/pull/28841 From roland at openjdk.org Wed Dec 17 14:16:31 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 17 Dec 2025 14:16:31 GMT Subject: RFR: 8373524: C2: no reachable node should have no use [v3] In-Reply-To: References: Message-ID: On Wed, 17 Dec 2025 08:38:03 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > test/hotspot/jtreg/compiler/igvn/TestNodeWithNoUseAfterPhiIdeal.java line 1: > >> (failed to retrieve contents of file, check the PR for context) > Since this is about IGVN, you could move the test to `compiler/igvn`. Right. That makes sense. Done in the new commits. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28841#discussion_r2627227969 From roland at openjdk.org Wed Dec 17 14:22:57 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 17 Dec 2025 14:22:57 GMT Subject: RFR: 8373508: C2: sinking CreateEx out of loop breaks the graph [v2] In-Reply-To: References: Message-ID: > A `CreateEx` gets sunk out of loop by > `PhaseIdealLoop::try_sink_out_of_loop()` and, as a consequence, the > following logic: > > > return (in(0)->is_CatchProj() && in(0)->in(0)->is_Catch() && > in(0)->in(0)->in(1) == in(1)) ? 
this : call->in(TypeFunc::Parms); > > > in `CreateExNode::Identity()` triggers, which leads to the crash > because `call->in(TypeFunc::Parms)` is not even an object in this > particular case. > > It's actually not clear to me what that logic in > `CreateExNode::Identity()` is expected to do and I wonder if it's > still needed. > > Anyway, the fix I propose is to skip `CreateEx` in > `PhaseIdealLoop::try_sink_out_of_loop()`. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/loopopts/TestCreateExSunkOutOfLoop.java Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28842/files - new: https://git.openjdk.org/jdk/pull/28842/files/45097770..e4bdff59 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28842&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28842&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28842.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28842/head:pull/28842 PR: https://git.openjdk.org/jdk/pull/28842 From roland at openjdk.org Wed Dec 17 14:22:58 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 17 Dec 2025 14:22:58 GMT Subject: RFR: 8373508: C2: sinking CreateEx out of loop breaks the graph In-Reply-To: References: Message-ID: On Wed, 17 Dec 2025 10:56:07 GMT, Dean Long wrote: > > It's actually not clear to me what that logic in CreateExNode::Identity() is expected to do and I wonder if it's still needed. > > I looked in the old SCCS history, and this code used to check if the call was to OptoRuntime::rethrow_stub(), so it appears to be trying to get the oop from arg0 to the rethrow runtime call. This code looks obsolete and broken to me. Does it sound ok to you if I file a follow-up bug to look into that? Whether that code is removed or not, it makes little sense to sink the `CreateEx` anyway.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28842#issuecomment-3665574161 From roland at openjdk.org Wed Dec 17 14:29:08 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 17 Dec 2025 14:29:08 GMT Subject: RFR: 8373508: C2: sinking CreateEx out of loop breaks the graph [v2] In-Reply-To: References: Message-ID: On Wed, 17 Dec 2025 08:23:21 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/compiler/loopopts/TestCreateExSunkOutOfLoop.java >> >> Co-authored-by: Christian Hagedorn > > test/hotspot/jtreg/compiler/loopopts/TestCreateExSunkOutOfLoop.java line 30: > >> 28: * @library /test/lib >> 29: * @run main/othervm -Xbatch ${test.main.class} >> 30: * @run main ${test.main.class} > > Since this test runs for 4s at least, I'm not sure if it's worth having an Xbatch and non-Xbatch version. Does it trigger with both? It does. Which would you keep then? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28842#discussion_r2627280788 From bmaillard at openjdk.org Wed Dec 17 14:41:15 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Wed, 17 Dec 2025 14:41:15 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v6] In-Reply-To: References: Message-ID: <7766DTMPPwqyQJVBI6QwIM0G1Y1_4iUrTjzdsddRpb4=.b1b3a551-7546-4e26-b2bf-02203611b28a@github.com> On Mon, 1 Dec 2025 16:52:39 GMT, Benoît Maillard wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> package declaration > > Thanks for fixing this @rwestrel, I agree with the solution. I noticed that this could be a problem while working on [JDK-8366990](https://bugs.openjdk.org/browse/JDK-8366990), but there was no reproducer at the time. > The only question I have is about reducing the test.
@benoitmaillard did you make any progress with that? Still a WIP, I had to restart the script a few times, it's a bit of trial and error in this case. I was initially able to reduce the code by a large margin, but it still took a long time to run. Now I am trying to reduce the memory available with `-XX:CompileCommand=memlimit` to see if we can make it fail faster @eme64. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28581#issuecomment-3665654318 From roland at openjdk.org Wed Dec 17 15:42:37 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 17 Dec 2025 15:42:37 GMT Subject: RFR: 8373591: C2: Fix the memory around some intrinsics nodes [v4] In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 17:36:18 GMT, Quan Anh Mai wrote: >> Hi, >> >> This is extracted from #28570 , there are 2 issues here: >> >> - Some intrinsics nodes advertise incorrect `adr_type`. For example, `AryEqNode` reports `adr_type` being `TypeAryPtr::BYTES` (it inherits this from `StrIntrinsicNode`). This is incorrect, however, as it can accept `char[]` inputs, too. Another case is `VectorizedHashCodeNode`, which reports its `adr_type` being `TypePtr::BOTTOM`, but it actually extracts a memory slice and does not consume the whole memory. >> - For nodes such as `StrInflatedCopyNode`, as they consume more than they produce, during scheduling, we need to compute anti-dependencies. This is not the case, so we should fix it by making the nodes kill all the memory they consume. This issue is often not present because these intrinsics are not exposed bare to general usage. >> >> Please kindly review, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > consolidate the memory effect into a function Looks good to me. ------------- Marked as reviewed by roland (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28789#pullrequestreview-3588351130 From psandoz at openjdk.org Wed Dec 17 17:45:07 2025 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 17 Dec 2025 17:45:07 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark In-Reply-To: References: Message-ID: <2tzkGpRPhoF3jcWsEjhCIqQRcgghd7oOUdXdmvV0HA4=.d92a76ec-ad90-4547-8ddf-bd26b0f6bec3@github.com> On Wed, 3 Dec 2025 15:11:33 GMT, Emanuel Peter wrote: > This is exploratory work. I wanted to use auto vectorization and the Vector API to implement some SIMD algorithms. We don't have too many IR tests and benchmarks, so I'm proposing an initial set of them, to be extended in the future. > > Note: for now they are all `int` based. And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;) > > **Discussion** > > Observations: > - If the loop can be auto vectorized, that is the fastest. If we cannot vectorize, we at least get reasonable scalar performance. > - If the Vector API code can be fully intrinsified, we get fast code. But sometimes, the Vector API is horribly slow, much slower than scalar loop performance. > - `linux_aarch64_server`: `filterI`, `scanAddI`, `reduceAddIFieldsX4` are very slow > - `macosx_aarch64`: `filterI`, `scanAddI`, `reduceAddIFieldsX4`, `findMinIndex` are very slow > - `linux_x64_oci_server`: Vector API leads to really nice speedups > - `windows_x64_oci_server`: the only one that gets good/better performance on all benchmarks > - `macosx_x64_sandybridge`: `scanAddI`!, `reduceAddIFieldsX4` are very slow. Other benchmarks benefit. > - Compact Object Headers has some negative effect on some loop benchmarks. > - `linux_aarch64_server`: `reduceAddI`, `copyI` > - `macosx_aarch64`: `mapI`, `reduceAddI`, `copyI` > - `linux_x64_oci_server`: `reduceAddI`, `copyI`, `findI`?
> - `windows_x64_oci_server`: `reduceAddI` and some others a little bit > - `macosx_x64_sandybridge`: `fillI`, `iotaI`, `mapI`, `reduceAddI`, `copyI` > - Intrinsics can be much faster than auto vectorized or Vector API code. > - `linux_aarch64_server`: `copyI` > - `macosx_x64_sandybridge`: actually, `Arrays.fill` seems to suffer with Compact Object Headers as well. > - `rearrange` often needs to do the `mask load` and `and` operation inside the loop. That has a slight performance impact, I filed [JDK-8373240](https://bugs.openjdk.org/browse/JDK-8373240). > > **Benchmark Plots** > > Units: nanoseconds per algorithm invocation. > > `linux_x64_oci` > algo_linux_x64_oci_server > > `windows_x64_oci` > algo_windows_x64_oci_server > > `macosx_x64_sandybridge` > algo_macosx_x64_sandybridge> >> >> assert(depth_count++ < K, "infinite loop in Node::uncast_helper"); >> >> >> I suppose it is plausible to have a legit chain of 1K nodes with very deep inlining and/or optimization, which is _not_ an infinite loop. This might be the cause for some CTW failures in deeper stress modes: I have been running CTW stress tests for 12+ hours without ever hitting the new check, and I usually hit the old one within that timeframe in current mainline. >> >> Given how we basically walk through `in(1)`, i.e. moving as if through the linked list of nodes, I think we can check against the number of nodes we have. If we walk more nodes than we have, that would mean we visited some node twice, which necessarily means there is an infinite loop in this walk. This makes the check more robust. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` in different stress modes, 10x passes >> - [x] Linux x86_64 server fastdebug, `hotspot_compiler` >> - [x] Linux x86_64 server fastdebug, `tier1` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into JDK-8373820-c2-robust-uncast-helper-check > - Fix Thanks folks! The CTW testing loop is running and running without new failures, which is very nice. I will integrate this PR first thing tomorrow. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28861#issuecomment-3666787814 From shade at openjdk.org Wed Dec 17 19:17:21 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 17 Dec 2025 19:17:21 GMT Subject: RFR: 8373630: r18_tls should not be modified on Windows AArch64 In-Reply-To: References: Message-ID: <2BAYipRjblYAHSQx5sIJJgTGadIShd1jE5f5LD222i4=.d084cf3a-0293-4c31-b652-84324fe26b1e@github.com> On Sat, 13 Dec 2025 05:15:17 GMT, Saint Wesonga wrote: >> Nice find. It would be really useful to have a test case that reproduces the problem, and also some idea of how likely it is. I bumped the bug to P1 for now. > >> Nice find. It would be really useful to have a test case that reproduces the problem, and also some idea of how likely it is. I bumped the bug to P1 for now. > > The virtual threads MonitorEnterExit test has a 100% failure repro rate on Windows AArch64 without this change (but it does not fail on macosx-aarch64 without this change, even though x18 is also reserved on macosx-aarch64). I was specifically running the [testMutualExclusion](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/test/jdk/java/lang/Thread/virtual/MonitorEnterExit.java#L392-L425) parameterized test with 0 platform threads and at least 2 virtual threads when investigating this behavior. What a bug! @swesonga, are you handling backports?
------------- PR Comment: https://git.openjdk.org/jdk/pull/28808#issuecomment-3666790978 From duke at openjdk.org Wed Dec 17 20:47:48 2025 From: duke at openjdk.org (Saint Wesonga) Date: Wed, 17 Dec 2025 20:47:48 GMT Subject: RFR: 8373630: r18_tls should not be modified on Windows AArch64 In-Reply-To: References: Message-ID: On Sat, 13 Dec 2025 05:15:17 GMT, Saint Wesonga wrote: >> Nice find. It would be really useful to have a test case that reproduces the problem, and also some idea of how likely it is. I bumped the bug to P1 for now. > >> Nice find. It would be really useful to have a test case that reproduces the problem, and also some idea of how likely it is. I bumped the bug to P1 for now. > > The virtual threads MonitorEnterExit test has a 100% failure repro rate on Windows AArch64 without this change (but it does not fail on macosx-aarch64 without this change, even though x18 is also reserved on macosx-aarch64). I was specifically running the [testMutualExclusion](https://github.com/openjdk/jdk/blob/23c39757ecdc834c631f98f4487cfea21c9b948b/test/jdk/java/lang/Thread/virtual/MonitorEnterExit.java#L392-L425) parameterized test with 0 platform threads and at least 2 virtual threads when investigating this behavior. > What a bug! @swesonga, are you handling backports? Yes, I'm preparing the jdk26u backport this afternoon ------------- PR Comment: https://git.openjdk.org/jdk/pull/28808#issuecomment-3667085205 From dlong at openjdk.org Wed Dec 17 21:21:28 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 17 Dec 2025 21:21:28 GMT Subject: RFR: 8373508: C2: sinking CreateEx out of loop breaks the graph In-Reply-To: References: Message-ID: On Wed, 17 Dec 2025 14:19:21 GMT, Roland Westrelin wrote: > Whether that code is removed or not, it makes little sense to sink the CreateEx anyway. That's the part I'm still trying to understand. If we fix CreateExNode::Identity now and allow it to move outside the loop, the crash goes away.
My understanding is that the CreateEx is for the exception handler. If the exception handler had a safepoint, then moving it out of the loop seems useful. What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28842#issuecomment-3667213700 From liach at openjdk.org Thu Dec 18 00:03:10 2025 From: liach at openjdk.org (Chen Liang) Date: Thu, 18 Dec 2025 00:03:10 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v7] In-Reply-To: References: Message-ID: <56pJKejOYp59xiAZ_0iAKzBpGnU341_0o5Dhy53jx_0=.d331165a-ae13-4ae2-923d-302d3bddccfd@github.com> > Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. > > They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. > > We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. > > Paging @minborg who requested Optional folding for review. > > I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. 
Chen Liang has updated the pull request incrementally with one additional commit since the last revision: Move the test to a core library purposed directory ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28540/files - new: https://git.openjdk.org/jdk/pull/28540/files/567e8925..16db9901 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28540&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28540&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28540.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28540/head:pull/28540 PR: https://git.openjdk.org/jdk/pull/28540 From liach at openjdk.org Thu Dec 18 00:03:13 2025 From: liach at openjdk.org (Chen Liang) Date: Thu, 18 Dec 2025 00:03:13 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v6] In-Reply-To: References: Message-ID: <7WM2llDmHIma5F6pRh7EIdeubuz7qI-uTqvSsqmfug4=.6ce794e4-c270-4d83-a481-5ac75a71afc9@github.com> On Tue, 16 Dec 2025 18:54:34 GMT, Chen Liang wrote: >> Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. >> >> They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. >> >> We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. >> >> Paging @minborg who requested Optional folding for review. 
>> >> I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: > > - Recommended test tweaks > - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-final-trusting > - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-final-trusting > - Jorn review > - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-final-trusting > - bracket styles > - Doc tweaks > - Essay > - Spurious change > - Merge branch 'master' of https://github.com/openjdk/jdk into feature/class-final-trusting > - ... and 4 more: https://git.openjdk.org/jdk/compare/0095e40a...567e8925 After offline discussion, we have decided to move this test to `compiler/corelibs`, which will become the future home for various compiler framework-based tests verifying the behavior of core library APIs once they are compiled. For example, this package may add more constant folding verifications (such as for Lazy Constants) or loop hoisting verification for FFM API. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28540#issuecomment-3667634659 From dlong at openjdk.org Thu Dec 18 02:39:55 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 18 Dec 2025 02:39:55 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses [v2] In-Reply-To: References: Message-ID: On Wed, 17 Dec 2025 14:03:45 GMT, Roland Westrelin wrote: >> It seems like "dest" is always an oop here and we don't need to check isa_oopptr(). > > It is always an heap address. But when used at object creation time, it's not yet an oop. It only becomes one once it is initialized. 
For those `Store`s to the not yet initialized object, I don't think the base edge is needed (there can't be any safepoint until the object becomes an actual oop) and code elsewhere (`InitializeNode::capture_store()`) doesn't set the base input either. That makes sense as long as we can guarantee there is no safepoint. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28769#discussion_r2629279971 From dlong at openjdk.org Thu Dec 18 03:29:53 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 18 Dec 2025 03:29:53 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses [v2] In-Reply-To: References: Message-ID: <_x8OV9zV3TaggMMwjNpjaeBtQRIxXbmq-VvV10SmTHg=.b7a98f3f-19c6-442b-b71b-a86a5b6f685a@github.com> On Wed, 17 Dec 2025 14:04:04 GMT, Roland Westrelin wrote: >> The base input of `AddP` is expected to only be set for heap accesses >> but I noticed some inconsistencies so I added an assert in the `AddP` >> constructor and fixed issues that it caught. AFAICT, the >> inconsistencies shouldn't create issues. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - review > - review > - merge > - more > - more > - more > - undo > - exps src/hotspot/share/opto/memnode.cpp line 2570: > 2568: assert(tkls2->offset() == 0, "not a load of java_mirror"); > 2569: #endif > 2570: return adr2->in(AddPNode::Address); What should the value of adr2->in(AddPNode::Offset) be at this point? 0 or java_mirror_offset()? Do we need to check it?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28769#discussion_r2629405380 From qamai at openjdk.org Thu Dec 18 03:41:54 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 18 Dec 2025 03:41:54 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v7] In-Reply-To: <56pJKejOYp59xiAZ_0iAKzBpGnU341_0o5Dhy53jx_0=.d331165a-ae13-4ae2-923d-302d3bddccfd@github.com> References: <56pJKejOYp59xiAZ_0iAKzBpGnU341_0o5Dhy53jx_0=.d331165a-ae13-4ae2-923d-302d3bddccfd@github.com> Message-ID: On Thu, 18 Dec 2025 00:03:10 GMT, Chen Liang wrote: >> Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. >> >> They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. >> >> We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. >> >> Paging @minborg who requested Optional folding for review. >> >> I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Move the test to a core library purposed directory Is this annotation just for constant folding of final fields? That seems pretty limited. Can we have it to notify that the final fields of a class can be treated as strict fields? 
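As background for the question above about trusted final fields versus strict fields: a plain `final` field can be read in its default state before the constructor assigns it, so it has two observable states. The following is only a minimal sketch in ordinary Java, with hypothetical class names (`FinalFieldStates`, `Base`, `Holder`); it does not use the annotation proposed in this PR or any preview feature.

```java
public class FinalFieldStates {
    // Hypothetical base class that calls an overridable method from its constructor.
    static abstract class Base {
        final int seenDuringSuperCtor;

        Base() {
            // Runs before the subclass constructor has assigned its own final field.
            seenDuringSuperCtor = read();
        }

        abstract int read();
    }

    // Hypothetical subclass with an ordinary final field.
    static final class Holder extends Base {
        private final int value;

        Holder(int value) {
            super();
            this.value = value; // assigned only after Base() has already read it
        }

        @Override
        int read() {
            return value;
        }

        int value() {
            return value;
        }
    }

    public static void main(String[] args) {
        Holder h = new Holder(42);
        // The same final field was observed as 0 (default) and later as 42.
        System.out.println(h.seenDuringSuperCtor + " then " + h.value());
    }
}
```

A compiler that treats the field as constant only after initialization can still fold it, but it cannot assume that every read at every program point yields the same value.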
------------- PR Comment: https://git.openjdk.org/jdk/pull/28540#issuecomment-3668189051 From liach at openjdk.org Thu Dec 18 03:54:53 2025 From: liach at openjdk.org (Chen Liang) Date: Thu, 18 Dec 2025 03:54:53 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v7] In-Reply-To: <56pJKejOYp59xiAZ_0iAKzBpGnU341_0o5Dhy53jx_0=.d331165a-ae13-4ae2-923d-302d3bddccfd@github.com> References: <56pJKejOYp59xiAZ_0iAKzBpGnU341_0o5Dhy53jx_0=.d331165a-ae13-4ae2-923d-302d3bddccfd@github.com> Message-ID: On Thu, 18 Dec 2025 00:03:10 GMT, Chen Liang wrote: >> Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. >> >> They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. >> >> We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. >> >> Paging @minborg who requested Optional folding for review. >> >> I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Move the test to a core library purposed directory Yes, this is the purpose of this annotation for now. Unfortunately the JDK can only use strict after it leaves preview, so I think this is necessary to bridge the gap during this period. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28540#issuecomment-3668227064 From qamai at openjdk.org Thu Dec 18 04:07:54 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 18 Dec 2025 04:07:54 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v7] In-Reply-To: <56pJKejOYp59xiAZ_0iAKzBpGnU341_0o5Dhy53jx_0=.d331165a-ae13-4ae2-923d-302d3bddccfd@github.com> References: <56pJKejOYp59xiAZ_0iAKzBpGnU341_0o5Dhy53jx_0=.d331165a-ae13-4ae2-923d-302d3bddccfd@github.com> Message-ID: On Thu, 18 Dec 2025 00:03:10 GMT, Chen Liang wrote: >> Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. >> >> They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. >> >> We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. >> >> Paging @minborg who requested Optional folding for review. >> >> I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Move the test to a core library purposed directory If the JDK wants to use strict then it can, it is the user code which does not have access to strict fields, isn't it? 
What I mean is why isn't this annotation stronger so we can improve the accesses into currently-trusted fields, too (`MemorySegment` fields for example). ------------- PR Comment: https://git.openjdk.org/jdk/pull/28540#issuecomment-3668255154 From liach at openjdk.org Thu Dec 18 04:54:05 2025 From: liach at openjdk.org (Chen Liang) Date: Thu, 18 Dec 2025 04:54:05 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v7] In-Reply-To: <56pJKejOYp59xiAZ_0iAKzBpGnU341_0o5Dhy53jx_0=.d331165a-ae13-4ae2-923d-302d3bddccfd@github.com> References: <56pJKejOYp59xiAZ_0iAKzBpGnU341_0o5Dhy53jx_0=.d331165a-ae13-4ae2-923d-302d3bddccfd@github.com> Message-ID: On Thu, 18 Dec 2025 00:03:10 GMT, Chen Liang wrote: >> Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. >> >> They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. >> >> We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. >> >> Paging @minborg who requested Optional folding for review. >> >> I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Move the test to a core library purposed directory What do you mean by this annotation being "stronger"? 
This annotation uses class granularity because javac generates synthetic fields like local variable captures, which cannot be annotated by field annotations. The JDK will have to resort to an annotation, because JDK class files cannot use preview features because they need to be able to run with preview features off. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28540#issuecomment-3668363917 From qamai at openjdk.org Thu Dec 18 05:00:56 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 18 Dec 2025 05:00:56 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v7] In-Reply-To: <56pJKejOYp59xiAZ_0iAKzBpGnU341_0o5Dhy53jx_0=.d331165a-ae13-4ae2-923d-302d3bddccfd@github.com> References: <56pJKejOYp59xiAZ_0iAKzBpGnU341_0o5Dhy53jx_0=.d331165a-ae13-4ae2-923d-302d3bddccfd@github.com> Message-ID: On Thu, 18 Dec 2025 00:03:10 GMT, Chen Liang wrote: >> Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. >> >> They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. >> >> We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. >> >> Paging @minborg who requested Optional folding for review. >> >> I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. 
> > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Move the test to a core library purposed directory What I mean by stronger is that trusted final fields only ensure that their values are unchanged after initialization. Strict fields are unchanged unconditionally, there is only 1 observable state for a strict field of an object. As a result, in addition to constant folding, we can do load hoisting, too. So my question is why this annotation does not try to enforce a stronger invariant so that we can benefit from those invariants without having to wait for strict fields. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28540#issuecomment-3668379612 From epeter at openjdk.org Thu Dec 18 07:02:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Dec 2025 07:02:59 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark In-Reply-To: <2tzkGpRPhoF3jcWsEjhCIqQRcgghd7oOUdXdmvV0HA4=.d92a76ec-ad90-4547-8ddf-bd26b0f6bec3@github.com> References: <2tzkGpRPhoF3jcWsEjhCIqQRcgghd7oOUdXdmvV0HA4=.d92a76ec-ad90-4547-8ddf-bd26b0f6bec3@github.com> Message-ID: On Wed, 17 Dec 2025 17:42:35 GMT, Paul Sandoz wrote: >> This is an exploratory work. I wanted to use auto vectorization and the Vector API to implement some SIMD algorithms. We don't have too many IR tests and benchmarks, so I'm proposing an initial set of them, to be extended in the future. >> >> Note: for now they are all `int` based. And some of them may not use the Vector API optimally, so feel free to propose ideas and integrate them in a follow-up RFE ;) >> >> **Discussion** >> >> Observations: >> - If the loop can be auto vectorized, that is the fastest. If we cannot vectorize, we at least get reasonable scalar performance. >> - If the Vector API code can be fully intrinsified, we get fast code. But sometimes, the Vector API is horribly slow, much slower than scalar loop performance.
>> - `linux_aarch64_server`: `filterI`, `scanAddI`, `reduceAddIFieldsX4` are very slow >> - `macosx_aarch64`: `filterI`, `scanAddI`, `reduceAddIFieldsX4`, `findMinIndex` are very slow >> - `linux_x64_oci_server`: Vector API leads to really nice speedups >> - `windows_x64_oci_server`: the only one that gets good/better performance on all benchmarks >> - `macosx_x64_sandybridge`: `scanAddI`!, `reduceAddIFieldsX4` are very slow. Other benchmarks benefit. >> - Compact Object Headers has some negative effect on some loop benchmarks. >> - `linux_aarch64_server`: `reduceAddI`, `copyI` >> - `macosx_aarch64`: `mapI`, `reduceAddI`, `copyI` >> - `linux_x64_oci_server`: `reduceAddI`, `copyI`, `findI`? >> - `windows_x64_oci_server`: `reduceAddI` and some others a little bit >> - `macosx_x64_sandybridge`: `fillI`, `iotaI`, `mapI`, `reduceAddI`, `copyI` >> - Intrinsics can be much faster than auto vectorized or Vector API code. >> - `linux_aarch64_server`: `copyI` >> - `macosx_x64_sandybridge`: actually, `Arrays.fill` seems to suffer with Compact Object Headers as well. >> - `rearrange` often needs to do the `mask load` and `and` operation inside the loop. That has a slight performance impact, I filed [JDK-8373240](https://bugs.openjdk.org/browse/JDK-8373240). >> >> **Benchmark Plots** >> >> Units: nanoseconds per algorithm invocation. >> >> `linux_x64_oci` >> algo_linux_x64_oci_server >> >> `windows_x64_oci` >> algo_windows_x64_oci_server >> >> `macosx_x64_sandybridge` >> > test/micro/org/openjdk/bench/vm/compiler/VectorAlgorithmsImpl.java line 361: > >> 359: v = v.compress(mask); >> 360: int trueCount = mask.trueCount(); >> 361: var prefixMask = VectorMask.fromLong(SPECIES_I, (1L << trueCount) - 1); > > In case you did not already try, you can compress the mask to produce the mask prefix, as in `v.intoArray(r, j, mask.compress())`. Unsure if that will make a difference in the codegen though. Ah, nice idea!
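The two ways of building the prefix mask discussed above should describe the same lanes. The sketch below models both on plain `long` bit patterns instead of the Vector API, so it says nothing about the generated code; the class and method names (`PrefixMaskSketch`, `prefixFromCount`, `compressBits`) are hypothetical and exist only for this illustration.

```java
public class PrefixMaskSketch {
    // Lowest trueCount bits set, mirroring (1L << trueCount) - 1 from the quoted line.
    // Assumes fewer than 64 lanes, which holds for int vectors.
    static long prefixFromCount(long maskBits) {
        int trueCount = Long.bitCount(maskBits);
        return (1L << trueCount) - 1;
    }

    // Scalar model of compressing a mask: each set lane moves to the lowest free position.
    static long compressBits(long maskBits, int lanes) {
        long result = 0;
        int next = 0;
        for (int lane = 0; lane < lanes; lane++) {
            if ((maskBits & (1L << lane)) != 0) {
                result |= 1L << next;
                next++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long mask = 0b1011_0100L; // lanes 2, 4, 5 and 7 are set
        System.out.println(Long.toBinaryString(prefixFromCount(mask)));
        System.out.println(Long.toBinaryString(compressBits(mask, 8)));
    }
}
```

Both calls produce the bit pattern `1111` for this input, which is why compressing the mask can stand in for the `trueCount` plus `fromLong` sequence. Whether it actually compiles to better code is the open question from the review comment.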
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2629825254 From epeter at openjdk.org Thu Dec 18 07:03:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Dec 2025 07:03:00 GMT Subject: RFR: 8373026: C2 SuperWord and Vector API: vector algorithms test and benchmark In-Reply-To: References: <2tzkGpRPhoF3jcWsEjhCIqQRcgghd7oOUdXdmvV0HA4=.d92a76ec-ad90-4547-8ddf-bd26b0f6bec3@github.com> Message-ID: On Thu, 18 Dec 2025 06:58:50 GMT, Emanuel Peter wrote: >> test/micro/org/openjdk/bench/vm/compiler/VectorAlgorithmsImpl.java line 361: >> >>> 359: v = v.compress(mask); >>> 360: int trueCount = mask.trueCount(); >>> 361: var prefixMask = VectorMask.fromLong(SPECIES_I, (1L << trueCount) - 1); >> >> In case you did not already try, you can compress the mask to produce the mask prefix, as in `v.intoArray(r, j, mask.compress())`. Unsure if that will make a difference in the codegen though. > > Ah, nice idea! I'll note it down and implement and benchmark it later, once I get some more comments :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28639#discussion_r2629828819 From epeter at openjdk.org Thu Dec 18 07:04:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Dec 2025 07:04:56 GMT Subject: RFR: 8373682: Test compiler/loopopts/superword/TestReinterpretAndCast.java fails on x86_64 with AVX but without f16c In-Reply-To: References: <-xeq-bSWOMsPZUqDhZRy1QewsTrzwizlFQDcaK9zjY8=.8798035a-fdc5-4665-908c-22e1a8e552c2@github.com> Message-ID: <-03ZtN_0v3Yl9IboxO-DMLcu-EFi7tt2LSyI8jj3Xh4=.a3da072e-0aa4-4627-af9d-b6fe20d8aa9d@github.com> On Wed, 17 Dec 2025 08:53:24 GMT, Christian Hagedorn wrote: >> The IR rules of the test failed because we expected that `VectorCastF2HF` and `VectorCastHF2F` are available on `AVX`, but actually we need `f16c`. On Sandy Bridge we have `AVX` but not `f16c`, so the IR rules fail on those machines. >> >> Solution: require `f16c` feature. > > Looks good, thanks! 
@chhagedorn @vnkozlov @jsikstro Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28852#issuecomment-3668689014 From epeter at openjdk.org Thu Dec 18 07:06:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Dec 2025 07:06:59 GMT Subject: RFR: 8373396: Min and Max Ideal missing AddNode::Ideal optimisations [v4] In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 08:07:16 GMT, Galder Zamarreño wrote: >> `MaxI` and `MinI` are missing `AddNode::Ideal` optimizations. These optimizations include commutation, flattening, pushing constants...etc. The PR changes `MaxINode::Ideal` and `MinINode::Ideal` to call `AddNode::Ideal`. Long versions already call `AddNode::Ideal` so nothing to change there. >> >> The PR also includes a template framework generated test (cc @eme64) that verifies that all of `AddNode::Ideal` optimizations now apply correctly for min/max for longs and ints. Long tests have been added to validate that both ints and longs produce the same results. >> >> Fixing this issue indirectly fixes `compiler/codegen/TestBooleanVect.java` when run with `-XX:VerifyIterativeGVN=1110`, which was failing due to `min` not having one of those optimisations. However, this PR does not make changes to `PhaseIterGVN::verify_Identity_for` because there are additional failures observed with min/max for integers in JDK-8373134. Therefore, changes there will be made in the PR for JDK-8373134 instead. >> >> **Update 15.12.25**: `PhaseIterGVN::verify_Ideal_for` exceptions for MinI/MaxI are still needed. >> >> ~If you look at `PhaseIterGVN::verify_Ideal_for`, it contains. This looks like it could be removed in this PR as it looks like they were quite likely disabled due to the issue here. However, it's unclear what test was failing here (@eme64 ?):~ >> >> >> // MinINode::Ideal >> // Did not investigate, but there are some patterns that might >> // need more notification. >> case Op_MinI: >> case Op_MaxI: // preemptively removed it as well.
>> return false; >> >> >> I've run tier1-3 tests on linux/x64 and they passed. > > Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: > > Fix redundant variable Testing passed, ship it! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28770#issuecomment-3668692170 From epeter at openjdk.org Thu Dec 18 07:08:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Dec 2025 07:08:01 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v8] In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 09:19:06 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantages of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains nine commits: > > - Merge branch 'master' into andorxor > - Merge branch 'master' into andorxor > - Merge branch 'master' into andorxor > - Merge branch 'master' into andorxor > - Add assertion for the helper in CTPComparator > > Co-authored-by: Emanuel Peter > - remove std::hash > - remove unordered_map, add some comments for all_instances_size > - Emanuel's reviews > - Improve Value inferences of And, Or, Xor and implement gtest for general Value inferences Tests passed :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27618#issuecomment-3668698607 From epeter at openjdk.org Thu Dec 18 07:08:13 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Dec 2025 07:08:13 GMT Subject: Integrated: 8373682: Test compiler/loopopts/superword/TestReinterpretAndCast.java fails on x86_64 with AVX but without f16c In-Reply-To: <-xeq-bSWOMsPZUqDhZRy1QewsTrzwizlFQDcaK9zjY8=.8798035a-fdc5-4665-908c-22e1a8e552c2@github.com> References: <-xeq-bSWOMsPZUqDhZRy1QewsTrzwizlFQDcaK9zjY8=.8798035a-fdc5-4665-908c-22e1a8e552c2@github.com> Message-ID: On Tue, 16 Dec 2025 17:45:43 GMT, Emanuel Peter wrote: > The IR rules of the test failed because we expected that `VectorCastF2HF` and `VectorCastHF2F` are available on `AVX`, but actually we need `f16c`. On Sandy Bridge we have `AVX` but not `f16c`, so the IR rules fail on those machines. > > Solution: require `f16c` feature. This pull request has now been integrated. 
Changeset: b4462625 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/b4462625413e7c2c12778eaad1f2f21d81f59c52 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8373682: Test compiler/loopopts/superword/TestReinterpretAndCast.java fails on x86_64 with AVX but without f16c Reviewed-by: kvn, jsikstro, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/28852 From epeter at openjdk.org Thu Dec 18 07:08:20 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Dec 2025 07:08:20 GMT Subject: Integrated: 8373502: C2 SuperWord: speculative check uses VPointer variable was pinned after speculative check, leading to bad graph In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 08:46:28 GMT, Emanuel Peter wrote: > Thanks for @chhagedorn and @rwestrel for triaging / doing some first investigation. > > This is a regression of [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751) / https://github.com/openjdk/jdk/pull/24278. > > This is almost the same as https://github.com/openjdk/jdk/pull/28449, so have a quick look at it. > It was also an issue with some nodes being pinned too low, and not available at the speculative check. > There, it was the `pre_init` values of the `iv`. Now it is the variables of the `VPointer`. > The fix is pretty similar as well. > > ------------------------------------------ > > **Analysis** > > The reproducer gets a `bad graph` assert because of this cycle: > image > Note: `921 CountedLoop` is the pre-loop, the main-loop is further down from it. > And `607 ParsePredicate` is the `#Auto_Vectorization_Check`, and `1403` is the aliasing check inserted for the VPointer named below. > > This is the relevant VPointer: > `VPointer[size: 4, object, base(920 CastPP) + con( 20) + iv_scale( 0) * iv + invar(0)]` > The base `920 CastPP` is the problematic variable. > > In `VPointer::init_are_non_iv_summands_pre_loop_invariant`, we check that: > `_vloop.is_pre_loop_invariant(variable)` > And that holds for `920 CastPP`. 
So far so good. > > This used to be enough when we only adjusted the pre-loop limit for alignment. > But now that we need the variables for the aliasing runtime check further up, this is not sufficient any more. Analogue to https://github.com/openjdk/jdk/pull/28449, we would now need: > `this->_vloop.is_available_for_speculative_check(variable)` > And that is false for `920 CastPP`, since it is pinned after the speculative check. > > **Solution** > We should not insert the aliasing runtime check, and hence we probably cannot vectorize this case. > > For now, this makes all tests pass. I think just like with https://github.com/openjdk/jdk/pull/28449 these cases are edge cases we don't have to worry too much about. But if they ever do become important, we could try to uncast the variables. But I don't know if that is without issues, we would certainly lose some info that we get from the casts. This pull request has now been integrated. Changeset: 00050f84 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/00050f84d44f3ec23e9c6da52bffd68770010749 Stats: 112 lines in 3 files changed: 112 ins; 0 del; 0 mod 8373502: C2 SuperWord: speculative check uses VPointer variable was pinned after speculative check, leading to bad graph Reviewed-by: thartmann, roland ------------- PR: https://git.openjdk.org/jdk/pull/28783 From epeter at openjdk.org Thu Dec 18 07:05:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Dec 2025 07:05:05 GMT Subject: RFR: 8373502: C2 SuperWord: speculative check uses VPointer variable was pinned after speculative check, leading to bad graph [v3] In-Reply-To: References: Message-ID: On Wed, 17 Dec 2025 11:43:25 GMT, Tobias Hartmann wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8373502-SW-VPointer-variables-at-speculative-check >> - Update test/hotspot/jtreg/compiler/loopopts/superword/TestAliasingCheckVPointerVariablesNotAvailable.java >> - fix up detail >> - JDK-8373502 > > Marked as reviewed by thartmann (Reviewer). @TobiHartmann @rwestrel Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28783#issuecomment-3668687984 From qamai at openjdk.org Thu Dec 18 07:26:58 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 18 Dec 2025 07:26:58 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v8] In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 09:19:06 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantages of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: > > - Merge branch 'master' into andorxor > - Merge branch 'master' into andorxor > - Merge branch 'master' into andorxor > - Merge branch 'master' into andorxor > - Add assertion for the helper in CTPComparator > > Co-authored-by: Emanuel Peter > - remove std::hash > - remove unordered_map, add some comments for all_instances_size > - Emanuel's reviews > - Improve Value inferences of And, Or, Xor and implement gtest for general Value inferences Thanks a lot for your reviews and testing. 
I'm very glad that this PR took much less time :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/27618#issuecomment-3668762656 From epeter at openjdk.org Thu Dec 18 07:31:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Dec 2025 07:31:00 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v8] In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 09:19:06 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantages of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: > > - Merge branch 'master' into andorxor > - Merge branch 'master' into andorxor > - Merge branch 'master' into andorxor > - Merge branch 'master' into andorxor > - Add assertion for the helper in CTPComparator > > Co-authored-by: Emanuel Peter > - remove std::hash > - remove unordered_map, add some comments for all_instances_size > - Emanuel's reviews > - Improve Value inferences of And, Or, Xor and implement gtest for general Value inferences Marked as reviewed by epeter (Reviewer). 
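The quoted description mentions known bits and analyzing the negative and non-negative halves of a range separately. A minimal sketch of that idea follows; it is not the HotSpot code, and `KnownBits`, `known_from_unsigned_range`, `known_and` and `known_or` are hypothetical names chosen for illustration. Every value in an unsigned range shares the leading bits on which the two bounds agree, and And/Or combine those known bits directly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration type, not a HotSpot class.
// Bit i of `zeros` set: bit i is 0 in every value. Same for `ones` with 1.
struct KnownBits {
  uint32_t zeros;
  uint32_t ones;
};

// All values in [lo, hi] (unsigned) share the leading bits on which lo and hi agree.
KnownBits known_from_unsigned_range(uint32_t lo, uint32_t hi) {
  assert(lo <= hi);
  uint32_t unknown = lo ^ hi;  // highest set bit = first position where the bounds differ
  unknown |= unknown >> 1;     // smear that bit downwards: everything below it is unknown
  unknown |= unknown >> 2;
  unknown |= unknown >> 4;
  unknown |= unknown >> 8;
  unknown |= unknown >> 16;
  const uint32_t known = ~unknown;
  return {~lo & known, lo & known};
}

// A result bit of And is 0 if either input bit is 0, and 1 only if both are 1.
KnownBits known_and(KnownBits a, KnownBits b) {
  return {a.zeros | b.zeros, a.ones & b.ones};
}

// A result bit of Or is 1 if either input bit is 1, and 0 only if both are 0.
KnownBits known_or(KnownBits a, KnownBits b) {
  return {a.zeros & b.zeros, a.ones | b.ones};
}
```

Under these assumptions, a signed range such as [-4, 3] yields no common leading bits as a whole, while its halves [-4, -1] and [0, 3] yield 30 known one bits and 30 known zero bits respectively, which is why treating the halves separately helps.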
------------- PR Review: https://git.openjdk.org/jdk/pull/27618#pullrequestreview-3591187556 From qamai at openjdk.org Thu Dec 18 07:31:02 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 18 Dec 2025 07:31:02 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v8] In-Reply-To: References: Message-ID: On Thu, 18 Dec 2025 07:05:06 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: >> >> - Merge branch 'master' into andorxor >> - Merge branch 'master' into andorxor >> - Merge branch 'master' into andorxor >> - Merge branch 'master' into andorxor >> - Add assertion for the helper in CTPComparator >> >> Co-authored-by: Emanuel Peter >> - remove std::hash >> - remove unordered_map, add some comments for all_instances_size >> - Emanuel's reviews >> - Improve Value inferences of And, Or, Xor and implement gtest for general Value inferences > > Tests passed :) @eme64 Please reapprove this PR if you think it can integrate now, thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27618#issuecomment-3668774252 From qamai at openjdk.org Thu Dec 18 07:34:09 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 18 Dec 2025 07:34:09 GMT Subject: Integrated: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations In-Reply-To: References: Message-ID: <0OCcvJWvibSXGQ8haUHXIwD-8pc1mhRDuGZ-BCvXNqs=.5487f297-3225-4f8b-9ec1-95a998d1b236@github.com> On Fri, 3 Oct 2025 06:07:50 GMT, Quan Anh Mai wrote: > Hi, > > This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantages of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. 
I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. > > Please take a look and leave your reviews, thanks a lot. This pull request has now been integrated. Changeset: e6780506 Author: Quan Anh Mai URL: https://git.openjdk.org/jdk/commit/e67805067a8f537862200e808e20464f12d21c9c Stats: 964 lines in 9 files changed: 630 ins; 313 del; 21 mod 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations Reviewed-by: hgreule, epeter ------------- PR: https://git.openjdk.org/jdk/pull/27618 From duke at openjdk.org Thu Dec 18 07:40:08 2025 From: duke at openjdk.org (Tobias Hotz) Date: Thu, 18 Dec 2025 07:40:08 GMT Subject: Integrated: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs In-Reply-To: References: Message-ID: <-w-eb-bHQZ8GhqSuWHn4CFOHe-5RRP03OEaLGkw33Qw=.286a0e0c-0b8c-4c5e-8ef8-3bbcbd97a4a6@github.com> On Sun, 6 Jul 2025 08:08:25 GMT, Tobias Hotz wrote: > This PR improves the value of integer division nodes. > Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guaranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case. > We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once. > This also cleans up and unifies the code paths for DivINode and DivLNode. > I've added some tests to validate the optimization. Without the changes, some of these tests fail. This pull request has now been integrated.
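The "four corners" idea in the description above can be sketched as follows. This is a rough illustration under stated assumptions, not the code from the PR: `Range`, `corners` and `div_range` are hypothetical names, the ranges hold 32-bit values stored in 64-bit fields, and the MIN_INT / -1 corner is handled by conservatively returning the full int range, which may differ from what the PR does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical illustration type: an inclusive range of int32 values kept in 64 bits.
struct Range {
  int64_t lo;
  int64_t hi;
};

// The divisor range must have a single sign and exclude zero. Truncating division is
// then monotone in each argument separately, so the extrema sit at the four corners.
static Range corners(Range n, Range d) {
  const int64_t c[4] = {n.lo / d.lo, n.lo / d.hi, n.hi / d.lo, n.hi / d.hi};
  return {*std::min_element(c, c + 4), *std::max_element(c, c + 4)};
}

Range div_range(Range n, Range d) {
  assert(n.lo <= n.hi && d.lo <= d.hi);
  assert(!(d.lo == 0 && d.hi == 0));  // a divisor that is always zero has no result
  // Assumption of this sketch: MIN_INT / -1 overflows, so give up and return all ints.
  if (n.lo == INT32_MIN && d.lo <= -1 && d.hi >= -1) {
    return {INT32_MIN, INT32_MAX};
  }
  // Divisor range crosses zero: check the negative and the positive part once each.
  if (d.lo < 0 && d.hi > 0) {
    const Range neg = corners(n, {d.lo, -1});
    const Range pos = corners(n, {1, d.hi});
    return {std::min(neg.lo, pos.lo), std::max(neg.hi, pos.hi)};
  }
  // Zero as a bound is dropped, since division by zero never produces a value.
  if (d.lo == 0) d.lo = 1;
  if (d.hi == 0) d.hi = -1;
  return corners(n, d);
}
```

For example, dividing [10, 20] by [2, 5] gives corner quotients 5, 2, 10 and 4, so the result range in this sketch is [2, 10].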
Changeset: 85983069 Author: Tobias Hotz Committer: Manuel Hässig URL: https://git.openjdk.org/jdk/commit/859830694b3db0b81b422bf9b2ce9c7ab9a19a85 Stats: 717 lines in 2 files changed: 613 ins; 90 del; 14 mod 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs Reviewed-by: mhaessig, epeter, bmaillard ------------- PR: https://git.openjdk.org/jdk/pull/26143 From hgreule at openjdk.org Thu Dec 18 08:25:07 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 18 Dec 2025 08:25:07 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation In-Reply-To: <96rrRPjmvAcbsl7wlLUYck5FTD7WkQrY52PKyiwOn2s=.03996b48-829b-4048-925d-04943736fed1@github.com> References: <1XovbnGPMfTX45dlT5PFCk1Bqb3Pyc_kN8vC874lKm4=.78ec990d-1950-4fa9-8dea-065a09414a1c@github.com> <96rrRPjmvAcbsl7wlLUYck5FTD7WkQrY52PKyiwOn2s=.03996b48-829b-4048-925d-04943736fed1@github.com> Message-ID: On Mon, 15 Dec 2025 11:52:03 GMT, Emanuel Peter wrote: >> @eme64 It may be a little bit more in terms of LOC, but it is always simpler to reason about when we have things executing in a consequential manner rather than randomly delaying some of them. > > @merykitty Why not file an RFE with a reproducer for incremental inlining? > > And yes, I totally agree it would be nicer if things were organized better! @eme64 GHA passed, could you run testing? Thanks!
------------- PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3668990044 From epeter at openjdk.org Thu Dec 18 08:30:25 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Dec 2025 08:30:25 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation In-Reply-To: References: <1XovbnGPMfTX45dlT5PFCk1Bqb3Pyc_kN8vC874lKm4=.78ec990d-1950-4fa9-8dea-065a09414a1c@github.com> <96rrRPjmvAcbsl7wlLUYck5FTD7WkQrY52PKyiwOn2s=.03996b48-829b-4048-925d-04943736fed1@github.com> Message-ID: <0B36hEWoN3J3I92tJbzY1O-r9sU-sF5cWKOgAOyvxsc=.c584240d-b9e5-4873-9bb8-ed56e4a3cb0e@github.com> On Thu, 18 Dec 2025 08:21:51 GMT, Hannes Greule wrote: >> @merykitty Why not file an RFE with a reproducer for incremental inlining? >> >> And yes, I totally agree it would be nicer if things were organized better! > > @eme64 GHA passed, could you run testing? Thanks! @SirYwell Testing launched, but no guarantees that I'll get back to you before the new year. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3669015859 From bmaillard at openjdk.org Thu Dec 18 09:13:12 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Thu, 18 Dec 2025 09:13:12 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v2] In-Reply-To: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> Message-ID: > This PR introduces changes in the detection of missing IGVN optimizations. As explained in the JBS issue description, when `-XX:VerifyIterativeGVN` was introduced, it was helpful to list all the missing optimizations. Such failures occur less frequently now, and the focus has changed to being able to debug such failures quickly and identifying similar or related failures during bug triaging.
> > In summary, this PR brings the following changes: > - Assert at the first verification failure in `verify_Optimize` instead of attempting to process all the nodes in the graph. This makes the output easier to parse, and also decreases the overhead of getting to the actual optimization site with a debugger. > - Avoid confusing `Need to remove from hash before changing edges` assert messages by removing the verified node from the hash table before attempting to optimize the node in question. > - Provide the failure reason (Ideal, Identity or Value) and the node name in the assert message itself to facilitate identifying related failures in the testing infrastructure during bug triaging. > > ### Example outputs > #### [JDK-8371534: C2: Missed Ideal optimization opportunity with AndL and URShiftL ](https://bugs.openjdk.org/browse/JDK-8371534) > Before the change, we would get two missed optimizations (the second one is only a consequence of the first one). After the change, we only get the first one, which is the one that actually needs to be fixed. We also get the name of the node in the assert message. >
> Before > > > Missed Ideal optimization (can_reshape=false): > The node was replaced by Ideal. > Old node: > dist dump > --------------------------------------------- > 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 > 1 297 AndL === _ 298 21 [[ 290 ]] !orig=[236],[193] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:42 (line 81) > 0 290 URShiftL === _ 297 22 [[ 299 ]] !orig=[231],[194] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:46 (line 82) > The result after Ideal: > dist dump > --------------------------------------------- > 1 337 ConL === 0 [[ 338 ]] #long:-9 > 1 336 URShiftL === _ 298 22 [[ 338 ]] > 0 338 AndL === _ 336 337 [[ ]] > > > Missed Ideal optimization (can_reshape=true): > The node was replaced by Ideal. > Old node: > dist dump > --------------------------------------------- > 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 > 1 297 AndL === _ 298... Beno?t Maillard has updated the pull request incrementally with one additional commit since the last revision: Assert directly in the verify methods ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28295/files - new: https://git.openjdk.org/jdk/pull/28295/files/ad20446d..621ebdaa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28295&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28295&range=00-01 Stats: 96 lines in 2 files changed: 11 ins; 7 del; 78 mod Patch: https://git.openjdk.org/jdk/pull/28295.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28295/head:pull/28295 PR: https://git.openjdk.org/jdk/pull/28295 From shade at openjdk.org Thu Dec 18 09:46:50 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 18 Dec 2025 09:46:50 GMT Subject: RFR: 8373820: C2: Robust Node::uncast_helper infinite loop check [v2] In-Reply-To: References: Message-ID: On Wed, 17 Dec 2025 10:19:16 GMT, Aleksey Shipilev wrote: >> Current check in `Node::uncast_helper` checks for "infinite loop", 
but really checks for the depth of 1K nodes when searching through the graph: >> >> >> assert(depth_count++ < K, "infinite loop in Node::uncast_helper"); >> >> >> I suppose it is plausible to have a legit chain of 1K nodes with very deep inlining and/or optimization, which is _not_ an infinite loop. This might be the cause for some CTW failures in deeper stress modes: I have been running CTW stress tests for 12+ hours without ever hitting the new check, and I usually hit the old one within that timeframe in current mainline. >> >> Given how we basically walk through `in(1)`, i.e. moving as if through the linked list of nodes, I think we can check against the number of nodes we have. If we walk more nodes than we have, that would mean we visited some node twice, which necessarily means there is an infinite loop in this walk. This makes the check more robust. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` in different stress modes, 10x passes >> - [x] Linux x86_64 server fastdebug, `hotspot_compiler` >> - [x] Linux x86_64 server fastdebug, `tier1` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into JDK-8373820-c2-robust-uncast-helper-check > - Fix The CTW test loop has been running for more than 36 hours by now, and I have not seen this new check failing even once. Sounds like we indeed had the flimsy check before. Here goes.
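The reasoning in the quoted description (a walk that takes more steps than there are nodes must have revisited a node) can be sketched on a toy graph. This is only an illustration: `Node`, `make_chain` and `uncast_walk` below are hypothetical stand-ins, not the HotSpot `Node` class or `Node::uncast_helper`, and the sketch returns -1 where the real code would assert.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy node: `input` is the index of the node's in(1), or -1 if none.
struct Node {
  bool is_cast;
  int input;
};

// Builds `casts` cast nodes chained 0 -> 1 -> ... followed by one non-cast node.
std::vector<Node> make_chain(int casts) {
  std::vector<Node> graph;
  for (int i = 0; i < casts; i++) {
    graph.push_back({true, i + 1});
  }
  graph.push_back({false, -1});
  return graph;
}

// Follows in(1) through cast nodes. Returns the index of the first non-cast node,
// or -1 if more steps were taken than there are nodes, which implies a cycle.
int uncast_walk(const std::vector<Node>& graph, int start) {
  assert(start >= 0 && static_cast<std::size_t>(start) < graph.size());
  std::size_t steps = 0;
  int cur = start;
  while (graph[cur].is_cast && graph[cur].input >= 0) {
    if (++steps > graph.size()) {
      return -1;  // some node was visited twice
    }
    cur = graph[cur].input;
  }
  return cur;
}
```

With this bound, a legitimate chain of 1500 casts is walked to its end, whereas a fixed depth limit of 1K would have flagged it; two cast nodes pointing at each other are still detected.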
------------- PR Comment: https://git.openjdk.org/jdk/pull/28861#issuecomment-3669375531 From shade at openjdk.org Thu Dec 18 09:46:52 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 18 Dec 2025 09:46:52 GMT Subject: Integrated: 8373820: C2: Robust Node::uncast_helper infinite loop check In-Reply-To: References: Message-ID: On Wed, 17 Dec 2025 06:56:32 GMT, Aleksey Shipilev wrote: > Current check in `Node::uncast_helper` checks for "infinite loop", but really checks for the depth of 1K nodes when searching through the graph: > > > assert(depth_count++ < K, "infinite loop in Node::uncast_helper"); > > > I suppose it is plausible to have a legit chain of 1K nodes with very deep inlining and/or optimization, which is _not_ an infinite loop. This might be the cause for some CTW failures in deeper stress modes: I have been running CTW stress tests for 12+ hours without ever hitting the new check, and I usually hit the old one within that timeframe in current mainline. > > Given how we basically walk through `in(1)`, i.e. moving as if through the linked list of nodes, I think we can check against the number of nodes we have. If we walk more nodes than we have, that would mean we visited some node twice, which necessarily means there is an infinite loop in this walk. This makes the check more robust. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` in different stress modes, 10x passes > - [x] Linux x86_64 server fastdebug, `hotspot_compiler` > - [x] Linux x86_64 server fastdebug, `tier1` This pull request has now been integrated.
Changeset: 4f283f18 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/4f283f188c43cb25c4eafcdf22eb7f58eae286cc Stats: 8 lines in 1 file changed: 5 ins; 1 del; 2 mod 8373820: C2: Robust Node::uncast_helper infinite loop check Reviewed-by: qamai, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/28861 From bkilambi at openjdk.org Thu Dec 18 10:17:47 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 18 Dec 2025 10:17:47 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v3] In-Reply-To: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: > This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. > > Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. The following is how each of these reductions is implemented for different aarch64 targets - > > **For AddReduction :** > On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. > > On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order. > > **For MulReduction :** > Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. 
For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported. > > Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` - > > Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. > Ratio > 1 indicates the performance with this patch is better than the master branch. > > **N1 (UseSVE = 0, max vector length = 16B):** > > Benchmark vectorDim Mode Cnt 8B 16B > ReductionAddFP16 256 thrpt 9 1.41 1.40 > ReductionAddFP16 512 thrpt 9 1.41 1.41 > ReductionAddFP16 1024 thrpt 9 1.43 1.40 > ReductionAddFP16 2048 thrpt 9 1.43 1.40 > ReductionMulFP16 256 thrpt 9 1.22 1.22 > ReductionMulFP16 512 thrpt 9 1.21 1.23 > ReductionMulFP16 1024 thrpt 9 1.21 1.22 > ReductionMulFP16 2048 thrpt 9 1.20 1.22 > > > On N1, the scalarized sequence of `fadd/fmul` are generated for both `MaxVectorSize` of 8B and 16B for add reduction ... 
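To illustrate why the description insists on strictly ordered reductions, here is a small sketch comparing an in-order sum with a pairwise (tree-shaped) sum. It uses `float` as a stand-in because standard C++ has no portable Float16 type, and `ordered_add_reduce` and `pairwise_add_reduce` are hypothetical names, not anything from the patch. Floating-point addition is not associative, so the two orders can give different results.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Strictly ordered reduction: ((init + v0) + v1) + ..., the same order a scalar loop uses.
float ordered_add_reduce(float init, const std::vector<float>& lanes) {
  float acc = init;
  for (float v : lanes) {
    acc = acc + v;
  }
  return acc;
}

// Pairwise reduction: (v0 + v1) + (v2 + v3) ..., which reassociates the additions.
float pairwise_add_reduce(const std::vector<float>& lanes) {
  assert(!lanes.empty());
  std::vector<float> work(lanes);
  while (work.size() > 1) {
    std::vector<float> next;
    for (std::size_t i = 0; i + 1 < work.size(); i += 2) {
      next.push_back(work[i] + work[i + 1]);
    }
    if (work.size() % 2 == 1) {
      next.push_back(work.back());  // odd element is carried to the next round
    }
    work = next;
  }
  return work[0];
}
```

With IEEE single precision and no fast-math style flags, the lanes {1e8f, 1.0f, -1e8f, 1.0f} sum to 1 in order but to 0 pairwise, because 1e8f + 1.0f rounds back to 1e8f. A vectorized reduction that must match the scalar loop therefore has to keep the original order.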
Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: Fix build failures on Mac ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27526/files - new: https://git.openjdk.org/jdk/pull/27526/files/e8e3989d..620b422a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27526&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27526&range=01-02 Stats: 10 lines in 1 file changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/27526.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27526/head:pull/27526 PR: https://git.openjdk.org/jdk/pull/27526 From bkilambi at openjdk.org Thu Dec 18 10:17:52 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 18 Dec 2025 10:17:52 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v2] In-Reply-To: References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: On Wed, 17 Dec 2025 12:40:38 GMT, Marc Chevalier wrote: >> Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Address review comments >> - Merge 'master' >> - 8366444: Add support for add/mul reduction operations for Float16 >> >> This patch adds mid-end support for vectorized add/mul reduction >> operations for half floats. It also includes backend aarch64 support for >> these operations. Only vectorization support through autovectorization >> is added as VectorAPI currently does not support Float16 vector species. >> >> Both add and mul reduction vectorized through autovectorization mandate >> the implementation to be strictly ordered. 
The following is how each of >> these reductions is implemented for different aarch64 targets - >> >> For AddReduction : >> On Neon only targets (UseSVE = 0): Generates scalarized additions >> using the scalar "fadd" instruction for both 8B and 16B vector lengths. >> This is because Neon does not provide a direct instruction for computing >> strictly ordered floating point add reduction. >> >> On SVE targets (UseSVE > 0): Generates the "fadda" instruction which >> computes add reduction for floating point in strict order. >> >> For MulReduction : >> Both Neon and SVE do not provide a direct instruction for computing >> strictly ordered floating point multiply reduction. For vector lengths >> of 8B and 16B, a scalarized sequence of scalar "fmul" instructions is >> generated and multiply reduction for vector lengths > 16B is not >> supported. >> >> Below is the performance of the two newly added microbenchmarks in >> Float16OperationsBenchmark.java tested on three different aarch64 >> machines and with varying MaxVectorSize - >> >> Note: On all machines, the score (ops/ms) is compared with the master >> branch without this patch which generates a sequence of loads ("ldrsh") >> to load the FP16 value into an FPR and a scalar "fadd/fmul" to >> add/multiply the loaded value to the running sum/product. The ratios >> given below are the ratios between the throughput with this patch and >> the throughput without this patch. >> Ratio > 1 indicates the performance with this patch is better than the >> master branch. >> >> N1 (UseSVE = 0, max vector length = 16B): >> Benchmark vecto... > > src/hotspot/share/opto/vectornode.hpp line 328: > >> 326: ReductionNode(ctrl, in1, in2), _requires_strict_order(requires_strict_order) {} >> 327: >> 328: virtual int Opcode() const; > > Build is failing on Mac because of `-Winconsistent-missing-override`: since you specified `override` on `bottom_type` and `ideal_reg`, you need to put `override` everywhere it applies. 
That means `Opcode`, `requires_strict_order`, `hash`, `cmp` and `size_of`. And same in `MulReductionVHFNode`. Done, thanks. Could you please take another look? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2630407402 From bmaillard at openjdk.org Thu Dec 18 10:20:03 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Thu, 18 Dec 2025 10:20:03 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v3] In-Reply-To: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> Message-ID: > This PR introduces changes in the detection of missing IGVN optimizations. As explained in the JBS issue description, when `-XX:VerifyIterativeGVN` was introduced, it was helpful to list all the missing optimizations. Such failures occur less frequently now, and the focus has changed to being able to debug such failures quickly and identifying similar or related failures during bug triaging. > > In summary, this PR brings the following changes: > - Assert at the first verification failure in `verify_Optimize` instead of attempting to process all the nodes in the graph. This makes the output easier to parse, and also decreases the overhead of getting to the actual optimization site with a debugger. > - Avoid confusing `Need to remove from hash before changing edges` assert messages by removing the verified node from the hash table before attempting to optimize the node in question. > - Provide the failure reason (Ideal, Identity or Value) and the node name in the assert message itself to facilitate identifying related failures in the testing infrastructure during bug triaging.
> > ### Example outputs > #### [JDK-8371534: C2: Missed Ideal optimization opportunity with AndL and URShiftL ](https://bugs.openjdk.org/browse/JDK-8371534) > Before the change, we would get two missed optimizations (the second one is only a consequence of the first one). After the change, we only get the first one, which is the one that actually needs to be fixed. We also get the name of the node in the assert message. >
> Before > > > Missed Ideal optimization (can_reshape=false): > The node was replaced by Ideal. > Old node: > dist dump > --------------------------------------------- > 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 > 1 297 AndL === _ 298 21 [[ 290 ]] !orig=[236],[193] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:42 (line 81) > 0 290 URShiftL === _ 297 22 [[ 299 ]] !orig=[231],[194] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:46 (line 82) > The result after Ideal: > dist dump > --------------------------------------------- > 1 337 ConL === 0 [[ 338 ]] #long:-9 > 1 336 URShiftL === _ 298 22 [[ 338 ]] > 0 338 AndL === _ 336 337 [[ ]] > > > Missed Ideal optimization (can_reshape=true): > The node was replaced by Ideal. > Old node: > dist dump > --------------------------------------------- > 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 > 1 297 AndL === _ 298... Beno?t Maillard has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains six commits: - Merge branch 'master' into JDK-8371536 - Assert directly in the verify methods - Add comment for _table.hash_delete(n) - Change assert to print only the cause and the node name Bring back old comment Wording - Assert at first failure - Remove node from hash table before calling Ideal in verification ------------- Changes: https://git.openjdk.org/jdk/pull/28295/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28295&range=02 Stats: 122 lines in 2 files changed: 33 ins; 7 del; 82 mod Patch: https://git.openjdk.org/jdk/pull/28295.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28295/head:pull/28295 PR: https://git.openjdk.org/jdk/pull/28295 From mli at openjdk.org Thu Dec 18 10:20:32 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 18 Dec 2025 10:20:32 GMT Subject: RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv) [v5] In-Reply-To: <8UNRuvPlqC7XfrAGCThuwc7RGE2q5RFlRg9LavNfTrM=.538e9b99-0256-47f9-b784-5053811aa8a0@github.com> References: <8UNRuvPlqC7XfrAGCThuwc7RGE2q5RFlRg9LavNfTrM=.538e9b99-0256-47f9-b784-5053811aa8a0@github.com> Message-ID: On Wed, 10 Dec 2025 21:03:30 GMT, Hamlin Li wrote: >> Hi, >> >> Can you help to review this patch? >> >> This patch enables the vectorization of statement like `op_1 bop op_2 ? res_f_d_1 : res_f_d_2` in a loop, where op_x's size is different from res_f_d_x's. >> >> To assist with code review, this pr contains only the shared code change, is splitted from https://github.com/openjdk/jdk/pull/28230, which enable & implement the riscv part. The similar optimization could be extended to other platforms. >> >> ## Some background >> >> Previously, it's https://github.com/openjdk/jdk/pull/25336, which was blocked by unsigned comparison issue. The issue was recently resolved by https://github.com/openjdk/jdk/pull/27942, so I'm re-start working on this optimization. 
>> >> This PR relaxes only one of the constraints in https://github.com/openjdk/jdk/pull/25336, i.e. it transforms CMoveF/D to vector operations regardless of the size of the comparison's operands, but removes the transformation of CMoveI/L to vector operations, which I think needs more investigation. >> >> # Test >> ## Jtreg >> >> in progress... >> >> ## Performance >> >> check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv. >> >> Thanks > > Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 38 additional commits since the last revision: > > - enable riscv > - tests > - review comment > - Merge branch 'master' into vectorize-CMove-Bool > - Merge branch 'openjdk:master' into master > - Merge branch 'openjdk:master' into master > - Merge branch 'master' into vectorize-CMove-Bool > - Merge branch 'openjdk:master' into master > - Merge branch 'openjdk:master' into master > - fix typo > - ... and 28 more: https://git.openjdk.org/jdk/compare/10082235...ecc84adc This PR will cause some test failures; I'll investigate further and update later.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28231#issuecomment-3669534319 From mchevalier at openjdk.org Thu Dec 18 10:21:33 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 18 Dec 2025 10:21:33 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v2] In-Reply-To: References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: <9aaKc1g9G-g7a6oZ5s-hZASP_j3_qlAPKN070WQXUew=.6307668d-1d96-467b-90c4-2b5b33041366@github.com> On Thu, 18 Dec 2025 10:12:20 GMT, Bhavana Kilambi wrote: >> src/hotspot/share/opto/vectornode.hpp line 328: >> >>> 326: ReductionNode(ctrl, in1, in2), _requires_strict_order(requires_strict_order) {} >>> 327: >>> 328: virtual int Opcode() const; >> >> Build is failing on Mac because of `-Winconsistent-missing-override`: since you specified `override` on `bottom_type` and `ideal_reg`, you need to put `override` everywhere it applies. That means `Opcode` `requires_strict_order`, `hash`, `cmp` and `size_of`. And same in `MulReductionVHFNode`. > > Done thanks. Could you please take another look? Running some tests. I'll take a look. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2630429246 From bmaillard at openjdk.org Thu Dec 18 10:29:18 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Thu, 18 Dec 2025 10:29:18 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v4] In-Reply-To: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> Message-ID: > This PR introduces changes in the detection of missing IGVN optimizations. As explained in the JBS issue description, when `-XX:VerifyIterativeGVN` was introduced, it was helpful to list all the missing optimizations. 
Such failures occur less frequently now, and the focus has changed to being able to debug such failures quickly and identifying similar or related failures during bug triaging. > > In summary, this PR brings the following changes: > - Assert at the first verification failure in `verify_Optimize` instead of attempting to process all the nodes in the graph. This makes the output easier to parse, and also decreases the overhead of getting to the actual optimization site with a debugger. > - Avoid confusing `Need to remove from hash before changing edges` assert messages by removing the verified node from the hash table before attempting to optimize the node in question. > - Provide the failure reason (Ideal, Identity or Value) and the node name in the assert message itself to facilitate identifying related failures in the testing infrastructure during bug triaging. > > ### Example outputs > #### [JDK-8371534: C2: Missed Ideal optimization opportunity with AndL and URShiftL ](https://bugs.openjdk.org/browse/JDK-8371534) > Before the change, we would get two missed optimizations (the second one is only a consequence of the first one). After the change, we only get the first one, which is the one that actually needs to be fixed. We also get the name of the node in the assert message. >
> Before > > > Missed Ideal optimization (can_reshape=false): > The node was replaced by Ideal. > Old node: > dist dump > --------------------------------------------- > 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 > 1 297 AndL === _ 298 21 [[ 290 ]] !orig=[236],[193] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:42 (line 81) > 0 290 URShiftL === _ 297 22 [[ 299 ]] !orig=[231],[194] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:46 (line 82) > The result after Ideal: > dist dump > --------------------------------------------- > 1 337 ConL === 0 [[ 338 ]] #long:-9 > 1 336 URShiftL === _ 298 22 [[ 338 ]] > 0 338 AndL === _ 336 337 [[ ]] > > > Missed Ideal optimization (can_reshape=true): > The node was replaced by Ideal. > Old node: > dist dump > --------------------------------------------- > 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 > 1 297 AndL === _ 298... Beno?t Maillard has updated the pull request incrementally with one additional commit since the last revision: Style and comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28295/files - new: https://git.openjdk.org/jdk/pull/28295/files/dacad10b..75972163 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28295&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28295&range=02-03 Stats: 13 lines in 2 files changed: 1 ins; 8 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28295.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28295/head:pull/28295 PR: https://git.openjdk.org/jdk/pull/28295 From rcastanedalo at openjdk.org Thu Dec 18 10:30:52 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 18 Dec 2025 10:30:52 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis [v7] In-Reply-To: References: Message-ID: 
<_fMhjEKgqwaGjEytQCKw1-TqdbQ3TnAah3GT7NMbL5Q=.d1702527-8bd0-43b7-8a42-5584a4f8b640@github.com> On Fri, 12 Dec 2025 16:01:17 GMT, Roland Westrelin wrote: >> The crash occurs because verification code expects the inner and outer >> loop of a loop strip mining nest to have the same number of phis but, >> in this case, the inner loop has one more memory phis than the outer >> loop. >> >> 1) After `OuterStripMinedLoopNode::adjust_strip_mined_loop`, inner and >> outer loops have the same number of phis, as expected. >> >> >> 309 MergeMem === _ 1 306 1 1 284 [[ 429 ]] { - - N284:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=205 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> >> 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 ]] >> 429 Phi === 248 309 205 [[ 93 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> >> 249 CountedLoop === 249 248 197 [[ 249 119 96 93 94 ]] inner stride: 1 strip mined !orig=[223],[91] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> 93 Phi === 249 429 205 [[ 117 97 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> 94 Phi === 249 430 121 [[ 97 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> >> >> 2) Then `PhiNode::Ideal` runs for 429 and pushed the `MergeMem` 309 >> through the outer loop phi: >> >> >> 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 444 446 ]] >> 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: 
TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> 444 Phi === 248 306 121 [[ 445 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> 446 Phi === 248 284 170 [[ 445 ]] #memory Memory: @instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow], name=detailMessage, idx=5; !orig=444,429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) >> >> 445 MergeMem === _ 1 444 1 1 446 [[ 93 ]] { - - N446:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > IR test case Test results look good. ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28677#pullrequestreview-3592017945 From roland at openjdk.org Thu Dec 18 10:39:25 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 18 Dec 2025 10:39:25 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis [v7] In-Reply-To: <3xMKIq0p28NOY6wDBgaEwOosEiXpe1kaSmCTvL2Q0OI=.46ff7dc2-2705-4b57-a118-367a63649932@github.com> References: <3xMKIq0p28NOY6wDBgaEwOosEiXpe1kaSmCTvL2Q0OI=.46ff7dc2-2705-4b57-a118-367a63649932@github.com> Message-ID: On Wed, 17 Dec 2025 13:43:37 GMT, Roberto Castañeda Lozano wrote: >> What should I do with this change? Should I go ahead and integrate? > >> What should I do with this change? Should I go ahead and integrate? > > I'm happy with the current version of this changeset, please give me one or two days to re-run it through our CI test system. 
@robcasloz @dlunde @dafedafe thanks for the reviews and testing ------------- PR Comment: https://git.openjdk.org/jdk/pull/28677#issuecomment-3669611427 From roland at openjdk.org Thu Dec 18 10:39:28 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 18 Dec 2025 10:39:28 GMT Subject: Integrated: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 11:49:29 GMT, Roland Westrelin wrote: > The crash occurs because verification code expects the inner and outer > loop of a loop strip mining nest to have the same number of phis but, > in this case, the inner loop has one more memory phis than the outer > loop. > > 1) After `OuterStripMinedLoopNode::adjust_strip_mined_loop`, inner and > outer loops have the same number of phis, as expected. > > > 309 MergeMem === _ 1 306 1 1 284 [[ 429 ]] { - - N284:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=205 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 ]] > 429 Phi === 248 309 205 [[ 93 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 249 CountedLoop === 249 248 197 [[ 249 119 96 93 94 ]] inner stride: 1 strip mined !orig=[223],[91] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 93 Phi === 249 429 205 [[ 117 97 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 94 Phi === 249 430 121 [[ 97 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !jvms: 
TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > > 2) Then `PhiNode::Ideal` runs for 429 and pushed the `MergeMem` 309 > through the outer loop phi: > > > 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 444 446 ]] > 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 444 Phi === 248 306 121 [[ 445 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > 446 Phi === 248 284 170 [[ 445 ]] #memory Memory: @instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow], name=detailMessage, idx=5; !orig=444,429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) > > 445 MergeMem === _ 1 444 1 1 446 [[ 93 ]] { - - N446:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=[429],93 !jvms: TestMismatchedMemoryPhis::mainTe... This pull request has now been integrated. 
Changeset: 2ba423db Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/2ba423db9925355348106fc9fcf84450123d2605 Stats: 195 lines in 6 files changed: 173 ins; 16 del; 6 mod 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis Reviewed-by: rcastanedalo, dlunden, dfenacci ------------- PR: https://git.openjdk.org/jdk/pull/28677 From bmaillard at openjdk.org Thu Dec 18 10:39:37 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Thu, 18 Dec 2025 10:39:37 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v5] In-Reply-To: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> Message-ID: > This PR introduces changes in the detection of missing IGVN optimizations. As explained in the JBS issue description, when `-XX:VerifyIterativeGVN` was introduced, it was helpful to list all the missing optimizations. Such failures occur less frequently now, and the focus has changed to being able to debug such failures quickly and identifying similar or related failures during bug triaging. > > In summary, this PR brings the following changes: > - Assert at the first verification failure in `verify_Optimize` instead of attempting to process all the nodes in the graph. This makes the output easier to parse, and also decreases the overhead of getting to the actual optimization site with a debugger. > - Avoid confusing `Need to remove from hash before changing edges` assert messages by removing the verified node from the hash table before attempting to optimize the node in question. > - Provide the failure reason (Ideal, Identity or Value) and the node name in the assert message itself to facilitate identifying related failures in the testing infrastructure during bug triaging. 
> > ### Example outputs > #### [JDK-8371534: C2: Missed Ideal optimization opportunity with AndL and URShiftL ](https://bugs.openjdk.org/browse/JDK-8371534) > Before the change, we would get two missed optimizations (the second one is only a consequence of the first one). After the change, we only get the first one, which is the one that actually needs to be fixed. We also get the name of the node in the assert message. >
> Before > > > Missed Ideal optimization (can_reshape=false): > The node was replaced by Ideal. > Old node: > dist dump > --------------------------------------------- > 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 > 1 297 AndL === _ 298 21 [[ 290 ]] !orig=[236],[193] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:42 (line 81) > 0 290 URShiftL === _ 297 22 [[ 299 ]] !orig=[231],[194] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:46 (line 82) > The result after Ideal: > dist dump > --------------------------------------------- > 1 337 ConL === 0 [[ 338 ]] #long:-9 > 1 336 URShiftL === _ 298 22 [[ 338 ]] > 0 338 AndL === _ 336 337 [[ ]] > > > Missed Ideal optimization (can_reshape=true): > The node was replaced by Ideal. > Old node: > dist dump > --------------------------------------------- > 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 > 1 297 AndL === _ 298... Benoît Maillard has updated the pull request incrementally with one additional commit since the last revision: More style ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28295/files - new: https://git.openjdk.org/jdk/pull/28295/files/75972163..339f3895 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28295&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28295&range=03-04 Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/28295.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28295/head:pull/28295 PR: https://git.openjdk.org/jdk/pull/28295 From galder at openjdk.org Thu Dec 18 11:37:53 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 18 Dec 2025 11:37:53 GMT Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN Message-ID: Min/Max users of Min/Max uses need to be enqueued respectively to the GVN worklist to see if further optimizations can be applied. 
Without this, there are cases where additional potential ideal/identity optimizations are not applied. I need this fix to test min/max reassociation implementation with IR tests reliably. Aside from the fix itself, I've refactored `MaxNode` to `MinMaxNode` and added a `is_MinMax` node query to simplify the fix. I have also removed the Min/Max exceptions in `PhaseIterGVN::verify_Identity_for` since this fix fixes `compiler/codegen/TestBooleanVect.java` with `-XX:VerifyIterativeGVN=1110`. To test this I've created a template framework test that validates the fix. I have tested with all Min/Max combinations including Float16, which I've verified with Intel SDE. Float16 does not use `Argument.NUMBER_42` because there's no support for it yet, see [JDK-8373977](https://bugs.openjdk.org/browse/JDK-8373977). During development I noticed that the test only failed when the test had `b, a` parameters in that order, so I added tests for both cases as `a, b` and `b, a` so that all possible orders are covered and they don't slip in the future. I've run tier1-3 tests on linux/x64 successfully. 
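For readers who want a concrete picture of the shape under discussion, here is a minimal Java sketch. It is not taken from the PR: the class and method names are hypothetical, and it only checks Java-level semantics, not the C2 IR. It shows nested min/max expressions in both the `a, b` and `b, a` operand orders; once the inner call is simplified, the outer Min/Max node can only be simplified further if it is put back on the IGVN worklist.

```java
final class MinMaxNestedSketch {
    // Hypothetical shape: the outer max is a Max user of a Max use of `a`.
    // Semantically this is just max(a, b).
    static int maxAB(int a, int b) {
        return Math.max(a, Math.max(b, a));
    }

    // Same shape with the operands swapped (the "b, a" order).
    static int maxBA(int a, int b) {
        return Math.max(b, Math.max(a, b));
    }

    // Long variant of the same pattern for min.
    static long minAB(long a, long b) {
        return Math.min(a, Math.min(b, a));
    }

    public static void main(String[] args) {
        int[] samples = {Integer.MIN_VALUE, -1, 0, 42, Integer.MAX_VALUE};
        for (int a : samples) {
            for (int b : samples) {
                boolean ok = maxAB(a, b) == Math.max(a, b)
                        && maxBA(a, b) == Math.max(a, b)
                        && minAB(a, b) == Math.min((long) a, (long) b);
                if (!ok) {
                    throw new AssertionError("mismatch for a=" + a + ", b=" + b);
                }
            }
        }
        System.out.println("all nested min/max shapes agree");
    }
}
```

Whether C2 actually folds these shapes to a single Min/Max node is what the IR tests in the PR verify; the sketch above makes no claim about that.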
------------- Commit messages: - Test Float16 - Only apply to uses that match original IR node - Merge branch 'master' into topic.uses-min-max - Use is_MinMax() instead of spelling out individual Min/Max opcodes - Refactor MaxNode to MinMaxNode and add is_MinMax() query - Add max(a, max(b, c)) patterns to add users of use - Add templated test - Remove exclude or Min/Max in verify identity Changes: https://git.openjdk.org/jdk/pull/28895/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28895&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373134 Stats: 268 lines in 9 files changed: 196 ins; 21 del; 51 mod Patch: https://git.openjdk.org/jdk/pull/28895.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28895/head:pull/28895 PR: https://git.openjdk.org/jdk/pull/28895 From duke at openjdk.org Thu Dec 18 11:49:08 2025 From: duke at openjdk.org (Tobias Hotz) Date: Thu, 18 Dec 2025 11:49:08 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation In-Reply-To: References: <1XovbnGPMfTX45dlT5PFCk1Bqb3Pyc_kN8vC874lKm4=.78ec990d-1950-4fa9-8dea-065a09414a1c@github.com> <96rrRPjmvAcbsl7wlLUYck5FTD7WkQrY52PKyiwOn2s=.03996b48-829b-4048-925d-04943736fed1@github.com> Message-ID: On Thu, 18 Dec 2025 08:21:51 GMT, Hannes Greule wrote: >> @merykitty Why not file an RFE with a reproducer for incremental inlining? >> >> And yes, I totally agree it would be nicer if things were organized better! > > @eme64 GHA passed, could you run testing? Thanks! @SirYwell sorry, but you should probably pull in master once more now that https://github.com/openjdk/jdk/pull/26143 is merged. [IntegerDivValueTests](https://github.com/openjdk/jdk/blob/2ba423db9925355348106fc9fcf84450123d2605/test/hotspot/jtreg/compiler/igvn/IntegerDivValueTests.java#L284) has some IR rules that should be enabled with this PR (as far as I can tell). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3669897241 From galder at openjdk.org Thu Dec 18 11:50:20 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 18 Dec 2025 11:50:20 GMT Subject: RFR: 8373396: Min and Max Ideal missing AddNode::Ideal optimisations [v4] In-Reply-To: References: Message-ID: On Thu, 18 Dec 2025 07:03:48 GMT, Emanuel Peter wrote: >> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix redundant variable > > Testing passed, ship it! Thanks @eme64 and @rwestrel for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28770#issuecomment-3669901995 From galder at openjdk.org Thu Dec 18 11:50:22 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 18 Dec 2025 11:50:22 GMT Subject: Integrated: 8373396: Min and Max Ideal missing AddNode::Ideal optimisations In-Reply-To: References: Message-ID: On Thu, 11 Dec 2025 16:02:24 GMT, Galder Zamarreño wrote: > `MaxI` and `MinI` are missing `AddNode::Ideal` optimizations. These optimizations include commutation, flattening, pushing constants, etc. The PR changes `MaxINode::Ideal` and `MinINode::Ideal` to call `AddNode::Ideal`. Long versions already call `AddNode::Ideal` so nothing to change there. > > The PR also includes a template framework generated test (cc @eme64) that verifies that all of `AddNode::Ideal` optimizations now apply correctly for min/max for longs and ints. Long tests have been added to validate that both ints and longs produce the same results. > > Fixing this issue indirectly fixes `compiler/codegen/TestBooleanVect.java` when run with `-XX:VerifyIterativeGVN=1110`, which was failing due to `min` not having one of those optimisations. However, this PR does not make changes to `PhaseIterGVN::verify_Identity_for` because there are additional failures observed with min/max for integers in JDK-8373134. 
Therefore, changes there will be made in the PR for JDK-8373134 instead. > > **Update 15.12.25**: `PhaseIterGVN::verify_Ideal_for` exceptions for MinI/MaxI are still needed. > > ~If you look at `PhaseIterGVN::verify_Ideal_for`, it contains. This looks like it could be removed in this PR as it looks like they were quite likely disabled due to the issue here. However, it's unclear what test was failing here (@eme64 ?):~ > > > // MinINode::Ideal > // Did not investigate, but there are some patterns that might > // need more notification. > case Op_MinI: > case Op_MaxI: // preemptively removed it as well. > return false; > > > I've run tier1-3 tests on linux/x64 and they passed. This pull request has now been integrated. Changeset: 2c0d9a79 Author: Galder Zamarreño Committer: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/2c0d9a79b8197d88a104bd77026dd45b83a11f8a Stats: 144 lines in 2 files changed: 144 ins; 0 del; 0 mod 8373396: Min and Max Ideal missing AddNode::Ideal optimisations Reviewed-by: epeter, roland ------------- PR: https://git.openjdk.org/jdk/pull/28770 From qamai at openjdk.org Thu Dec 18 11:51:41 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 18 Dec 2025 11:51:41 GMT Subject: RFR: 8373999: C2: apply KnownBits and unsigned bounds to Add / Sub operations Message-ID: Hi, This PR improves the implementation of `AddNode/SubNode::Value` by taking advantage of the additional information in `TypeInt`. The implementation has some pretty non-trivial logic. Fortunately, the test infrastructure is already there. Please take a look and leave your reviews, thanks a lot. 
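As background on what "known bits" reasoning for an addition can look like, the following is a minimal Java sketch. It is an assumption-laden illustration, not the code from this PR: the `KnownBits` class, its `zeros`/`ones` fields and the `add` helper are hypothetical names, and the algorithm is the commonly used carry-propagation formulation (bound the carry into each bit using the smallest and largest possible operands). Unsigned bounds, which the PR also uses, are not modelled here.

```java
final class KnownBitsAddSketch {
    /** Hypothetical holder: bits known to be 0 and bits known to be 1 (masks never overlap). */
    static final class KnownBits {
        final int zeros;
        final int ones;

        KnownBits(int zeros, int ones) {
            this.zeros = zeros;
            this.ones = ones;
        }

        int min() { return ones; }   // every unknown bit assumed 0
        int max() { return ~zeros; } // every unknown bit assumed 1
    }

    static KnownBits add(KnownBits l, KnownBits r) {
        int sumMax = l.max() + r.max(); // carries are as large as they can be
        int sumMin = l.min() + r.min(); // carries are as small as they can be
        // sum bit = l bit ^ r bit ^ carry-in, so XOR-ing the operands back out
        // recovers the carry-in of the extreme cases.
        int carryKnownZero = ~(sumMax ^ l.zeros ^ r.zeros); // no carry even in the max case
        int carryKnownOne = sumMin ^ l.ones ^ r.ones;       // carry even in the min case
        // A result bit is known only if both operand bits and the carry-in are known.
        int known = (l.zeros | l.ones) & (r.zeros | r.ones) & (carryKnownZero | carryKnownOne);
        return new KnownBits(~sumMax & known, sumMin & known);
    }

    public static void main(String[] args) {
        KnownBits multipleOfFour = new KnownBits(0b11, 0); // low two bits are 0
        KnownBits one = new KnownBits(~1, 1);              // the constant 1
        KnownBits sum = add(multipleOfFour, one);
        // The low two bits of (4k + 1) are always 01.
        if (sum.zeros != 0b10 || sum.ones != 0b01) {
            throw new AssertionError("unexpected known bits");
        }
        System.out.println("zeros=" + Integer.toBinaryString(sum.zeros)
                + " ones=" + Integer.toBinaryString(sum.ones));
    }
}
```

The design choice worth noting is that carries are monotone in the operands, so the two extreme additions are enough to decide which carry bits are fixed; subtraction can be handled analogously but is left out of this sketch.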
------------- Commit messages: - Improve Add/SubNode::Value with unsigned bounds and known bits Changes: https://git.openjdk.org/jdk/pull/28897/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28897&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373999 Stats: 309 lines in 4 files changed: 235 ins; 66 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/28897.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28897/head:pull/28897 PR: https://git.openjdk.org/jdk/pull/28897 From mli at openjdk.org Thu Dec 18 11:54:59 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 18 Dec 2025 11:54:59 GMT Subject: RFR: 8373998: RISC-V: simple optimization of ConvHF2F Message-ID: Hi, Can you help to review this patch? ConvHF2F could be optimized by following patch. As riscv does not have the restriction to use `iRegINoSp` in src register, it can use iRegIorL2I instead. Check other usages of src register in riscv.ad file. @RealFYang Thanks! Thanks! ------------- Commit messages: - initial commit Changes: https://git.openjdk.org/jdk/pull/28896/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28896&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373998 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28896.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28896/head:pull/28896 PR: https://git.openjdk.org/jdk/pull/28896 From bmaillard at openjdk.org Thu Dec 18 12:08:56 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Thu, 18 Dec 2025 12:08:56 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v6] In-Reply-To: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> Message-ID: > This PR introduces changes in the detection of missing IGVN optimizations. 
As explained in the JBS issue description, when `-XX:VerifyIterativeGVN` was introduced, it was helpful to list all the missing optimizations. Such failures occur less frequently now, and the focus has changed to being able to debug such failures quickly and identifying similar or related failures during bug triaging. > > In summary, this PR brings the following changes: > - Assert at the first verification failure in `verify_Optimize` instead of attempting to process all the nodes in the graph. This makes the output easier to parse, and also decreases the overhead of getting to the actual optimization site with a debugger. > - Avoid confusing `Need to remove from hash before changing edges` assert messages by removing the verified node from the hash table before attempting to optimize the node in question. > - Provide the failure reason (Ideal, Identity or Value) and the node name in the assert message itself to facilitate identifying related failures in the testing infrastructure during bug triaging. > > ### Example outputs > #### [JDK-8371534: C2: Missed Ideal optimization opportunity with AndL and URShiftL ](https://bugs.openjdk.org/browse/JDK-8371534) > Before the change, we would get two missed optimizations (the second one is only a consequence of the first one). After the change, we only get the first one, which is the one that actually needs to be fixed. We also get the name of the node in the assert message. >
> Before > > > Missed Ideal optimization (can_reshape=false): > The node was replaced by Ideal. > Old node: > dist dump > --------------------------------------------- > 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 > 1 297 AndL === _ 298 21 [[ 290 ]] !orig=[236],[193] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:42 (line 81) > 0 290 URShiftL === _ 297 22 [[ 299 ]] !orig=[231],[194] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:46 (line 82) > The result after Ideal: > dist dump > --------------------------------------------- > 1 337 ConL === 0 [[ 338 ]] #long:-9 > 1 336 URShiftL === _ 298 22 [[ 338 ]] > 0 338 AndL === _ 336 337 [[ ]] > > > Missed Ideal optimization (can_reshape=true): > The node was replaced by Ideal. > Old node: > dist dump > --------------------------------------------- > 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 > 1 297 AndL === _ 298... Benoît Maillard has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 10 commits: - Remove trailing whitespace - Merge branch 'master' into JDK-8371536 - More style - Style and comments - Merge branch 'master' into JDK-8371536 - Assert directly in the verify methods - Add comment for _table.hash_delete(n) - Change assert to print only the cause and the node name Bring back old comment Wording - Assert at first failure - Remove node from hash table before calling Ideal in verification ------------- Changes: https://git.openjdk.org/jdk/pull/28295/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28295&range=05 Stats: 122 lines in 2 files changed: 26 ins; 7 del; 89 mod Patch: https://git.openjdk.org/jdk/pull/28295.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28295/head:pull/28295 PR: https://git.openjdk.org/jdk/pull/28295 From galder at openjdk.org Thu Dec 18 12:12:19 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 18 Dec 2025 12:12:19 GMT Subject: RFR: 8370922: Template Framework Library: Float16 type and operations [v4] In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 13:32:25 GMT, Emanuel Peter wrote: >> Looks good to me, nice work! I only have one question. > > @benoitmaillard @galderz @TobiHartmann Thanks for the reviews! Great timing @eme64 with this, the #28895 PR uses the work here :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28095#issuecomment-3669976973 From bmaillard at openjdk.org Thu Dec 18 12:23:52 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Thu, 18 Dec 2025 12:23:52 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v6] In-Reply-To: References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> Message-ID: On Fri, 14 Nov 2025 07:21:44 GMT, Emanuel Peter wrote: >> Benoît Maillard has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 10 commits: >> >> - Remove trailing whitespace >> - Merge branch 'master' into JDK-8371536 >> - More style >> - Style and comments >> - Merge branch 'master' into JDK-8371536 >> - Assert directly in the verify methods >> - Add comment for _table.hash_delete(n) >> - Change assert to print only the cause and the node name >> >> Bring back old comment >> >> Wording >> - Assert at first failure >> - Remove node from hash table before calling Ideal in verification > > @benoitmaillard Thanks for working on this, it will be really helpful for triaging :) I have changed the code so that we assert directly in the verify methods, and merged with other changes that were carried out in the meantime. Let me know what you think @eme64 @chhagedorn. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28295#issuecomment-3670025367 From bkilambi at openjdk.org Thu Dec 18 12:44:53 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 18 Dec 2025 12:44:53 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v7] In-Reply-To: References: Message-ID: <5NNS6uN_GhivBLv-OhE1Tn0HJhrL6-TKlsN9wWKOytU=.d36bafc2-a5c4-4878-920c-6707cf3547bb@github.com> On Tue, 16 Dec 2025 15:45:15 GMT, Jatin Bhateja wrote: >>> I suggest creating a separate PR for this ? You can either create a smaller standalone reproducer testcase or mention about the tests part of this PR. >> >> Hi @jbhateja, thanks for the suggestion. Based on the comments here - https://bugs.openjdk.org/browse/JDK-8373574, is it ok if my fix (along with a regression test as suggested) be part of this PR? > >> > I suggest creating a separate PR for this ? You can either create a smaller standalone reproducer testcase or mention about the tests part of this PR. >> >> Hi @jbhateja, thanks for the suggestion. 
Based on the comments here - https://bugs.openjdk.org/browse/JDK-8373574, is it ok if my fix (along with a regression test as suggested) be part of this PR? > > Hi @Bhavana-Kilambi , As @TobiHartmann suggested we can include your patch with the PR. > Best Regards Thanks @jatin-bhateja for including AArch64 changes to the patch. Also @TobiHartmann suggested (here - https://bugs.openjdk.org/browse/JDK-8373574) that it'd be good to have a regression test for the AArch64 failures. I have put together a small JTREG test taking one of the VectorAPI tests from `TestFloat16VectorOperations.java` introduced in this PR (which was the one failing on AArch64). Could you please include the testcase as well? The patch is attached here [test.patch](https://github.com/user-attachments/files/24235556/test.patch) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3670100491 From mablakatov at openjdk.org Thu Dec 18 13:24:37 2025 From: mablakatov at openjdk.org (Mikhail Ablakatov) Date: Thu, 18 Dec 2025 13:24:37 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v8] In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 08:28:47 GMT, Aleksey Shipilev wrote: >> Overall, looks good to me. Nice work, Aleksey! >> >> I'm curious how performance-sensitive that part of code is. Does it make sense to try to further optimize it? >> >> For example: >> - 2 slots is the most common case; any benefits from optimizing specifically for it (e.g., unroll the loops)? 
>> - fast path can be further optimized for no nulls case by offloading more work on found_null slow path [1] >> >> [1] >> >> // Fastest: receiver is already installed >> int i = 0; >> for (; i < receiver_count(); i++) { >> if (receiver(i) == recv) goto found_recv(i); >> if (receiver(i) == null) goto found_null(i); >> } >> >> goto polymorphic >> >> // Slow: try to install receiver >> found_null(i): >> // Finish the search >> for (int j = i ; j < receiver_count(); j++) { >> if (receiver(j) == recv) goto found_recv(j); >> } >> CAS(&receiver(i), null, recv); >> goto restart >> ... > >> I'm curious how performance-sensitive that part of code is. Does it make sense to try to further optimize it? > > This is about 5-th-ish version of this code, so I don't think there is more juice to squeeze out of it. The current version is more or less optimal. The stratification into three cases looks the best performing overall. > >> fast path can be further optimized for no nulls case by offloading more work on found_null slow path [1] > > Yeah, but putting checks for both installed receiver and nullptr slot turns out hurting performance; this is bad even without extra control flow. Two separate loops are more efficient, even for small number of iterations. It also helpfully optimizes for the best case, when profile is smaller than `TypeProfileWidth`, which is what we want. > >> 2 slots is the most common case; any benefits from optimizing specifically for it (e.g., unroll the loops)? > > I don't think it is worth the extra complexity, honestly. The loop-y code in current version is still a significant code density win over the decision-tree (unrolled, effectively) approach we are doing currently. Keeping this thing simple means more reliability and less testing surface, plus much less headache to port to other architectures. > > Note that the goal for this work is to _improve profiling reliability_ without hopefully ceding too much ground in code density and performance. 
When I started out, it was not clear if it is doable, given the need for atomics; but it looks doable indeed. So I think we should call this thing done and move on to solving the actual performance problem in this code: the contention on counter updates. Hi @shipilev , are you aware of anyone working on or planning to implement the same for AArch64 by any chance? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3670273806 From qamai at openjdk.org Thu Dec 18 13:50:38 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 18 Dec 2025 13:50:38 GMT Subject: RFR: 8373999: C2: apply KnownBits and unsigned bounds to Add / Sub operations [v2] In-Reply-To: References: Message-ID: > Hi, > > This PR improves the implementation of `AddNode/SubNode::Value` by taking advantage of the additional information in `TypeInt`. The implementation has some pretty non-trivial logic. Fortunately, the test infrastructure is already there. > > Please take a look and leave your reviews, thanks a lot. 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: include order ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28897/files - new: https://git.openjdk.org/jdk/pull/28897/files/a07bb5e1..f910e70b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28897&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28897&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28897.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28897/head:pull/28897 PR: https://git.openjdk.org/jdk/pull/28897 From epeter at openjdk.org Thu Dec 18 13:56:39 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Dec 2025 13:56:39 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v6] In-Reply-To: References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> Message-ID: On Thu, 18 Dec 2025 12:08:56 GMT, Benoît Maillard wrote: >> This PR introduces changes in the detection of missing IGVN optimizations. As explained in the JBS issue description, when `-XX:VerifyIterativeGVN` was introduced, it was helpful to list all the missing optimizations. Such failures occur less frequently now, and the focus has changed to being able to debug such failures quickly and identifying similar or related failures during bug triaging. >> >> In summary, this PR brings the following changes: >> - Assert at the first verification failure in `verify_Optimize` instead of attempting to process all the nodes in the graph. This makes the output easier to parse, and also decreases the overhead of getting to the actual optimization site with a debugger. >> - Avoid confusing `Need to remove from hash before changing edges` assert messages by removing the verified node from the hash table before attempting to optimize the node in question. 
>> - Provide the failure reason (Ideal, Identity or Value) and the node name in the assert message itself to facilitate identifying related failures in the testing infrastructure during bug triaging. >> >> ### Example outputs >> #### [JDK-8371534: C2: Missed Ideal optimization opportunity with AndL and URShiftL ](https://bugs.openjdk.org/browse/JDK-8371534) >> Before the change, we would get two missed optimizations (the second one is only a consequence of the first one). After the change, we only get the first one, which is the one that actually needs to be fixed. We also get the name of the node in the assert message. >>
>> Before >> >> >> Missed Ideal optimization (can_reshape=false): >> The node was replaced by Ideal. >> Old node: >> dist dump >> --------------------------------------------- >> 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 >> 1 297 AndL === _ 298 21 [[ 290 ]] !orig=[236],[193] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:42 (line 81) >> 0 290 URShiftL === _ 297 22 [[ 299 ]] !orig=[231],[194] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:46 (line 82) >> The result after Ideal: >> dist dump >> --------------------------------------------- >> 1 337 ConL === 0 [[ 338 ]] #long:-9 >> 1 336 URShiftL === _ 298 22 [[ 338 ]] >> 0 338 AndL === _ 336 337 [[ ]] >> >> >> Missed Ideal optimization (can_reshape=true): >> The node was replaced by Ideal. >> Old node: >> dist dump >> --------------------------------------------- >> 1 22 ConI === 0 [[ 70 81 70 290 81 76... > > Benoît Maillard has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: > > - Remove trailing whitespace > - Merge branch 'master' into JDK-8371536 > - More style > - Style and comments > - Merge branch 'master' into JDK-8371536 > - Assert directly in the verify methods > - Add comment for _table.hash_delete(n) > - Change assert to print only the cause and the node name > > Bring back old comment > > Wording > - Assert at first failure > - Remove node from hash table before calling Ideal in verification Changes requested by epeter (Reviewer). src/hotspot/share/opto/phaseX.cpp line 1090: > 1088: // We should either make sure that this node is properly added back to the IGVN worklist > 1089: // in PhaseIterGVN::add_users_to_worklist to update it again or add an exception > 1090: // in the verification code above if that is not possible for some reason (like Load nodes). You moved the comment. It used to make sense to refer to "exceptions above".
Now the exceptions are inside the methods, relative to the new place of the comment ;) src/hotspot/share/opto/phaseX.cpp line 1202: > 1200: tty->print_cr("%s", ss.as_string()); > 1201: > 1202: assert(false, "Missed Value optimization opportunity in PhaseIterGVN for %s", n->Name()); What if it gets called during CCP? Then it is not just a missed opportunity, but possibly a correctness problem. I wonder if we should have different assert messages here. We could even just pass a string into the method, either `IGVN` or `CCP`. What do you think? src/hotspot/share/opto/phaseX.cpp line 1839: > 1837: // return and finally hit the assert in PhaseIterGVN::verify_optimize to get > 1838: // a more meaningful message > 1839: _table.hash_delete(n); Looks like an unrelated change. Why are you adding it now? ------------- PR Review: https://git.openjdk.org/jdk/pull/28295#pullrequestreview-3593092023 PR Review Comment: https://git.openjdk.org/jdk/pull/28295#discussion_r2631174105 PR Review Comment: https://git.openjdk.org/jdk/pull/28295#discussion_r2631192720 PR Review Comment: https://git.openjdk.org/jdk/pull/28295#discussion_r2631179338 From mhaessig at openjdk.org Thu Dec 18 14:53:45 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 18 Dec 2025 14:53:45 GMT Subject: RFR: 8373524: C2: no reachable node should have no use [v3] In-Reply-To: <8xp47-406XvQpBbyImEKvgKJGoAR_GLX9OorBcSEXJU=.95d37fce-8278-4408-90fc-0cc3997068a3@github.com> References: <8xp47-406XvQpBbyImEKvgKJGoAR_GLX9OorBcSEXJU=.95d37fce-8278-4408-90fc-0cc3997068a3@github.com> Message-ID: On Wed, 17 Dec 2025 14:16:29 GMT, Roland Westrelin wrote: >> The failure occurs because `PhiNode::Ideal` uses `set_req` to update >> an input of a `Phi`. That causes the previous input to be disconnected >> but because of the use of `set_req`, the previous input that has no >> use is not enqueued for `igvn` to be reclaimed. The fix is to use >> `set_req_X` instead.
I replaced uses of `set_req` with `set_req_X` in >> `PhiNode::Ideal` where I thought it made sense. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Testing passed and the new changes look good as well. ------------- Marked as reviewed by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/28841#pullrequestreview-3593401240 From galder at openjdk.org Thu Dec 18 15:05:08 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 18 Dec 2025 15:05:08 GMT Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v2] In-Reply-To: References: Message-ID: > Min/Max users of Min/Max uses need to be enqueued respectively to the GVN worklist to see if further optimizations can be applied. Without this, there are cases where additional potential ideal/identity optimizations are not applied. I need this fix to test min/max reassociation implementation with IR tests reliably. > > Aside from the fix itself, I've refactored `MaxNode` to `MinMaxNode` and added a `is_MinMax` node query to simplify the fix. > > I have also removed the Min/Max exceptions in `PhaseIterGVN::verify_Identity_for` since this fix fixes `compiler/codegen/TestBooleanVect.java` with `-XX:VerifyIterativeGVN=1110`. > > To test this I've created a template framework test that validates the fix. I have tested with all Min/Max combinations including Float16, which I've verified with Intel SDE. Float16 does not use `Argument.NUMBER_42` because there's no support for it yet, see [JDK-8373977](https://bugs.openjdk.org/browse/JDK-8373977). > > During development I noticed that the test only failed when the test had `b, a` parameters in that order, so I added tests for both cases as `a, b` and `b, a` so that all possible orders are covered and they don't slip in the future. > > I've run tier1-3 tests on linux/x64 successfully.
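The enqueuing idea in the Min/Max description above can be pictured on a toy graph. This is only a sketch under stated assumptions: `ToyNode`, `enqueue_min_max_users` and `demo_enqueued_count` are hypothetical names, opcodes are plain strings, and the same-opcode filter is an assumption about how the matching is restricted. It is not the HotSpot implementation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy IR node: an opcode name plus its users (out edges).
struct ToyNode {
  std::string op;
  std::vector<ToyNode*> outs;
  bool is_min_max() const { return op == "MinI" || op == "MaxI"; }
};

// When `changed` is updated, look at each Min/Max use of it and enqueue the
// users of that use which have the same opcode, e.g. the outer Max in
// max(a, max(b, c)) when b changes.
void enqueue_min_max_users(const ToyNode& changed, std::vector<ToyNode*>& worklist) {
  for (ToyNode* use : changed.outs) {
    if (!use->is_min_max()) continue;
    for (ToyNode* u : use->outs) {
      if (u->op == use->op) worklist.push_back(u);
    }
  }
}

// Builds b -> MaxI(inner) -> { MaxI(outer), MinI(outer) } and reports how
// many nodes get enqueued when b changes.
std::size_t demo_enqueued_count() {
  ToyNode outer_max{"MaxI", {}};
  ToyNode outer_min{"MinI", {}};
  ToyNode inner_max{"MaxI", {&outer_max, &outer_min}};
  ToyNode b{"Parm", {&inner_max}};
  std::vector<ToyNode*> worklist;
  enqueue_min_max_users(b, worklist);
  return worklist.size();
}
```

In this toy setup only the outer `MaxI` is enqueued; the outer `MinI` is skipped because its opcode differs from the inner node's.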
Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: - Merge branch 'master' into topic.uses-min-max - Test Float16 - Only apply to uses that match original IR node - Merge branch 'master' into topic.uses-min-max - Use is_MinMax() instead of spelling out individual Min/Max opcodes - Refactor MaxNode to MinMaxNode and add is_MinMax() query - Add max(a, max(b, c)) patterns to add users of use - Add templated test - Remove exclude or Min/Max in verify identity ------------- Changes: https://git.openjdk.org/jdk/pull/28895/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28895&range=01 Stats: 268 lines in 9 files changed: 196 ins; 21 del; 51 mod Patch: https://git.openjdk.org/jdk/pull/28895.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28895/head:pull/28895 PR: https://git.openjdk.org/jdk/pull/28895 From shade at openjdk.org Thu Dec 18 15:27:01 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 18 Dec 2025 15:27:01 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v8] In-Reply-To: References: Message-ID: On Wed, 10 Dec 2025 08:28:47 GMT, Aleksey Shipilev wrote: >> Overall, looks good to me. Nice work, Aleksey! >> >> I'm curious how performance-sensitive that part of code is. Does it make sense to try to further optimize it? >> >> For example: >> - 2 slots is the most common case; any benefits from optimizing specifically for it (e.g., unroll the loops)?
>> - fast path can be further optimized for no nulls case by offloading more work on found_null slow path [1] >> >> [1] >> >> // Fastest: receiver is already installed >> int i = 0; >> for (; i < receiver_count(); i++) { >> if (receiver(i) == recv) goto found_recv(i); >> if (receiver(i) == null) goto found_null(i); >> } >> >> goto polymorphic >> >> // Slow: try to install receiver >> found_null(i): >> // Finish the search >> for (int j = i ; j < receiver_count(); j++) { >> if (receiver(j) == recv) goto found_recv(j); >> } >> CAS(&receiver(i), null, recv); >> goto restart >> ... > >> I'm curious how performance-sensitive that part of code is. Does it make sense to try to further optimize it? > > This is about 5-th-ish version of this code, so I don't think there is more juice to squeeze out of it. The current version is more or less optimal. The stratification into three cases looks the best performing overall. > >> fast path can be further optimized for no nulls case by offloading more work on found_null slow path [1] > > Yeah, but putting checks for both installed receiver and nullptr slot turns out hurting performance; this is bad even without extra control flow. Two separate loops are more efficient, even for small number of iterations. It also helpfully optimizes for the best case, when profile is smaller than `TypeProfileWidth`, which is what we want. > >> 2 slots is the most common case; any benefits from optimizing specifically for it (e.g., unroll the loops)? > > I don't think it is worth the extra complexity, honestly. The loop-y code in current version is still a significant code density win over the decision-tree (unrolled, effectively) approach we are doing currently. Keeping this thing simple means more reliability and less testing surface, plus much less headache to port to other architectures. > > Note that the goal for this work is to _improve profiling reliability_ without hopefully ceding too much ground in code density and performance. 
When I started out, it was not clear if it is doable, given the need for atomics; but it looks doable indeed. So I think we should call this thing done and move on to solving the actual performance problem in this code: the contention on counter updates. > Hi @shipilev , are you aware of anyone working on or planning to implement the same for AArch64 by any chance? I'll task one of our folks to do it after NY break. Speaking of, I will integrate this one after NY break as well, to avoid dealing with any possible fallout during the holidays. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3670834495 From roland at openjdk.org Thu Dec 18 15:38:42 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 18 Dec 2025 15:38:42 GMT Subject: RFR: 8373524: C2: no reachable node should have no use [v3] In-Reply-To: References: Message-ID: On Wed, 17 Dec 2025 08:38:25 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > Otherwise, looks good! @chhagedorn @mhaessig thanks for the reviews and testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28841#issuecomment-3670884726 From bmaillard at openjdk.org Thu Dec 18 15:43:57 2025 From: bmaillard at openjdk.org (Benoît Maillard) Date: Thu, 18 Dec 2025 15:43:57 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v6] In-Reply-To: References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> Message-ID: On Thu, 18 Dec 2025 13:49:37 GMT, Emanuel Peter wrote: >> Benoît Maillard has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 10 commits: >> >> - Remove trailing whitespace >> - Merge branch 'master' into JDK-8371536 >> - More style >> - Style and comments >> - Merge branch 'master' into JDK-8371536 >> - Assert directly in the verify methods >> - Add comment for _table.hash_delete(n) >> - Change assert to print only the cause and the node name >> >> Bring back old comment >> >> Wording >> - Assert at first failure >> - Remove node from hash table before calling Ideal in verification > > src/hotspot/share/opto/phaseX.cpp line 1839: > >> 1837: // return and finally hit the assert in PhaseIterGVN::verify_optimize to get >> 1838: // a more meaningful message >> 1839: _table.hash_delete(n); > > Looks like an unrelated change. Why are you adding it now? This is here from the first version of the PR. From the initial PR description: > In summary, this PR brings the following changes: > ... > - Avoid confusing `Need to remove from hash before changing edges` assert messages by removing the verified node from the hash table before attempting to optimize the node in question. I think it's relevant to have it here, don't you think so? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28295#discussion_r2631594884 From galder at openjdk.org Thu Dec 18 15:52:06 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 18 Dec 2025 15:52:06 GMT Subject: RFR: 8373555: C2: Optimize redundant input calculations for sign comparisons In-Reply-To: References: Message-ID: On Fri, 12 Dec 2025 08:14:28 GMT, Hannes Greule wrote: > Instead of sign-comparisons with And,Or,Xor,Max,Min nodes, we can directly compare to one of the inputs of the binary nodes if the other input is irrelevant to the comparison. > > There are potentially more operations, but these mentioned here are the most obvious ones. Max and Min could theoretically be expanded to arbitrary comparisons to constants, but I didn't want to introduce more complexity for now.
> > Please let me know what you think :) Neat! At a glance I don't see anything wrong. Just a small question: what testing did you carry out? ------------- PR Review: https://git.openjdk.org/jdk/pull/28782#pullrequestreview-3593681301 From liach at openjdk.org Thu Dec 18 16:28:36 2025 From: liach at openjdk.org (Chen Liang) Date: Thu, 18 Dec 2025 16:28:36 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v7] In-Reply-To: References: <56pJKejOYp59xiAZ_0iAKzBpGnU341_0o5Dhy53jx_0=.d331165a-ae13-4ae2-923d-302d3bddccfd@github.com> Message-ID: On Thu, 18 Dec 2025 04:58:38 GMT, Quan Anh Mai wrote: > So my question is why this annotation does not try to enforce a stronger invariant so that we can benefit from those invariants without having to wait for strict fields. No. We currently cannot enforce such final fields to be all written before the `Object::<init>` entry, and I also don't think mainline has this safe publication fence at the beginning of `Object::<init>` either. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28540#issuecomment-3671092060 From epeter at openjdk.org Thu Dec 18 16:43:33 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Dec 2025 16:43:33 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v6] In-Reply-To: References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> Message-ID: On Thu, 18 Dec 2025 15:41:22 GMT, Benoît Maillard wrote: >> src/hotspot/share/opto/phaseX.cpp line 1839: >> >>> 1837: // return and finally hit the assert in PhaseIterGVN::verify_optimize to get >>> 1838: // a more meaningful message >>> 1839: _table.hash_delete(n); >> >> Looks like an unrelated change. Why are you adding it now? > > This is here from the first version of the PR. From the initial PR description: > >> In summary, this PR brings the following changes: >> ...
>> - Avoid confusing `Need to remove from hash before changing edges` assert messages by removing the verified node from the hash table before attempting to optimize the node in question. > > I think it's relevant to have it here, don't you think so? Ok, sure. Sounds good to me :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28295#discussion_r2631798421 From qamai at openjdk.org Thu Dec 18 16:49:25 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 18 Dec 2025 16:49:25 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v7] In-Reply-To: References: <56pJKejOYp59xiAZ_0iAKzBpGnU341_0o5Dhy53jx_0=.d331165a-ae13-4ae2-923d-302d3bddccfd@github.com> Message-ID: On Thu, 18 Dec 2025 16:25:29 GMT, Chen Liang wrote: >> What I mean by stronger is that trusted final fields only ensure that their values are unchanged after initialization. Strict fields are unchanged unconditionally, there is only 1 observable state for a strict field of an object. As a result, in addition to constant folding, we can do load hoisting, too. So my question is why this annotation does not try to enforce a stronger invariant so that we can benefit from those invariants without having to wait for strict fields. > >> So my question is why this annotation does not try to enforce a stronger invariant so that we can benefit from those invariants without having to wait for strict fields. > > No. We currently cannot enforce such final fields to be all written before the `Object::<init>` entry, and I also don't think mainline has this safe publication fence at the beginning of `Object::<init>` either. @liach I don't think we need such a condition, we only need to ensure that the fields are not read from and the object does not escape to memory before the termination of `<init>`.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28540#issuecomment-3671176336 From liach at openjdk.org Thu Dec 18 18:12:59 2025 From: liach at openjdk.org (Chen Liang) Date: Thu, 18 Dec 2025 18:12:59 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v7] In-Reply-To: <56pJKejOYp59xiAZ_0iAKzBpGnU341_0o5Dhy53jx_0=.d331165a-ae13-4ae2-923d-302d3bddccfd@github.com> References: <56pJKejOYp59xiAZ_0iAKzBpGnU341_0o5Dhy53jx_0=.d331165a-ae13-4ae2-923d-302d3bddccfd@github.com> Message-ID: On Thu, 18 Dec 2025 00:03:10 GMT, Chen Liang wrote: >> Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. >> >> They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. >> >> We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. >> >> Paging @minborg who requested Optional folding for review. >> >> I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Move the test to a core library purposed directory We can port JDK-8354068 to mainline, but I think this is better done as a separate effort from the introduction of this annotation. This patch is more focused on the jdk implications. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28540#issuecomment-3671501119 From dlong at openjdk.org Thu Dec 18 23:19:48 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 18 Dec 2025 23:19:48 GMT Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v2] In-Reply-To: References: Message-ID: On Thu, 18 Dec 2025 15:05:08 GMT, Galder Zamarreño wrote: >> Min/Max users of Min/Max uses need to be enqueued respectively to the GVN worklist to see if further optimizations can be applied. Without this, there are cases where additional potential ideal/identity optimizations are not applied. I need this fix to test min/max reassociation implementation with IR tests reliably. >> >> Aside from the fix itself, I've refactored `MaxNode` to `MinMaxNode` and added a `is_MinMax` node query to simplify the fix. >> >> I have also removed the Min/Max exceptions in `PhaseIterGVN::verify_Identity_for` since this fix fixes `compiler/codegen/TestBooleanVect.java` with `-XX:VerifyIterativeGVN=1110`. >> >> To test this I've created a template framework test that validates the fix. I have tested with all Min/Max combinations including Float16, which I've verified with Intel SDE. Float16 does not use `Argument.NUMBER_42` because there's no support for it yet, see [JDK-8373977](https://bugs.openjdk.org/browse/JDK-8373977). >> >> During development I noticed that the test only failed when the test had `b, a` parameters in that order, so I added tests for both cases as `a, b` and `b, a` so that all possible orders are covered and they don't slip in the future. >> >> I've run tier1-3 tests on linux/x64 successfully. > > Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains nine commits: > > - Merge branch 'master' into topic.uses-min-max > - Test Float16 > - Only apply to uses that match original IR node > - Merge branch 'master' into topic.uses-min-max > - Use is_MinMax() instead of spelling out individual Min/Max opcodes > - Refactor MaxNode to MinMaxNode and add is_MinMax() query > - Add max(a, max(b, c)) patterns to add users of use > - Add templated test > - Remove exclude or Min/Max in verify identity src/hotspot/share/opto/phaseX.cpp line 2609: > 2607: for (DUIterator_Fast i2max, i2 = use->fast_outs(i2max); i2 < i2max; i2++) { > 2608: Node* u = use->fast_out(i2); > 2609: if (u->Opcode() == use->Opcode()) { So there are no Min(Max()) or Max(Min()) patterns we need to worry about? I was expecting this line to be if (u->is_MinMax()) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28895#discussion_r2632963753 From fyang at openjdk.org Fri Dec 19 00:52:53 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 19 Dec 2025 00:52:53 GMT Subject: RFR: 8373998: RISC-V: simple optimization of ConvHF2F In-Reply-To: References: Message-ID: On Thu, 18 Dec 2025 11:41:13 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > > ConvHF2F could be optimized by following patch. > As riscv does not have the restriction to use `iRegINoSp` in src register, it can use iRegIorL2I instead. > Check other usages of src register in riscv.ad file. > > @RealFYang Thanks! > > Thanks! Thanks. My local hs:tier1 - hs:tier2 test with `Zfh` is good using fastdebug build. ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28896#pullrequestreview-3595932722 From dholmes at openjdk.org Fri Dec 19 00:56:08 2025 From: dholmes at openjdk.org (David Holmes) Date: Fri, 19 Dec 2025 00:56:08 GMT Subject: RFR: 8373630: r18_tls should not be modified on Windows AArch64 In-Reply-To: References: Message-ID: On Wed, 17 Dec 2025 20:44:31 GMT, Saint Wesonga wrote: > Yes, I'm preparing the jdk26u backport this afternoon You want to backport to the jdk26 branch not the 26u repo. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28808#issuecomment-3672944683 From dzhang at openjdk.org Fri Dec 19 01:17:56 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Fri, 19 Dec 2025 01:17:56 GMT Subject: RFR: 8373998: RISC-V: simple optimization of ConvHF2F In-Reply-To: References: Message-ID: On Thu, 18 Dec 2025 11:41:13 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > > ConvHF2F could be optimized by following patch. > As riscv does not have the restriction to use `iRegINoSp` in src register, it can use iRegIorL2I instead. > Check other usages of src register in riscv.ad file. > > @RealFYang Thanks! > > Thanks! LGTM, thanks! ------------- Marked as reviewed by dzhang (Committer). PR Review: https://git.openjdk.org/jdk/pull/28896#pullrequestreview-3596004108 From qamai at openjdk.org Fri Dec 19 04:07:38 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 19 Dec 2025 04:07:38 GMT Subject: RFR: 8374078: C2_MacroAssembler::verify_int_in_range has incorrect early return condition Message-ID: Hi, This PR fixes the incorrect early return condition in `C2_MacroAssembler::verify_int_in_range`. Previously, `lo == min_jint && hi == max_jint` is equivalent to `t == Type::INT`. But this is not true anymore. Please take a look and leave your reviews, thanks a lot. 
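To illustrate why a full signed range no longer implies an unconstrained int type, here is a minimal sketch. `IntTypeSketch` and its helpers are hypothetical stand-ins, not HotSpot's `TypeInt`, and the "non-zero int" example assumes a type can carry unsigned bounds alongside its signed bounds.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical, simplified int type with signed bounds, unsigned bounds and
// known bits (an illustration only, not HotSpot's TypeInt).
struct IntTypeSketch {
  int32_t lo, hi;        // signed bounds
  uint32_t ulo, uhi;     // unsigned bounds
  uint32_t zeros, ones;  // known-zero / known-one bit masks
};

// The old-style check: only the signed bounds are inspected.
bool has_full_signed_range(const IntTypeSketch& t) {
  return t.lo == std::numeric_limits<int32_t>::min() &&
         t.hi == std::numeric_limits<int32_t>::max();
}

// A type is unconstrained only if every component is at its widest.
bool is_unconstrained(const IntTypeSketch& t) {
  return has_full_signed_range(t) &&
         t.ulo == 0u && t.uhi == std::numeric_limits<uint32_t>::max() &&
         t.zeros == 0u && t.ones == 0u;
}

// "Any int except 0": contains both min and max signed values, so the
// signed bounds are full, yet the unsigned lower bound of 1 excludes zero.
IntTypeSketch nonzero_int() {
  return IntTypeSketch{std::numeric_limits<int32_t>::min(),
                       std::numeric_limits<int32_t>::max(),
                       1u, std::numeric_limits<uint32_t>::max(),
                       0u, 0u};
}
```

Under these assumptions, a verification routine that returns early whenever `has_full_signed_range` holds would skip checking `nonzero_int()`, even though the value 0 would violate the type.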
------------- Commit messages: - Fix verify Changes: https://git.openjdk.org/jdk/pull/28916/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28916&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374078 Stats: 18 lines in 2 files changed: 6 ins; 8 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28916.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28916/head:pull/28916 PR: https://git.openjdk.org/jdk/pull/28916 From dlong at openjdk.org Fri Dec 19 05:10:53 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 19 Dec 2025 05:10:53 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses [v2] In-Reply-To: References: Message-ID: On Wed, 17 Dec 2025 14:04:04 GMT, Roland Westrelin wrote: >> The base input of `AddP` is expected to only be set for heap accesses >> but I noticed some inconsistencies so I added an assert in the `AddP` >> constructor and fixed issues that it caught. AFAICT, the >> inconsistencies shouldn't create issues. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - review > - review > - merge > - more > - more > - more > - undo > - exps Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28769#pullrequestreview-3596852846 From qamai at openjdk.org Fri Dec 19 05:50:56 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 19 Dec 2025 05:50:56 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses [v2] In-Reply-To: References: Message-ID: On Wed, 17 Dec 2025 14:04:04 GMT, Roland Westrelin wrote: >> The base input of `AddP` is expected to only be set for heap accesses >> but I noticed some inconsistencies so I added an assert in the `AddP` >> constructor and fixed issues that it caught.
AFAICT, the >> inconsistencies shouldn't create issues. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - review > - review > - merge > - more > - more > - more > - undo > - exps Marked as reviewed by qamai (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28769#pullrequestreview-3596937444 From jbhateja at openjdk.org Fri Dec 19 06:29:57 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 19 Dec 2025 06:29:57 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v9] In-Reply-To: References: <8hStIcvp252Ik7raxZL5BvFKKkXTflorjyOD9Cyakvc=.c5d1b302-5c49-46b1-91ba-2feda2e6a746@github.com> Message-ID: On Thu, 13 Nov 2025 19:47:52 GMT, Paul Sandoz wrote: >>> The basic type codes are declared and shared across Java and HotSpot - it's used in `LaneType`. Can we pass a single argument that is the basic type instead of two arguments. HotSpot should know from the basic type what the carrier class and also what the operation type without it being explicitly told, since presumably it knew the inverse - the basic type from the element class. >> >> Hi @PaulSandoz, T_HALFFLOAT used in LaneType is mainly used for differentiation of various cache keys used by conversion operation lookups. In principle, we can extend VM to acknowledge this new custom basic type on the lines of T_METADATA / T_ADDRESS; its scope for now will be restricted to VectorSupport. We can gradually expose this to C2 type, such that TypeVect for all Float16 VectorIR uses T_HALFFLOAT as its basic type; currently, we use T_SHORT as the lane type. Let me know if this looks reasonable > >> > The basic type codes are declared and shared across Java and HotSpot - it's used in `LaneType`.
Can we pass a single argument that is the basic type instead of two arguments. HotSpot should know from the basic type what the carrier class and also what the operation type without it being explicitly told, since presumably it knew the inverse - the basic type from the element class. >> >> Hi @PaulSandoz, T_HALFFLOAT used in LaneType is mainly used for differentiation of various cache keys used by conversion operation lookups. In principle, we can extend VM to acknowledge this new custom basic type on the lines of T_METADATA / T_ADDRESS; its scope for now will be restricted to VectorSupport. We can gradually expose this to C2 type, such that TypeVect for all Float16 VectorIR uses T_HALFFLOAT as its basic type; currently, we use T_SHORT as the lane type. Let me know if this looks reasonable > > I am proposing something simpler, really as a temporary step until `Float16` becomes part of the `java.base` module. IIUC from the basic type we can reliably determine what the two arguments we currently passing are e.g., T_HALFFLOAT = { short.class, VECTOR_TYPE_FP16 }. So we don't need to pass two arguments, we can just pass one, the intrinsic can lookup the class and operation type kind. Hi @PaulSandoz , your comments have been addressed. Please let us know if there are other comments. Hi @eme64 , Kindly share your comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3673727967 From epeter at openjdk.org Fri Dec 19 07:02:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 19 Dec 2025 07:02:02 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation [v4] In-Reply-To: References: Message-ID: <9Jd4_uwXvRVqQ9rzhLTRvBtFWDt35xESu5PUle7PVKo=.90848c08-6c7e-412f-b46c-febe9c2872d5@github.com> On Mon, 15 Dec 2025 09:33:19 GMT, Hannes Greule wrote: >> The test cases show examples of code where `Value()` previously wasn't run because idealization took place before, resulting in less precise type analysis. 
>> >> Please let me know what you think. > > Hannes Greule has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into delay-divmod-idealization > - review > - expand comments > - delay integral Div/Mod Ideal() until IGVN > - test We are getting some failures for this test, both on `x64` and `aarch64`: `compiler/igvn/IntegerDivValueTests.java` Compilation of Failed Method ---------------------------- 1) Compilation of "public long compiler.igvn.IntegerDivValueTests.testLongRange(long)": > Phase "PrintIdeal": AFTER: print_ideal 0 Root === 0 40 [[ 0 1 3 39 ]] inner 1 Con === 0 [[ ]] #top 3 Start === 3 0 [[ 3 5 6 7 8 9 ]] #{0:control, 1:abIO, 2:memory, 3:rawptr:BotPTR, 4:return_address, 5:instptr:compiler/igvn/IntegerDivValueTests:NotNull+0,iid=bot, 6:long, 7:half} 5 Parm === 3 [[ 40 ]] Control !jvms: IntegerDivValueTests::testLongRange @ bci:-1 (line 306) 6 Parm === 3 [[ 40 ]] I_O !jvms: IntegerDivValueTests::testLongRange @ bci:-1 (line 306) 7 Parm === 3 [[ 40 ]] Memory Memory: @ptr:BotPTR+bot, idx=Bot; !jvms: IntegerDivValueTests::testLongRange @ bci:-1 (line 306) 8 Parm === 3 [[ 40 ]] FramePtr !jvms: IntegerDivValueTests::testLongRange @ bci:-1 (line 306) 9 Parm === 3 [[ 40 ]] ReturnAdr !jvms: IntegerDivValueTests::testLongRange @ bci:-1 (line 306) 39 ConL === 0 [[ 40 ]] #long:1 40 Return === 5 6 7 8 9 returns 39 [[ 0 ]] Failed IR Rules (1) of Methods (1) ---------------------------------- 1) Method "public long compiler.igvn.IntegerDivValueTests.testLongRange(long)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#RSHIFT_L#_", "> 0", "_#ADD_L#_", "> 0", "_#AND_L#_", "> 0"}, failOn={"_#DIV#_"}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, 
applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(RShiftL.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(AddL.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! * Constraint 3: "(\\d+(\\s){2}(AndL.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3673837311 From epeter at openjdk.org Fri Dec 19 07:03:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 19 Dec 2025 07:03:58 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v9] In-Reply-To: References: <8hStIcvp252Ik7raxZL5BvFKKkXTflorjyOD9Cyakvc=.c5d1b302-5c49-46b1-91ba-2feda2e6a746@github.com> Message-ID: <1ElN5XvEXAYGINpCIB2smhDrzekGyiXmG6o8-jnxDxk=.83a69a64-2894-40af-a2ee-9c35448c88b2@github.com> On Fri, 19 Dec 2025 06:26:47 GMT, Jatin Bhateja wrote: >>> > The basic type codes are declared and shared across Java and HotSpot - it's used in `LaneType`. Can we pass a single argument that is the basic type instead of two arguments. HotSpot should know from the basic type what the carrier class and also what the operation type without it being explicitly told, since presumably it knew the inverse - the basic type from the element class. >>> >>> Hi @PaulSandoz, T_HALFFLOAT used in LaneType is mainly used for differentiation of various cache keys used by conversion operation lookups. In principle, we can extend VM to acknowledge this new custom basic type on the lines of T_METADATA / T_ADDRESS; its scope for now will be restricted to VectorSupport. 
We can gradually expose this to C2 type, such that TypeVect for all Float16 VectorIR uses T_HALFFLOAT as its basic type; currently, we use T_SHORT as the lane type. Let me know if this looks reasonable >> >> I am proposing something simpler, really as a temporary step until `Float16` becomes part of the `java.base` module. IIUC from the basic type we can reliably determine what the two arguments we currently passing are e.g., T_HALFFLOAT = { short.class, VECTOR_TYPE_FP16 }. So we don't need to pass two arguments, we can just pass one, the intrinsic can lookup the class and operation type kind. > > Hi @PaulSandoz , your comments have been addressed. Please let us know if there are other comments. > Hi @eme64 , Kindly share your comments. @jatin-bhateja Thanks for the ping! I'll put this on the list for review early in 2026 :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3673842243 From hgreule at openjdk.org Fri Dec 19 07:34:01 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Fri, 19 Dec 2025 07:34:01 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation [v4] In-Reply-To: <9Jd4_uwXvRVqQ9rzhLTRvBtFWDt35xESu5PUle7PVKo=.90848c08-6c7e-412f-b46c-febe9c2872d5@github.com> References: <9Jd4_uwXvRVqQ9rzhLTRvBtFWDt35xESu5PUle7PVKo=.90848c08-6c7e-412f-b46c-febe9c2872d5@github.com> Message-ID: On Fri, 19 Dec 2025 06:59:07 GMT, Emanuel Peter wrote: >> Hannes Greule has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: >> >> - Merge branch 'master' into delay-divmod-idealization >> - review >> - expand comments >> - delay integral Div/Mod Ideal() until IGVN >> - test > > We are getting some failures for this test, both on `x64` and `aarch64`: > `compiler/igvn/IntegerDivValueTests.java` > > > Compilation of Failed Method > ---------------------------- > 1) Compilation of "public long compiler.igvn.IntegerDivValueTests.testLongRange(long)": >> Phase "PrintIdeal": > AFTER: print_ideal > 0 Root === 0 40 [[ 0 1 3 39 ]] inner > 1 Con === 0 [[ ]] #top > 3 Start === 3 0 [[ 3 5 6 7 8 9 ]] #{0:control, 1:abIO, 2:memory, 3:rawptr:BotPTR, 4:return_address, 5:instptr:compiler/igvn/IntegerDivValueTests:NotNull+0,iid=bot, 6:long, 7:half} > 5 Parm === 3 [[ 40 ]] Control !jvms: IntegerDivValueTests::testLongRange @ bci:-1 (line 306) > 6 Parm === 3 [[ 40 ]] I_O !jvms: IntegerDivValueTests::testLongRange @ bci:-1 (line 306) > 7 Parm === 3 [[ 40 ]] Memory Memory: @ptr:BotPTR+bot, idx=Bot; !jvms: IntegerDivValueTests::testLongRange @ bci:-1 (line 306) > 8 Parm === 3 [[ 40 ]] FramePtr !jvms: IntegerDivValueTests::testLongRange @ bci:-1 (line 306) > 9 Parm === 3 [[ 40 ]] ReturnAdr !jvms: IntegerDivValueTests::testLongRange @ bci:-1 (line 306) > 39 ConL === 0 [[ 40 ]] #long:1 > 40 Return === 5 6 7 8 9 returns 39 [[ 0 ]] > > > Failed IR Rules (1) of Methods (1) > ---------------------------------- > 1) Method "public long compiler.igvn.IntegerDivValueTests.testLongRange(long)" - [Failed IR rules: 1]: > * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#RSHIFT_L#_", "> 0", "_#ADD_L#_", "> 0", "_#AND_L#_", "> 0"}, failOn={"_#DIV#_"}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > > Phase "PrintIdeal": > - counts: Graph contains wrong number of nodes: > * Constraint 
1: "(\\d+(\\s){2}(RShiftL.*)+(\\s){2}===.*)"
> - Failed comparison: [found] 0 > 0 [given]
> - No nodes matched!
> * Constraint 2: "(\\d+(\\s){2}(AddL.*)+(\\s){2}===.*)"
> - Failed comparison: [found] 0 > 0 [given]
> - No nodes matched!
> * Constraint 3: "(\\d+(\\s){2}(AndL.*)+(\\s){2}===.*)"
> - Failed comparison: [found] 0 > 0 [given]
> - No nodes matched!

Thanks @eme64, I guess that's what @ichttt mentioned. I'll merge master again and adjust the tests.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27886#issuecomment-3673926084

From duke at openjdk.org Fri Dec 19 08:08:36 2025
From: duke at openjdk.org (Shawn M Emery)
Date: Fri, 19 Dec 2025 08:08:36 GMT
Subject: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI and AVX512_VBMI2
Message-ID:

This change allows use of the AVX512_VBMI/VBMI2 instruction sets to further optimize decompression/parsing of polynomial coefficients for ML-KEM. The speedup in the ML-KEM benchmarks is 0.2 to 0.5% for key generation, 0.3 to 1.5% for encapsulation, and 0 to 0.9% for decapsulation.

Thank you to @sviswa7 and @ferakocz for their help in working through the early stages of this code with me.
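
For readers unfamiliar with the parsing step mentioned above, here is a minimal scalar sketch. It assumes the coefficient parsing in question is the 12-bit unpacking described in FIPS 203, where three packed bytes yield two 12-bit values. The class and method names are hypothetical and this is not the code from the PR, which performs the byte shuffling and shifting with vector instructions.

```java
// Scalar illustration only (hypothetical names, not the HotSpot intrinsic):
// unpack little-endian 12-bit values, three input bytes per two outputs.
final class TwelveBitParseSketch {

    static int[] unpack12(byte[] packed) {
        if (packed.length % 3 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 3");
        }
        int[] out = new int[packed.length / 3 * 2];
        for (int i = 0, j = 0; i < packed.length; i += 3, j += 2) {
            int b0 = packed[i] & 0xFF;
            int b1 = packed[i + 1] & 0xFF;
            int b2 = packed[i + 2] & 0xFF;
            // First value: all of b0 plus the low nibble of b1.
            out[j] = b0 | ((b1 & 0x0F) << 8);
            // Second value: the high nibble of b1 plus all of b2.
            out[j + 1] = (b1 >>> 4) | (b2 << 4);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] v = unpack12(new byte[] {(byte) 0xAB, (byte) 0xCD, (byte) 0xEF});
        System.out.println(Integer.toHexString(v[0]) + " " + Integer.toHexString(v[1]));
    }
}
```

Each output needs a byte gather followed by a per-element shift and mask, which is presumably why byte-permute and variable-shift instructions are a natural fit for vectorizing it.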
------------- Commit messages: - Merge with mainline - Swap parameter operation with source - Remove wrong mask from evpsrlvw - Reverse ordering for vpermb and vpsrlvw instructions - Switch from vpshldvw to vpsrlvw - Fix whitespaces - 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI and AVX512_VBMI2 Changes: https://git.openjdk.org/jdk/pull/28815/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28815&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8360934 Stats: 88 lines in 1 file changed: 87 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28815.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28815/head:pull/28815 PR: https://git.openjdk.org/jdk/pull/28815 From hgreule at openjdk.org Fri Dec 19 08:11:06 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Fri, 19 Dec 2025 08:11:06 GMT Subject: RFR: 8366815: C2: Delay Mod/Div by constant transformation [v5] In-Reply-To: References: Message-ID: > The test cases show examples of code where `Value()` previously wasn't run because idealization took place before, resulting in less precise type analysis. > > Please let me know what you think. Hannes Greule has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: - use the right IR checks - Merge branch 'master' into delay-divmod-idealization - Merge branch 'master' into delay-divmod-idealization - review - expand comments - delay integral Div/Mod Ideal() until IGVN - test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27886/files - new: https://git.openjdk.org/jdk/pull/27886/files/d9f8a698..db8fd790 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27886&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27886&range=03-04 Stats: 11104 lines in 450 files changed: 7275 ins; 1527 del; 2302 mod Patch: https://git.openjdk.org/jdk/pull/27886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27886/head:pull/27886 PR: https://git.openjdk.org/jdk/pull/27886 From chagedorn at openjdk.org Fri Dec 19 08:19:35 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 19 Dec 2025 08:19:35 GMT Subject: RFR: 8373524: C2: no reachable node should have no use [v3] In-Reply-To: <8xp47-406XvQpBbyImEKvgKJGoAR_GLX9OorBcSEXJU=.95d37fce-8278-4408-90fc-0cc3997068a3@github.com> References: <8xp47-406XvQpBbyImEKvgKJGoAR_GLX9OorBcSEXJU=.95d37fce-8278-4408-90fc-0cc3997068a3@github.com> Message-ID: <24UCY5LtZhmLY7P4BcWkeT9VddvnjYijQ_o1QP7Gl6w=.5cb4aa82-bc5e-47bc-bb84-0093124db6f2@github.com> On Wed, 17 Dec 2025 14:16:29 GMT, Roland Westrelin wrote: >> The failure occurs because `PhiNode::Ideal` uses `set_req` to update >> an input of a `Phi`. That causes the previous input to be disconnected >> but because of the use of `set_req`, the previous input that has no >> use is not enqueued for `igvn` to be reclaimed. The fix is to use >> `set_req_X` instead. I replaced uses of `set_req` with `set_req_X` in >> `PhiNode::Ideal` where I thought it made sense. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Update looks good, thanks! 
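
The difference between the two update styles discussed in the quoted description can be shown with a toy graph. This is a hedged sketch in Java, not HotSpot code: `Node`, `setReq` and `setReqX` are hypothetical stand-ins that only model the one property the description relies on, namely that the worklist-aware variant notices when the replaced input has lost its last use.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model (hypothetical names): shows why an input replacement that does
// not notify the worklist can leave a node with no uses behind.
final class SetReqSketch {

    static final class Node {
        final String name;
        final List<Node> inputs = new ArrayList<>();
        final List<Node> uses = new ArrayList<>();

        Node(String name, Node... ins) {
            this.name = name;
            for (Node in : ins) {
                inputs.add(in);
                in.uses.add(this);
            }
        }
    }

    // Plain update: rewires the edge and returns the old input, nothing more.
    static Node setReq(Node n, int idx, Node newIn) {
        Node old = n.inputs.set(idx, newIn);
        old.uses.remove(n);
        newIn.uses.add(n);
        return old;
    }

    // Worklist-aware update: additionally enqueues the old input if it just
    // lost its last use, so a later pass can reclaim it.
    static void setReqX(Node n, int idx, Node newIn, Deque<Node> worklist) {
        Node old = setReq(n, idx, newIn);
        if (old.uses.isEmpty()) {
            worklist.add(old);
        }
    }

    // Returns true if the disconnected input ended up on the worklist.
    static boolean oldInputQueued(boolean useWorklistAwareUpdate) {
        Deque<Node> worklist = new ArrayDeque<>();
        Node a = new Node("a");
        Node b = new Node("b");
        Node phi = new Node("phi", a);
        if (useWorklistAwareUpdate) {
            setReqX(phi, 0, b, worklist);
        } else {
            setReq(phi, 0, b);
        }
        return worklist.contains(a);
    }

    public static void main(String[] args) {
        System.out.println(oldInputQueued(false) + " " + oldInputQueued(true));
    }
}
```

In the plain variant, `a` stays in the graph with no uses and nothing is scheduled to look at it again, which matches the failure mode described in the PR.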
------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28841#pullrequestreview-3597482122 From roland at openjdk.org Fri Dec 19 08:34:18 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 19 Dec 2025 08:34:18 GMT Subject: RFR: 8373524: C2: no reachable node should have no use [v3] In-Reply-To: <24UCY5LtZhmLY7P4BcWkeT9VddvnjYijQ_o1QP7Gl6w=.5cb4aa82-bc5e-47bc-bb84-0093124db6f2@github.com> References: <8xp47-406XvQpBbyImEKvgKJGoAR_GLX9OorBcSEXJU=.95d37fce-8278-4408-90fc-0cc3997068a3@github.com> <24UCY5LtZhmLY7P4BcWkeT9VddvnjYijQ_o1QP7Gl6w=.5cb4aa82-bc5e-47bc-bb84-0093124db6f2@github.com> Message-ID: On Fri, 19 Dec 2025 08:16:22 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > Update looks good, thanks! @chhagedorn thanks for re-approving. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28841#issuecomment-3674089602 From roland at openjdk.org Fri Dec 19 08:34:20 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 19 Dec 2025 08:34:20 GMT Subject: Integrated: 8373524: C2: no reachable node should have no use In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 10:02:46 GMT, Roland Westrelin wrote: > The failure occurs because `PhiNode::Ideal` uses `set_req` to update > an input of a `Phi`. That causes the previous input to be disconnected > but because of the use of `set_req`, the previous input that has no > use is not enqueued for `igvn` to be reclaimed. The fix is to use > `set_req_X` instead. I replaced uses of `set_req` with `set_req_X` in > `PhiNode::Ideal` where I thought it made sense. This pull request has now been integrated. 
Changeset: e72f205a
Author: Roland Westrelin
URL: https://git.openjdk.org/jdk/commit/e72f205ae312b15ebab0cbeedb73bbf86e485251
Stats: 94 lines in 2 files changed: 91 ins; 0 del; 3 mod

8373524: C2: no reachable node should have no use

Reviewed-by: chagedorn, mhaessig

-------------

PR: https://git.openjdk.org/jdk/pull/28841

From galder at openjdk.org Fri Dec 19 08:36:11 2025
From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=)
Date: Fri, 19 Dec 2025 08:36:11 GMT
Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 18 Dec 2025 23:17:06 GMT, Dean Long wrote:

>> Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits:
>>
>> - Merge branch 'master' into topic.uses-min-max
>> - Test Float16
>> - Only apply to uses that match original IR node
>> - Merge branch 'master' into topic.uses-min-max
>> - Use is_MinMax() instead of spelling out individual Min/Max opcodes
>> - Refactor MaxNode to MinMaxNode and add is_MinMax() query
>> - Add max(a, max(b, c)) patterns to add users of use
>> - Add templated test
>> - Remove exclude or Min/Max in verify identity
>
> src/hotspot/share/opto/phaseX.cpp line 2609:
>
>> 2607: for (DUIterator_Fast i2max, i2 = use->fast_outs(i2max); i2 < i2max; i2++) {
>> 2608: Node* u = use->fast_out(i2);
>> 2609: if (u->Opcode() == use->Opcode()) {
>
> So there are no Min(Max()) or Max(Min()) patterns we need to worry about? I was expecting this line to be
>
> if (u->is_MinMax()) {

Good question. There could be some patterns but I couldn't think of any when I was working on this, so I limited it to the patterns that I knew for sure required this, e.g. Max(Max()), Min(Min()).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28895#discussion_r2634188399

From galder at openjdk.org Fri Dec 19 08:41:50 2025
From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=)
Date: Fri, 19 Dec 2025 08:41:50 GMT
Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v2]
In-Reply-To:
References:
Message-ID: <0IGsttp1N8WPq3gM-AWWSo-lP1bEl-6lVJgJrEic-EY=.80741cf2-7496-4bdd-bd3f-a93d55654f29@github.com>

On Thu, 18 Dec 2025 15:05:08 GMT, Galder Zamarreño wrote:

>> Min/Max users of Min/Max uses need to be enqueued on the GVN worklist to see if further optimizations can be applied. Without this, there are cases where additional potential ideal/identity optimizations are not applied. I need this fix to test the min/max reassociation implementation with IR tests reliably.
>>
>> Aside from the fix itself, I've refactored `MaxNode` to `MinMaxNode` and added an `is_MinMax` node query to simplify the fix.
>>
>> I have also removed the Min/Max exceptions in `PhaseIterGVN::verify_Identity_for` since this fix fixes `compiler/codegen/TestBooleanVect.java` with `-XX:VerifyIterativeGVN=1110`.
>>
>> To test this I've created a template framework test that validates the fix. I have tested with all Min/Max combinations including Float16, which I've verified with Intel SDE. Float16 does not use `Argument.NUMBER_42` because there's no support for it yet, see [JDK-8373977](https://bugs.openjdk.org/browse/JDK-8373977).
>>
>> During development I noticed that the test only failed when the test had `b, a` parameters in that order, so I added tests for both cases as `a, b` and `b, a` so that all possible orders are covered and they don't slip in the future.
>>
>> I've run tier1-3 tests on linux/x64 successfully.
>
> Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase.
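
The choice between `u->Opcode() == use->Opcode()` and `u->is_MinMax()` discussed above can be illustrated with a toy def-use graph. This is a sketch under stated assumptions, written in Java rather than C++; the `Node` and `Op` types and `nodesToRevisit` are hypothetical and only model the enqueue decision, not the actual worklist code in `phaseX.cpp`.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical and are not HotSpot code.
final class MinMaxEnqueueSketch {

    enum Op { PARM, MIN, MAX }

    static final class Node {
        final String name;
        final Op op;
        final List<Node> outs = new ArrayList<>(); // users of this node

        Node(String name, Op op, Node... inputs) {
            this.name = name;
            this.op = op;
            for (Node in : inputs) {
                in.outs.add(this);
            }
        }

        boolean isMinMax() {
            return op == Op.MIN || op == Op.MAX;
        }
    }

    // Names of the nodes to revisit after `changed` was updated.
    static List<String> nodesToRevisit(Node changed) {
        List<String> worklist = new ArrayList<>();
        for (Node use : changed.outs) {
            add(worklist, use); // direct users are always revisited
            if (!use.isMinMax()) {
                continue;
            }
            for (Node u : use.outs) {
                // Same-opcode check, as in the reviewed line: Max(Max()) and
                // Min(Min()) qualify, mixed Min/Max nesting does not.
                if (u.op == use.op) {
                    add(worklist, u);
                }
            }
        }
        return worklist;
    }

    static void add(List<String> worklist, Node n) {
        if (!worklist.contains(n.name)) {
            worklist.add(n.name);
        }
    }

    static List<String> demo() {
        Node a = new Node("a", Op.PARM);
        Node b = new Node("b", Op.PARM);
        Node c = new Node("c", Op.PARM);
        Node inner = new Node("inner", Op.MAX, a, b); // max(a, b)
        new Node("outer", Op.MAX, inner, c);          // max(max(a, b), c)
        new Node("other", Op.MIN, inner, c);          // min(max(a, b), c)
        return nodesToRevisit(a);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Replacing the inner condition with `u.isMinMax()` would also enqueue `other`, which is the broader behavior the reviewer asked about.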
The pull request now contains nine commits: > > - Merge branch 'master' into topic.uses-min-max > - Test Float16 > - Only apply to uses that match original IR node > - Merge branch 'master' into topic.uses-min-max > - Use is_MinMax() instead of spelling out individual Min/Max opcodes > - Refactor MaxNode to MinMaxNode and add is_MinMax() query > - Add max(a, max(b, c)) patterns to add users of use > - Add templated test > - Remove exclude or Min/Max in verify identity There's an issue with the Float16 incubator API access that I did not notice in local testing, I'm looking into it ------------- PR Comment: https://git.openjdk.org/jdk/pull/28895#issuecomment-3674116391 From xgong at openjdk.org Fri Dec 19 08:46:07 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 19 Dec 2025 08:46:07 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v3] In-Reply-To: References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> <4VXHOCR1YSoMVbDbB8j-j18Z-_VbO0y5fJfyp3IjQ9c=.19485011-9cb3-4016-a642-61cee81adcd1@github.com> Message-ID: On Fri, 12 Dec 2025 15:42:24 GMT, Bhavana Kilambi wrote: >> I mean we do not expect there is data-dependence between two `ins` operations, but it has now. We do not recommend use the instructions that just write part of a register. This might involve un-expected dependence between. I suggest to use `ext` instead, and I can observe about 20% performance improvement compared with current version on V2. I did not check the correctness, but it looks right to me. Could you please help check on other machines? Thanks! 
>>
>> The change might look like:
>> Suggestion:
>>
>> fmulh(dst, fsrc, vsrc);
>> ext(vtmp, T8B, vsrc, vsrc, 2);
>> fmulh(dst, dst, vtmp);
>> ext(vtmp, T8B, vsrc, vsrc, 4);
>> fmulh(dst, dst, vtmp);
>> ext(vtmp, T8B, vsrc, vsrc, 6);
>> fmulh(dst, dst, vtmp);
>> if (isQ) {
>> ext(vtmp, T16B, vsrc, vsrc, 8);
>> fmulh(dst, dst, vtmp);
>> ext(vtmp, T16B, vsrc, vsrc, 10);
>> fmulh(dst, dst, vtmp);
>> ext(vtmp, T16B, vsrc, vsrc, 12);
>> fmulh(dst, dst, vtmp);
>> ext(vtmp, T16B, vsrc, vsrc, 14);
>> fmulh(dst, dst, vtmp);
>
> Hi @XiaohongGong Thanks for this suggestion. I understand that `ins` has a read-modify-write dependency while `ext` does not, as we are not reading the `vtmp` register in this case.
>
> I made changes to both the add and mul reduction implementations and I could see some perf gains on Neoverse V1 and Neoverse V2 for mul reduction but none for Neoverse N1. The following is the ratio between throughput with `ext` and throughput with `ins` (`>1` would mean `ext` is better) on Neoverse V2 -
>
> Benchmark | vectorDim | 8B | 16B
> -- | -- | -- | --
> Float16OperationsBenchmark.ReductionAddFP16 | 256 | 1.0022509 | 0.99938584
> Float16OperationsBenchmark.ReductionAddFP16 | 512 | 1.05157946 | 1.00262025
> Float16OperationsBenchmark.ReductionAddFP16 | 1024 | 1.02392196 | 1.00187924
> Float16OperationsBenchmark.ReductionAddFP16 | 2048 | 1.01219315 | 0.99964493
> Float16OperationsBenchmark.ReductionMulFP16 | 256 | 0.99729809 | 1.19006546
> Float16OperationsBenchmark.ReductionMulFP16 | 512 | 1.03897347 | 1.0689105
> Float16OperationsBenchmark.ReductionMulFP16 | 1024 | 1.01822982 | 1.01509971
> Float16OperationsBenchmark.ReductionMulFP16 | 2048 | 1.0086255 | 1.0032434
>
> The 20% gain you mentioned is reproducible but only for the smallest array size. The gains taper for larger array sizes (my wild guess is that for smaller array sizes the loop is latency-bound, so reducing the dependency due to the `ins` chains helps bring down the total latency, but for larger array sizes the loop becomes more memory-bound with a larger number of loads/stores, and probably here removing the `ins` dependency chains doesn't help much?).
>
> Similar numbers for Neoverse V1 -

References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com>
Message-ID:

On Thu, 18 Dec 2025 10:17:47 GMT, Bhavana Kilambi wrote:

>> This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species.
>>
>> Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. The following is how each of these reductions is implemented for different aarch64 targets -
>>
>> **For AddReduction :**
>> On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction.
>>
>> On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order.
>> >> **For MulReduction :** >> Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported. >> >> Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` - >> >> Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. >> Ratio > 1 indicates the performance with this patch is better than the master branch. >> >> **N1 (UseSVE = 0, max vector length = 16B):** >> >> Benchmark vectorDim Mode Cnt 8B 16B >> ReductionAddFP16 256 thrpt 9 1.41 1.40 >> ReductionAddFP16 512 thrpt 9 1.41 1.41 >> ReductionAddFP16 1024 thrpt 9 1.43 1.40 >> ReductionAddFP16 2048 thrpt 9 1.43 1.40 >> ReductionMulFP16 256 thrpt 9 1.22 1.22 >> ReductionMulFP16 512 thrpt 9 1.21 1.23 >> ReductionMulFP16 1024 thrpt 9 1.21 1.22 >> ReductionMulFP16 2048 thrpt 9 1.20 1.22 >> >> >> On N1, the scalarized sequence of `fadd/fmul` are gener... > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Fix build failures on Mac src/hotspot/cpu/aarch64/aarch64_vector.ad line 3457: > 3455: format %{ "reduce_addD_sve $dst_src1, $dst_src1, $src2" %} > 3456: ins_encode %{ > 3457: assert(UseSVE > 0, "must be sve"); Why do you remove this assertion? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2634230858 From xgong at openjdk.org Fri Dec 19 08:54:58 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 19 Dec 2025 08:54:58 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v2] In-Reply-To: References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: On Wed, 17 Dec 2025 10:45:46 GMT, Bhavana Kilambi wrote: >> This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. >> >> Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. The following is how each of these reductions is implemented for different aarch64 targets - >> >> **For AddReduction :** >> On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. >> >> On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order. >> >> **For MulReduction :** >> Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported. 
>> >> Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` - >> >> Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. >> Ratio > 1 indicates the performance with this patch is better than the master branch. >> >> **N1 (UseSVE = 0, max vector length = 16B):** >> >> Benchmark vectorDim Mode Cnt 8B 16B >> ReductionAddFP16 256 thrpt 9 1.41 1.40 >> ReductionAddFP16 512 thrpt 9 1.41 1.41 >> ReductionAddFP16 1024 thrpt 9 1.43 1.40 >> ReductionAddFP16 2048 thrpt 9 1.43 1.40 >> ReductionMulFP16 256 thrpt 9 1.22 1.22 >> ReductionMulFP16 512 thrpt 9 1.21 1.23 >> ReductionMulFP16 1024 thrpt 9 1.21 1.22 >> ReductionMulFP16 2048 thrpt 9 1.20 1.22 >> >> >> On N1, the scalarized sequence of `fadd/fmul` are gener... > > Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Address review comments > - Merge 'master' > - 8366444: Add support for add/mul reduction operations for Float16 > > This patch adds mid-end support for vectorized add/mul reduction > operations for half floats. It also includes backend aarch64 support for > these operations. Only vectorization support through autovectorization > is added as VectorAPI currently does not support Float16 vector species. > > Both add and mul reduction vectorized through autovectorization mandate > the implementation to be strictly ordered. 
The following is how each of > these reductions is implemented for different aarch64 targets - > > For AddReduction : > On Neon only targets (UseSVE = 0): Generates scalarized additions > using the scalar "fadd" instruction for both 8B and 16B vector lengths. > This is because Neon does not provide a direct instruction for computing > strictly ordered floating point add reduction. > > On SVE targets (UseSVE > 0): Generates the "fadda" instruction which > computes add reduction for floating point in strict order. > > For MulReduction : > Both Neon and SVE do not provide a direct instruction for computing > strictly ordered floating point multiply reduction. For vector lengths > of 8B and 16B, a scalarized sequence of scalar "fmul" instructions is > generated and multiply reduction for vector lengths > 16B is not > supported. > > Below is the performance of the two newly added microbenchmarks in > Float16OperationsBenchmark.java tested on three different aarch64 > machines and with varying MaxVectorSize - > > Note: On all machines, the score (ops/ms) is compared with the master > branch without this patch which generates a sequence of loads ("ldrsh") > to load the FP16 value into an FPR and a scalar "fadd/fmul" to > add/multiply the loaded value to the running sum/product. The ratios > given below are the ratios between the throughput with this patch and > the throughput without this patch. > Ratio > 1 indicates the performance with this patch is better than the > master branch. > > N1 (UseSVE = 0, max vector length = 16B): > Benchmark vectorDim Mode Cnt 8B 16B > ReductionAddFP16 256 thrpt 9 1.41 1.40 > Redu... src/hotspot/cpu/aarch64/aarch64_vector.ad line 3490: > 3488: %} > 3489: ins_pipe(pipe_slow); > 3490: %} Could you please float this rule above `reduce_addF_sve` and below `reduce_addHF`? Better to rename `reduce_addHF` to `reduce_addHF_neon` ? 
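
The strict-ordering requirement quoted above exists because floating-point addition is not associative. As a small, hedged illustration, the sketch below uses `float` instead of Float16 to stay within `java.base`; the class and method names are made up for this example. It sums the same four values once left to right and once as a pairwise tree, the way a lane-wise reduction might.

```java
// Illustration only: float stands in for Float16, and the names are
// hypothetical. Shows that reduction order changes the rounded result.
final class StrictOrderReductionSketch {

    // Strictly ordered reduction: ((((0 + v0) + v1) + v2) + v3).
    static float sequentialSum(float[] v) {
        float acc = 0.0f;
        for (float x : v) {
            acc += x;
        }
        return acc;
    }

    // Tree-shaped reduction over v[lo, hi): (v0 + v1) + (v2 + v3).
    static float pairwiseSum(float[] v, int lo, int hi) {
        if (hi - lo == 1) {
            return v[lo];
        }
        int mid = (lo + hi) >>> 1;
        return pairwiseSum(v, lo, mid) + pairwiseSum(v, mid, hi);
    }

    public static void main(String[] args) {
        float[] v = {1e8f, 1.0f, -1e8f, 1.0f};
        System.out.println(sequentialSum(v));        // 1.0
        System.out.println(pairwiseSum(v, 0, v.length)); // 0.0
    }
}
```

Near 1e8 the spacing between adjacent `float` values is 8, so adding 1.0 to either large term is rounded away. The sequential order cancels the large terms before the final 1.0 is added, while the pairwise order loses both small terms. A vectorized reduction that must match the scalar loop therefore has to preserve the sequential order.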
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2634240241

From bmaillard at openjdk.org Fri Dec 19 08:56:44 2025
From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard)
Date: Fri, 19 Dec 2025 08:56:44 GMT
Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v6]
In-Reply-To:
References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com>
Message-ID:

On Thu, 18 Dec 2025 13:53:36 GMT, Emanuel Peter wrote:

>> Benoît Maillard has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits:
>>
>> - Remove trailing whitespace
>> - Merge branch 'master' into JDK-8371536
>> - More style
>> - Style and comments
>> - Merge branch 'master' into JDK-8371536
>> - Assert directly in the verify methods
>> - Add comment for _table.hash_delete(n)
>> - Change assert to print only the cause and the node name
>>
>> Bring back old comment
>>
>> Wording
>> - Assert at first failure
>> - Remove node from hash table before calling Ideal in verification
>
> src/hotspot/share/opto/phaseX.cpp line 1202:
>
>> 1200: tty->print_cr("%s", ss.as_string());
>> 1201:
>> 1202: assert(false, "Missed Value optimization opportunity in PhaseIterGVN for %s", n->Name());
>
> What if it gets called during CCP? Then it is not just a missed opportunity, but possibly a correctness problem.
>
> I wonder if we should have different assert messages here. We could even just pass a string into the method, either `IGVN` or `CCP`.
>
> What do you think?

Good point, I didn't think of that. Passing a string into the method would be one solution. Another one would be to keep the `bool` return type for `verify_Value_for` and assert at the call site (just as it was before). I think this feels a bit more natural than passing an assert message as a parameter. What do you think?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28295#discussion_r2634249334

From bmaillard at openjdk.org Fri Dec 19 09:02:24 2025
From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard)
Date: Fri, 19 Dec 2025 09:02:24 GMT
Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v7]
In-Reply-To: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com>
References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com>
Message-ID:

> This PR introduces changes in the detection of missing IGVN optimizations. As explained in the JBS issue description, when `-XX:VerifyIterativeGVN` was introduced, it was helpful to list all the missing optimizations. Such failures occur less frequently now, and the focus has changed to being able to debug such failures quickly and to identify similar or related failures during bug triaging.
>
> In summary, this PR brings the following changes:
> - Assert at the first verification failure in `verify_Optimize` instead of attempting to process all the nodes in the graph. This makes the output easier to parse, and also decreases the overhead of getting to the actual optimization site with a debugger.
> - Avoid confusing `Need to remove from hash before changing edges` assert messages by removing the verified node from the hash table before attempting to optimize the node in question.
> - Provide the failure reason (Ideal, Identity or Value) and the node name in the assert message itself to facilitate identifying related failures in the testing infrastructure during bug triaging.
>
> ### Example outputs
> #### [JDK-8371534: C2: Missed Ideal optimization opportunity with AndL and URShiftL](https://bugs.openjdk.org/browse/JDK-8371534)
> Before the change, we would get two missed optimizations (the second one is only a consequence of the first one).
After the change, we only get the first one, which is the one that actually needs to be fixed. We also get the name of the node in the assert message. >
> Before > > > Missed Ideal optimization (can_reshape=false): > The node was replaced by Ideal. > Old node: > dist dump > --------------------------------------------- > 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 > 1 297 AndL === _ 298 21 [[ 290 ]] !orig=[236],[193] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:42 (line 81) > 0 290 URShiftL === _ 297 22 [[ 299 ]] !orig=[231],[194] !jvms: TestMaskAndRShiftReorder::testURShiftL @ bci:46 (line 82) > The result after Ideal: > dist dump > --------------------------------------------- > 1 337 ConL === 0 [[ 338 ]] #long:-9 > 1 336 URShiftL === _ 298 22 [[ 338 ]] > 0 338 AndL === _ 336 337 [[ ]] > > > Missed Ideal optimization (can_reshape=true): > The node was replaced by Ideal. > Old node: > dist dump > --------------------------------------------- > 1 22 ConI === 0 [[ 70 81 70 290 81 76 32 37 37 43 48 48 54 59 59 65 336 ]] #int:1 > 1 297 AndL === _ 298... Beno?t Maillard has updated the pull request incrementally with one additional commit since the last revision: Update comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28295/files - new: https://git.openjdk.org/jdk/pull/28295/files/dbd208a7..d22ce771 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28295&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28295&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28295.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28295/head:pull/28295 PR: https://git.openjdk.org/jdk/pull/28295 From bmaillard at openjdk.org Fri Dec 19 09:02:28 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Fri, 19 Dec 2025 09:02:28 GMT Subject: RFR: 8371536: C2: VerifyIterativeGVN should assert on first detected failure [v6] In-Reply-To: References: <5K0igbTx6JbfD8CyDPRZJH_AbOlDxM7vQ4x3cu-jJCA=.23a59b9b-a849-4271-a659-1494168da91e@github.com> Message-ID: 
<_E93UgITcQ0N_xt-X7FquXPHHIeNuXINHFkIqjrVtTE=.f12a371d-554d-4f68-8cd3-515f9e8b55fc@github.com> On Thu, 18 Dec 2025 13:48:01 GMT, Emanuel Peter wrote: >> Beno?t Maillard has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: >> >> - Remove trailing whitespace >> - Merge branch 'master' into JDK-8371536 >> - More style >> - Style and comments >> - Merge branch 'master' into JDK-8371536 >> - Assert directly in the verify methods >> - Add comment for _table.hash_delete(n) >> - Change assert to print only the cause and the node name >> >> Bring back old comment >> >> Wording >> - Assert at first failure >> - Remove node from hash table before calling Ideal in verification > > src/hotspot/share/opto/phaseX.cpp line 1090: > >> 1088: // We should either make sure that this node is properly added back to the IGVN worklist >> 1089: // in PhaseIterGVN::add_users_to_worklist to update it again or add an exception >> 1090: // in the verification code above if that is not possible for some reason (like Load nodes). > > You moved the comment. It used to make sense to refer to "exceptions above". Now the exceptions are inside the methods, relative to the new place of the comment ;) Good catch :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28295#discussion_r2634261375 From xgong at openjdk.org Fri Dec 19 09:11:56 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 19 Dec 2025 09:11:56 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v3] In-Reply-To: References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: <58NIPjOC6PTzn0H5BwY5FUkNfpe_qHuHLyIPCLiZ1QI=.0bf7f024-e772-459c-bd96-01981446beda@github.com> On Thu, 18 Dec 2025 10:17:47 GMT, Bhavana Kilambi wrote: >> This patch adds mid-end support for vectorized add/mul reduction operations for half floats. 
It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. >> >> Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. The following is how each of these reductions is implemented for different aarch64 targets - >> >> **For AddReduction :** >> On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. >> >> On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order. >> >> **For MulReduction :** >> Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported. >> >> Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` - >> >> Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. >> Ratio > 1 indicates the performance with this patch is better than the master branch. 
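Editorial aside on the strict-ordering requirement quoted above: floating-point addition is not associative, so a reduction that reassociates lanes can produce a different result from the sequential loop it replaces. The following is a minimal sketch under stated assumptions: it uses plain `float` rather than Float16, and the class and method names (`StrictOrderSketch`, `sumInOrder`, `sumPairwise`) are hypothetical and not taken from the patch.

```java
class StrictOrderSketch {
    // Strictly ordered reduction: (((0 + a[0]) + a[1]) + a[2]) + ...
    static float sumInOrder(float[] a) {
        float sum = 0.0f;
        for (float v : a) {
            sum += v;
        }
        return sum;
    }

    // Reassociated reduction over [from, to): sums the two halves separately,
    // the way a tree-shaped reduction would.
    static float sumPairwise(float[] a, int from, int to) {
        if (to - from == 1) {
            return a[from];
        }
        int mid = (from + to) >>> 1;
        return sumPairwise(a, from, mid) + sumPairwise(a, mid, to);
    }

    public static void main(String[] args) {
        // 1e8f is exactly representable, but 1e8f + 1f rounds back to 1e8f.
        float[] a = {1.0f, 1e8f, -1e8f, 1.0f};
        System.out.println("in order: " + sumInOrder(a));
        System.out.println("pairwise: " + sumPairwise(a, 0, a.length));
    }
}
```

With this input the two orders disagree (the in-order sum keeps the trailing `1.0f`, the pairwise sum loses both small terms), which is the kind of divergence a strictly ordered implementation avoids.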
>> >> **N1 (UseSVE = 0, max vector length = 16B):** >> >> Benchmark vectorDim Mode Cnt 8B 16B >> ReductionAddFP16 256 thrpt 9 1.41 1.40 >> ReductionAddFP16 512 thrpt 9 1.41 1.41 >> ReductionAddFP16 1024 thrpt 9 1.43 1.40 >> ReductionAddFP16 2048 thrpt 9 1.43 1.40 >> ReductionMulFP16 256 thrpt 9 1.22 1.22 >> ReductionMulFP16 512 thrpt 9 1.21 1.23 >> ReductionMulFP16 1024 thrpt 9 1.21 1.22 >> ReductionMulFP16 2048 thrpt 9 1.20 1.22 >> >> >> On N1, the scalarized sequence of `fadd/fmul` are gener... > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Fix build failures on Mac src/hotspot/cpu/aarch64/aarch64_vector.ad line 259: > 257: // implements strictly ordered floating point add reduction which does not require > 258: // the FEAT_FP16 and ASIMDHP checks as SVE supports half-precision floats by default. > 259: case Op_AddReductionVHF: Does it need to check `length_in_bytes < 8` for add reduction? src/hotspot/cpu/aarch64/aarch64_vector.ad line 392: > 390: case Op_StoreVectorScatter: > 391: case Op_AddReductionVF: > 392: case Op_AddReductionVHF: Suggestion: case Op_AddReductionVHF: case Op_AddReductionVF: src/hotspot/share/opto/vectornode.hpp line 323: > 321: // is generated through VectorAPI as VectorAPI does not impose any such rules on ordering. 
> 322: const bool _requires_strict_order; > 323: public: Suggestion: const bool _requires_strict_order; public: ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2634252247 PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2634258731 PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2634278136 From galder at openjdk.org Fri Dec 19 09:19:48 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Fri, 19 Dec 2025 09:19:48 GMT Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v3] In-Reply-To: References: Message-ID: > Min/Max users of Min/Max uses need to be enqueued respectively to the GVN worklist to see if further optimizations can be applied. Without this, there are cases where additional potential ideal/identity optimizations are not applied. I need this fix to test min/max reassociation implementation with IR tests reliably. > > Aside from the fix itself, I've refactored `MaxNode` to `MinMaxNode` and added a `is_MinMax` node query to simplify the fix. > > I have also removed the Min/Max exceptions in `PhaseIterGVN::verify_Identity_for` since this fix fixes `compiler/codegen/TestBooleanVect.java` with `-XX:VerifyIterativeGVN=1110`. > > To test this I've created a template framework test that validates the fix. I have tested with all Min/Max combinations including Float16, which I've verified with Intel SDE. Float16 does not use `Argument.NUMBER_42` because there's no support for it yet, see [JDK-8373977](https://bugs.openjdk.org/browse/JDK-8373977). > > During development I noticed that the test only failed when the test had `b, a` parameters in that order, so I added tests for both cases as `a, b` and `b, a` so that all possible orders are covered and they don't slip in the future. > > I've run tier1-3 tests on linux/x64 successfully. 
Galder Zamarre?o has updated the pull request incrementally with one additional commit since the last revision: Add missing module to test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28895/files - new: https://git.openjdk.org/jdk/pull/28895/files/4221b67d..1cc8a021 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28895&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28895&range=01-02 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28895.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28895/head:pull/28895 PR: https://git.openjdk.org/jdk/pull/28895 From mli at openjdk.org Fri Dec 19 09:23:24 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 19 Dec 2025 09:23:24 GMT Subject: RFR: 8373998: RISC-V: simple optimization of ConvHF2F In-Reply-To: References: Message-ID: <-lx6YPORxGcqhm_F4iDE8qVds7HB4B7f5wi-bt3Y4dc=.eeb27605-1fa1-487c-a66a-d71f43109904@github.com> On Fri, 19 Dec 2025 00:50:10 GMT, Fei Yang wrote: >> Hi, >> Can you help to review this patch? >> >> ConvHF2F could be optimized by following patch. >> As riscv does not have the restriction to use `iRegINoSp` in src register, it can use iRegIorL2I instead. >> Check other usages of src register in riscv.ad file. >> >> @RealFYang Thanks! >> >> Thanks! > > Thanks. My local hs:tier1 - hs:tier2 test with `Zfh` is good using fastdebug build. Thank you for testing and review @RealFYang @DingliZhang ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28896#issuecomment-3674244316 From mli at openjdk.org Fri Dec 19 09:23:25 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 19 Dec 2025 09:23:25 GMT Subject: Integrated: 8373998: RISC-V: simple optimization of ConvHF2F In-Reply-To: References: Message-ID: On Thu, 18 Dec 2025 11:41:13 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > > ConvHF2F could be optimized by following patch. 
> As riscv does not have the restriction to use `iRegINoSp` in src register, it can use iRegIorL2I instead. > Check other usages of src register in riscv.ad file. > > @RealFYang Thanks! > > Thanks! This pull request has now been integrated. Changeset: 5eb87749 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/5eb8774909bd250c7ff8cfc56506a949b547bda2 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8373998: RISC-V: simple optimization of ConvHF2F Co-authored-by: Fei Yang Reviewed-by: fyang, dzhang ------------- PR: https://git.openjdk.org/jdk/pull/28896 From galder at openjdk.org Fri Dec 19 09:27:23 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Fri, 19 Dec 2025 09:27:23 GMT Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v3] In-Reply-To: References: Message-ID: On Fri, 19 Dec 2025 09:19:48 GMT, Galder Zamarre?o wrote: >> Min/Max users of Min/Max uses need to be enqueued respectively to the GVN worklist to see if further optimizations can be applied. Without this, there are cases where additional potential ideal/identity optimizations are not applied. I need this fix to test min/max reassociation implementation with IR tests reliably. >> >> Aside from the fix itself, I've refactored `MaxNode` to `MinMaxNode` and added a `is_MinMax` node query to simplify the fix. >> >> I have also removed the Min/Max exceptions in `PhaseIterGVN::verify_Identity_for` since this fix fixes `compiler/codegen/TestBooleanVect.java` with `-XX:VerifyIterativeGVN=1110`. >> >> To test this I've created a template framework test that validates the fix. I have tested with all Min/Max combinations including Float16, which I've verified with Intel SDE. Float16 does not use `Argument.NUMBER_42` because there's no support for it yet, see [JDK-8373977](https://bugs.openjdk.org/browse/JDK-8373977). 
>> >> During development I noticed that the test only failed when the test had `b, a` parameters in that order, so I added tests for both cases as `a, b` and `b, a` so that all possible orders are covered and they don't slip in the future. >> >> I've run tier1-3 tests on linux/x64 successfully. > > Galder Zamarre?o has updated the pull request incrementally with one additional commit since the last revision: > > Add missing module to test The module issue should be fixed now ------------- PR Comment: https://git.openjdk.org/jdk/pull/28895#issuecomment-3674262474 From bmaillard at openjdk.org Fri Dec 19 10:24:30 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Fri, 19 Dec 2025 10:24:30 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v6] In-Reply-To: References: Message-ID: On Thu, 11 Dec 2025 15:42:42 GMT, Roland Westrelin wrote: >> For this failure memory stats are: >> >> >> Total Usage: 1095525816 >> --- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 --- >> Phase Total ra node comp type states reglive regsplit regmask superword cienv ha other >> none 5976032 331560 5402064 197512 33712 10200 0 0 984 0 0 0 0 >> parse 2716464 65456 1145480 196408 1112752 0 0 0 0 0 196368 0 0 >> optimizer 98184 0 32728 0 65456 0 0 0 0 0 0 0 0 >> connectionGraph 32728 0 0 32728 0 0 0 0 0 0 0 0 0 >> iterGVN 32728 0 32728 0 0 0 0 0 0 0 0 0 0 >> idealLoop 918189632 0 38687056 872824784 392776 0 0 0 0 0 6285016 0 0 >> idealLoopVerify 2228144 0 0 2228144 0 0 0 0 0 0 0 0 0 >> macroExpand 32728 0 32728 0 0 0 0 0 0 0 0 0 0 >> graphReshape 32728 0 32728 0 0 0 0 0 0 0 0 0 0 >> matcher 20135944 3369848 9033208 7536400 65456 131032 0 0 0 0 0 0 0 >> postselect_cleanup 294872 294872 0 0 0 0 0 0 0 0 0 0 0 >> scheduler 752944 196488 556456 0 0 0 0 0 0 0 0 0 0 >> regalloc 388736 388736 0 0 0 0 0 0 0 0 0 0 0 >> ... 
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > package declaration I think we should use the following test, which is quite concise and only takes a few seconds to execute thanks to setting `memlimit` to `100M`.

```java
/**
 * @test
 * @key stress randomness
 * @bug 8370519
 * @summary C2: Hit MemLimit when running with +VerifyLoopOptimizations
 * @run main/othervm -XX:CompileCommand=compileonly,${test.main.class}::* -XX:-TieredCompilation -Xbatch
 *      -XX:+UnlockDiagnosticVMOptions -XX:+IgnoreUnrecognizedVMOptions
 *      -XX:+StressLoopPeeling -XX:+VerifyLoopOptimizations
 *      -XX:CompileCommand=memlimit,${test.main.class}::*,100M~crash
 *      -XX:StressSeed=3106998670 ${test.main.class}
 * @run main ${test.main.class}
 */

package compiler.c2;

public class TestVerifyLoopOptimizationsHighMemUsage {
    static int b = 400;
    static long c;
    static boolean d;

    static long lMeth(int e) {
        int f, g, h, k[] = new int[b];
        long l[] = new long[b];
        boolean m[] = new boolean[b];
        for (f = 5; f < 330; ++f)
            for (g = 1; g < 5; ++g)
                for (h = 2; h > 1; h -= 3)
                    switch (f * 5 + 54) {
                    case 156:
                    case 354:
                    case 98:
                    case 173:
                    case 120:
                    case 374:
                    case 140:
                    case 57:
                    case 106:
                    case 306:
                    case 87:
                    case 399:
                        k[1] = (int)c;
                    case 51:
                    case 287:
                    case 148:
                    case 70:
                    case 74:
                    case 59:
                        m[h] = d;
                    }
        long n = p(l);
        return n;
    }

    public static long p(long[] a) {
        long o = 0;
        for (int j = 0; j < a.length; j++)
            o += j;
        return o;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++)
            lMeth(9);
    }
}
```

------------- PR Comment: https://git.openjdk.org/jdk/pull/28581#issuecomment-3674476543 From dlong at openjdk.org Fri Dec 19 10:36:58 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 19 Dec 2025 10:36:58 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses [v2] In-Reply-To: References: Message-ID: On Wed, 17 Dec 2025 14:04:04 GMT, Roland Westrelin wrote: >> The base input of `AddP` is expected to only be set for heap accesses
>> but I noticed some inconsistencies so I added an assert in the `AddP` >> constructor and fixed issues that it caught. AFAICT, the >> inconsistencies shouldn't create issues. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - review > - review > - merge > - more > - more > - more > - undo > - exps I ran testing and got one mysterious timeout in java/net/httpclient/DurationOverflowTest.java#withPropertyConfig. Normally this test runs in seconds, but it timed out in 9 minutes. It could be unrelated to your changes or even a test bug, but I couldn't find any previous timeout failures for this test. Most of your changes look low-risk. The change I'm most concerned about is in LoadKlassNode::Identity, where it returns the Address edge instead of Base. ------------- PR Review: https://git.openjdk.org/jdk/pull/28769#pullrequestreview-3598064741 From dlong at openjdk.org Fri Dec 19 10:51:55 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 19 Dec 2025 10:51:55 GMT Subject: RFR: 8374078: C2_MacroAssembler::verify_int_in_range has incorrect early return condition In-Reply-To: References: Message-ID: <-nLm4HuJq97zjRaGudlHqDebMHZpCa7akwiG-li0GpU=.9ec4ec26-b3dc-4c28-b859-57b8ea96b40b@github.com> On Fri, 19 Dec 2025 04:00:31 GMT, Quan Anh Mai wrote: > Hi, > > This PR fixes the incorrect early return condition in `C2_MacroAssembler::verify_int_in_range`. Previously, `lo == min_jint && hi == max_jint` is equivalent to `t == Type::INT`. But this is not true anymore. > > Please take a look and leave your reviews, thanks a lot. Is it possible to write a regression test for this?
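Editorial aside on the early-return condition in JDK-8374078: once an int type can carry more than signed bounds, a full signed range `[min_jint, max_jint]` no longer implies the unconstrained int type. The sketch below is only an illustration under stated assumptions: it models a type by unsigned bounds alone, `FullSignedRangeSketch` and `inUnsignedRange` are hypothetical names rather than C2's `TypeInt` API, and it makes no claim about whether C2 currently produces such a type.

```java
class FullSignedRangeSketch {
    // Hypothetical stand-in for "value v is allowed by a type whose only
    // constraint is the unsigned interval [ulo, uhi]".
    static boolean inUnsignedRange(int v, int ulo, int uhi) {
        return Integer.compareUnsigned(v, ulo) >= 0
            && Integer.compareUnsigned(v, uhi) <= 0;
    }

    public static void main(String[] args) {
        // Unsigned interval [1, 0xFFFFFFFF]: every int except 0.
        int ulo = 1;
        int uhi = 0xFFFFFFFF;

        // Both signed extremes are allowed, so the signed bounds of this set
        // are still [Integer.MIN_VALUE, Integer.MAX_VALUE] ...
        System.out.println(inUnsignedRange(Integer.MIN_VALUE, ulo, uhi));
        System.out.println(inUnsignedRange(Integer.MAX_VALUE, ulo, uhi));

        // ... yet 0 is excluded, so the set is not "all ints". A check that
        // looks only at the signed bounds would skip verifying this.
        System.out.println(inUnsignedRange(0, ulo, uhi));
    }
}
```

In this model, a verifier that returns early whenever the signed bounds are full would never check that the value is non-zero, which is the shape of the problem the PR description refers to.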
------------- PR Comment: https://git.openjdk.org/jdk/pull/28916#issuecomment-3674577877 From dlong at openjdk.org Fri Dec 19 10:53:53 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 19 Dec 2025 10:53:53 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses [v2] In-Reply-To: References: Message-ID: On Wed, 17 Dec 2025 14:04:04 GMT, Roland Westrelin wrote: >> The base input of `AddP` is expected to only be set for heap accesses >> but I noticed some inconsistencies so I added an assert in the `AddP` >> constructor and fixed issues that it caught. AFAFICT, the >> inconsistencies shouldn't create issues. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - review > - review > - merge > - more > - more > - more > - undo > - exps I'll try re-running testing to see if it fails again. ------------- Changes requested by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28769#pullrequestreview-3598128322 From qamai at openjdk.org Fri Dec 19 11:19:07 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 19 Dec 2025 11:19:07 GMT Subject: RFR: 8374078: C2_MacroAssembler::verify_int_in_range has incorrect early return condition In-Reply-To: <-nLm4HuJq97zjRaGudlHqDebMHZpCa7akwiG-li0GpU=.9ec4ec26-b3dc-4c28-b859-57b8ea96b40b@github.com> References: <-nLm4HuJq97zjRaGudlHqDebMHZpCa7akwiG-li0GpU=.9ec4ec26-b3dc-4c28-b859-57b8ea96b40b@github.com> Message-ID: On Fri, 19 Dec 2025 10:48:12 GMT, Dean Long wrote: >> Hi, >> >> This PR fixes the incorrect early return condition in `C2_MacroAssembler::verify_int_in_range`. Previously, `lo == min_jint && hi == max_jint` is equivalent to `t == Type::INT`. But this is not true anymore. >> >> Please take a look and leave your reviews, thanks a lot. 
> > Is it possible to write a regression test for this? @dean-long Thanks for your comment. I have not been able to come up with a regression now. The reason is that unsigned bounds and known bits have only been implemented for `And/Or/Xor`. But since `min_jint == 0x10...00` and `max_jint == 0x01...11`, any known bit will imply `_lo > min_jint || _hi < max_jint`. So there is no case when `t != TypeInt::INT` but `t->_lo == min_jint || t->_hi == max_jint`. I encountered this issue when trying to implement inference for `Add/Sub`, which may make the above situation arise. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28916#issuecomment-3674670316 From galder at openjdk.org Fri Dec 19 13:05:27 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Fri, 19 Dec 2025 13:05:27 GMT Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v4] In-Reply-To: References: Message-ID: > Min/Max users of Min/Max uses need to be enqueued respectively to the GVN worklist to see if further optimizations can be applied. Without this, there are cases where additional potential ideal/identity optimizations are not applied. I need this fix to test min/max reassociation implementation with IR tests reliably. > > Aside from the fix itself, I've refactored `MaxNode` to `MinMaxNode` and added a `is_MinMax` node query to simplify the fix. > > I have also removed the Min/Max exceptions in `PhaseIterGVN::verify_Identity_for` since this fix fixes `compiler/codegen/TestBooleanVect.java` with `-XX:VerifyIterativeGVN=1110`. > > To test this I've created a template framework test that validates the fix. I have tested with all Min/Max combinations including Float16, which I've verified with Intel SDE. Float16 does not use `Argument.NUMBER_42` because there's no support for it yet, see [JDK-8373977](https://bugs.openjdk.org/browse/JDK-8373977). 
> > During development I noticed that the test only failed when the test had `b, a` parameters in that order, so I added tests for both cases as `a, b` and `b, a` so that all possible orders are covered and they don't slip in the future. > > I've run tier1-3 tests on linux/x64 successfully. Galder Zamarre?o has updated the pull request incrementally with one additional commit since the last revision: It's the templated test that needs the module ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28895/files - new: https://git.openjdk.org/jdk/pull/28895/files/1cc8a021..09fba3fb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28895&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28895&range=02-03 Stats: 6 lines in 1 file changed: 3 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28895.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28895/head:pull/28895 PR: https://git.openjdk.org/jdk/pull/28895 From galder at openjdk.org Fri Dec 19 13:07:24 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Fri, 19 Dec 2025 13:07:24 GMT Subject: RFR: 8373134: C2: Min/Max users of Min/Max uses should be enqueued for GVN [v5] In-Reply-To: References: Message-ID: > Min/Max users of Min/Max uses need to be enqueued respectively to the GVN worklist to see if further optimizations can be applied. Without this, there are cases where additional potential ideal/identity optimizations are not applied. I need this fix to test min/max reassociation implementation with IR tests reliably. > > Aside from the fix itself, I've refactored `MaxNode` to `MinMaxNode` and added a `is_MinMax` node query to simplify the fix. > > I have also removed the Min/Max exceptions in `PhaseIterGVN::verify_Identity_for` since this fix fixes `compiler/codegen/TestBooleanVect.java` with `-XX:VerifyIterativeGVN=1110`. > > To test this I've created a template framework test that validates the fix. 
I have tested with all Min/Max combinations including Float16, which I've verified with Intel SDE. Float16 does not use `Argument.NUMBER_42` because there's no support for it yet, see [JDK-8373977](https://bugs.openjdk.org/browse/JDK-8373977). > > During development I noticed that the test only failed when the test had `b, a` parameters in that order, so I added tests for both cases as `a, b` and `b, a` so that all possible orders are covered and they don't slip in the future. > > I've run tier1-3 tests on linux/x64 successfully. Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: Module also needed in the wrapper test class ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28895/files - new: https://git.openjdk.org/jdk/pull/28895/files/09fba3fb..e2579bb1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28895&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28895&range=03-04 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28895.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28895/head:pull/28895 PR: https://git.openjdk.org/jdk/pull/28895 From roland at openjdk.org Fri Dec 19 13:15:38 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 19 Dec 2025 13:15:38 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses [v3] In-Reply-To: References: Message-ID: > The base input of `AddP` is expected to only be set for heap accesses > but I noticed some inconsistencies so I added an assert in the `AddP` > constructor and fixed issues that it caught. AFAICT, the > inconsistencies shouldn't create issues.
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28769/files - new: https://git.openjdk.org/jdk/pull/28769/files/38eb3b3f..007e73cd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28769&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28769&range=01-02 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28769.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28769/head:pull/28769 PR: https://git.openjdk.org/jdk/pull/28769 From roland at openjdk.org Fri Dec 19 13:19:08 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 19 Dec 2025 13:19:08 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses [v2] In-Reply-To: <_x8OV9zV3TaggMMwjNpjaeBtQRIxXbmq-VvV10SmTHg=.b7a98f3f-19c6-442b-b71b-a86a5b6f685a@github.com> References: <_x8OV9zV3TaggMMwjNpjaeBtQRIxXbmq-VvV10SmTHg=.b7a98f3f-19c6-442b-b71b-a86a5b6f685a@github.com> Message-ID: On Thu, 18 Dec 2025 03:27:29 GMT, Dean Long wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: >> >> - review >> - review >> - merge >> - more >> - more >> - more >> - undo >> - exps > > src/hotspot/share/opto/memnode.cpp line 2570: > >> 2568: assert(tkls2->offset() == 0, "not a load of java_mirror"); >> 2569: #endif >> 2570: return adr2->in(AddPNode::Address); > > What should the value of adr2->in(AddPNode::Offset) be at this point? 0 or java_mirror_offset()? Do we need to check it? I added a couple asserts here. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28769#discussion_r2635070902 From roland at openjdk.org Fri Dec 19 13:23:59 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 19 Dec 2025 13:23:59 GMT Subject: RFR: 8373508: C2: sinking CreateEx out of loop breaks the graph In-Reply-To: References: Message-ID: <4eUiFfzqfjvPqmmbFqVtX3LCTKhePamkSWzj9WaCjKM=.d0de25a9-50b6-4c30-bceb-5f6152938903@github.com> On Wed, 17 Dec 2025 21:19:07 GMT, Dean Long wrote: > > Whether that code is removed or not, it makes little sense to sink the CreateEx anyway. > > That's the part I'm still trying to understand. If we fix CreateExNode::Identity now and allow it to move outside the loop, the crash goes away. My understanding is that the CreateEx is for the exception handler. If the exception handler had a safepoint, then moving it out of the loop seems useful. What do you think? // Create exception oop: created by stack-crawling runtime code. // Created exception is now available to this handler, and is setup // just prior to jumping to this handler. No code emitted. instruct CreateException(rax_RegP ex_oop) %{ match(Set ex_oop (CreateEx)); size(0); `CreateEx` doesn't do anything. So it seems to me that sinking it is not worth the risk of accidentally breaking something in some uncommon case, and I would go with a conservative fix.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28842#issuecomment-3675069103 From roland at openjdk.org Fri Dec 19 13:27:08 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 19 Dec 2025 13:27:08 GMT Subject: RFR: 8373508: C2: sinking CreateEx out of loop breaks the graph [v2] In-Reply-To: References: Message-ID: On Wed, 17 Dec 2025 14:25:56 GMT, Roland Westrelin wrote: >> test/hotspot/jtreg/compiler/loopopts/TestCreateExSunkOutOfLoop.java line 30: >> >>> 28: * @library /test/lib >>> 29: * @run main/othervm -Xbatch ${test.main.class} >>> 30: * @run main ${test.main.class} >> >> Since this test runs for 4s at least, I'm not sure if it's worth to have an Xbatch and non-Xbatch version. Does it trigger with both? > > It does. Which would you keep then? I would keep the one with Xbatch because it's more likely to reproduce. Does that sound good to you? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28842#discussion_r2635092319 From dfenacci at openjdk.org Fri Dec 19 14:08:27 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 19 Dec 2025 14:08:27 GMT Subject: RFR: 8373525: C2: assert(_base == Long) failed: Not a Long Message-ID: # Issue Olivier's fuzzer found a test that makes C2 crash while running the optimization that collapses the addition with overflow-protection (`fold_subI_no_underflow_pattern`). # Causes The crash happens because during `fold_subI_no_underflow_pattern` the first input of the `AddL` node (see comment below) becomes top. https://github.com/openjdk/jdk/blob/82b04f01bc99e8155518b8b8600d180981a42fc5/src/hotspot/share/opto/addnode.cpp#L1525-L1533 This happens because of a whole `IfFalse` subgraph that dies and nodes are being removed. `AddL` is not removed immediately as it has another input which is still alive but it is put in the IGVN worklist instead. image Unfortunately the `fold_subI_no_underflow_pattern` optimization runs before the next GVN pass and triggers the assert. 
# Fix `fold_subI_no_underflow_pattern` should actually take into account that we could have the graph in such a state and that `x` could be top. So, the sensible fix is not to presume that `x` is of type long, and to bail out if it is not. # Testing Tier 1-3+ (also checked for new regression test failure before the change) ------------- Commit messages: - JDK-8373525: add test requires - JDK-8373525: C2: assert(_base == Long) failed: Not a Long Changes: https://git.openjdk.org/jdk/pull/28920/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28920&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373525 Stats: 117 lines in 2 files changed: 116 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28920.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28920/head:pull/28920 PR: https://git.openjdk.org/jdk/pull/28920 From roland at openjdk.org Fri Dec 19 14:10:11 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 19 Dec 2025 14:10:11 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v6] In-Reply-To: References: Message-ID: On Fri, 19 Dec 2025 10:21:29 GMT, Benoît Maillard wrote: > I think we should use the following test, which is quite concise and only takes a few seconds to execute thanks to setting `memlimit` to `100M`. Thanks for working on that. Without the patch and with no limit on memory usage, I see: Arena Usage by Arena Type and compilation phase, at arena usage peak of 106292736 idealLoop 11723824 0 2814728 8516320 392776 0 0 0 0 0 0 0 0 output 84858328 84858328 0 0 0 0 0 0 0 0 0 0 0 With the fix: Arena Usage by Arena Type and compilation phase, at arena usage peak of 98628104 idealLoop 4288448 0 2814728 1080944 0 392776 0 0 0 0 0 0 0 0 output 84858328 84858328 0 0 0 0 0 0 0 0 0 0 0 0 So 7+ MB of memory is saved but most of the memory is used by `output` anyway. As a consequence, with the fix, memory usage is still close to 100MB.
I wonder how robust the test is going to be on other platforms (`output` memory usage could be higher) or as the C2 code evolves and memory usage changes. Have you run it on other platforms with the fix to make sure it does pass everywhere? If, say, `idealLoop` usage were about as high as `output` without the fix, that would make the test more robust. In the process of coming up with that test, were there any other tests you tried that took a bit longer to run but used a bit more memory? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28581#issuecomment-3675211322 From mhaessig at openjdk.org Fri Dec 19 14:45:22 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 19 Dec 2025 14:45:22 GMT Subject: RFR: 8373525: C2: assert(_base == Long) failed: Not a Long In-Reply-To: References: Message-ID: On Fri, 19 Dec 2025 10:22:58 GMT, Damon Fenacci wrote: > # Issue > Olivier's fuzzer found a test that makes C2 crash while running the optimization that collapses the addition with overflow-protection (`fold_subI_no_underflow_pattern`). > > # Causes > The crash happens because during `fold_subI_no_underflow_pattern` the first input of the `AddL` node (see comment below) becomes top. > https://github.com/openjdk/jdk/blob/82b04f01bc99e8155518b8b8600d180981a42fc5/src/hotspot/share/opto/addnode.cpp#L1525-L1533 > > This happens because of a whole `IfFalse` subgraph that dies and nodes are being removed. `AddL` is not removed immediately as it has another input which is still alive but it is put in the IGVN worklist instead. > > [image] > > Unfortunately the `fold_subI_no_underflow_pattern` optimization runs before the next GVN pass and triggers the assert. > > # Fix > `fold_subI_no_underflow_pattern` should actually take into account that we could have the graph in such a state and that `x` could be top. So, the sensible fix is not to presume `x` to be of type long and bailout if it is not.
> > # Testing > Tier 1-3+ > (also checked for new regression test failure before the change) Thank you for fixing this, @dafedafe. The fix looks good. I only have some minor comments on the test. test/hotspot/jtreg/compiler/loopopts/TestValidTypeInOverflowProtection.java line 28: > 26: * @bug 8373525 > 27: * @summary Test for the check of a valid type (long) for the input variable of overflow protection > 28: * @requires vm.debug == true & vm.compiler2.enabled Is this requires needed? I guess it is needed to reproduce the failure, but does this test fail on a product build? Running it in more scenarios should provide for some more coverage. test/hotspot/jtreg/compiler/loopopts/TestValidTypeInOverflowProtection.java line 31: > 29: * @run main/othervm -Xbatch -XX:-TieredCompilation -XX:CompileCommand=compileonly,compiler.loopopts.TestValidTypeInOverflowProtection::test > 30: * compiler.loopopts.TestValidTypeInOverflowProtection > 31: * @run driver compiler.loopopts.TestValidTypeInOverflowProtection Now that we have it, we should use the fancy jtreg variable for the test class to fend off the typos. Suggestion: * ${test.main.class} * @run driver ${test.main.class} ------------- Changes requested by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/28920#pullrequestreview-3598854842 PR Review Comment: https://git.openjdk.org/jdk/pull/28920#discussion_r2635310142 PR Review Comment: https://git.openjdk.org/jdk/pull/28920#discussion_r2635303964 From epeter at openjdk.org Fri Dec 19 15:11:37 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 19 Dec 2025 15:11:37 GMT Subject: [jdk26] RFR: 8373502: C2 SuperWord: speculative check uses VPointer variable was pinned after speculative check, leading to bad graph Message-ID: <_qsak5U1E4TBrhC0V15Lm5TmaDjIXFNwA93zyXxkXNI=.6234364a-a7fe-46af-a3b7-798c4ab45496@github.com> Clean backport of https://github.com/openjdk/jdk/pull/28783 to JDK26. 
------------- Commit messages: - Backport 00050f84d44f3ec23e9c6da52bffd68770010749 Changes: https://git.openjdk.org/jdk/pull/28929/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28929&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373502 Stats: 112 lines in 3 files changed: 112 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28929.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28929/head:pull/28929 PR: https://git.openjdk.org/jdk/pull/28929 From dfenacci at openjdk.org Fri Dec 19 15:11:49 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 19 Dec 2025 15:11:49 GMT Subject: RFR: 8373525: C2: assert(_base == Long) failed: Not a Long [v2] In-Reply-To: References: Message-ID: > # Issue > Olivier's fuzzer found a test that makes C2 crash while running the optimization that collapses the addition with overflow-protection (`fold_subI_no_underflow_pattern`). > > # Causes > The crash happens because during `fold_subI_no_underflow_pattern` the first input of the `AddL` node (see comment below) becomes top. > https://github.com/openjdk/jdk/blob/82b04f01bc99e8155518b8b8600d180981a42fc5/src/hotspot/share/opto/addnode.cpp#L1525-L1533 > > This happens because of a whole `IfFalse` subgraph that dies and nodes are being removed. `AddL` is not removed immediately as it has another input which is still alive but it is put in the IGVN worklist instead. > > [image] > > Unfortunately the `fold_subI_no_underflow_pattern` optimization runs before the next GVN pass and triggers the assert. > > # Fix > `fold_subI_no_underflow_pattern` should actually take into account that we could have the graph in such a state and that `x` could be top. So, the sensible fix is not to presume `x` to be of type long and bailout if it is not.
> > # Testing > Tier 1-3+ > (also checked for new regression test failure before the change) Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: - Update test/hotspot/jtreg/compiler/loopopts/TestValidTypeInOverflowProtection.java Co-authored-by: Manuel Hässig - JDK-8373525: remove test requires ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28920/files - new: https://git.openjdk.org/jdk/pull/28920/files/20fe00fd..3219ce4f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28920&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28920&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28920.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28920/head:pull/28920 PR: https://git.openjdk.org/jdk/pull/28920 From dfenacci at openjdk.org Fri Dec 19 15:11:54 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 19 Dec 2025 15:11:54 GMT Subject: RFR: 8373525: C2: assert(_base == Long) failed: Not a Long [v2] In-Reply-To: References: Message-ID: On Fri, 19 Dec 2025 14:41:03 GMT, Manuel Hässig wrote: >> Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update test/hotspot/jtreg/compiler/loopopts/TestValidTypeInOverflowProtection.java >> >> Co-authored-by: Manuel Hässig >> - JDK-8373525: remove test requires > > test/hotspot/jtreg/compiler/loopopts/TestValidTypeInOverflowProtection.java line 28: > >> 26: * @bug 8373525 >> 27: * @summary Test for the check of a valid type (long) for the input variable of overflow protection >> 28: * @requires vm.debug == true & vm.compiler2.enabled > > Is this requires needed? I guess it is needed to reproduce the failure, but does this test fail on a product build? Running it in more scenarios should provide for some more coverage. Thanks for the comments @mhaessig! Right! Removed!
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28920#discussion_r2635392737 From mhaessig at openjdk.org Fri Dec 19 15:47:02 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 19 Dec 2025 15:47:02 GMT Subject: [jdk26] RFR: 8373502: C2 SuperWord: speculative check uses VPointer variable was pinned after speculative check, leading to bad graph In-Reply-To: <_qsak5U1E4TBrhC0V15Lm5TmaDjIXFNwA93zyXxkXNI=.6234364a-a7fe-46af-a3b7-798c4ab45496@github.com> References: <_qsak5U1E4TBrhC0V15Lm5TmaDjIXFNwA93zyXxkXNI=.6234364a-a7fe-46af-a3b7-798c4ab45496@github.com> Message-ID: <9hEg6tAg8UIDRW0dJSxWswA46Z31WM6HhDIijXQFmmo=.6fa7b6be-801a-4db8-aff8-7ff62f96d400@github.com> On Fri, 19 Dec 2025 14:55:33 GMT, Emanuel Peter wrote: > Clean backport of https://github.com/openjdk/jdk/pull/28783 to JDK26. Looks good to me. ------------- Marked as reviewed by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/28929#pullrequestreview-3599077021 From mhaessig at openjdk.org Fri Dec 19 15:47:03 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 19 Dec 2025 15:47:03 GMT Subject: RFR: 8373525: C2: assert(_base == Long) failed: Not a Long [v2] In-Reply-To: References: Message-ID: On Fri, 19 Dec 2025 15:11:49 GMT, Damon Fenacci wrote: >> # Issue >> Olivier's fuzzer found a test that makes C2 crash while running the optimization that collapses the addition with overflow-protection (`fold_subI_no_underflow_pattern`). >> >> # Causes >> The crash happens because during `fold_subI_no_underflow_pattern` the first input of the `AddL` node (see comment below) becomes top. >> https://github.com/openjdk/jdk/blob/82b04f01bc99e8155518b8b8600d180981a42fc5/src/hotspot/share/opto/addnode.cpp#L1525-L1533 >> >> This happens because of a whole `IfFalse` subgraph that dies and nodes are being removed. `AddL` is not removed immediately as it has another input which is still alive but it is put in the IGVN worklist instead.
>> >> [image] >> >> Unfortunately the `fold_subI_no_underflow_pattern` optimization runs before the next GVN pass and triggers the assert. >> >> # Fix >> `fold_subI_no_underflow_pattern` should actually take into account that we could have the graph in such a state and that `x` could be top. So, the sensible fix is not to presume `x` to be of type long and bailout if it is not. >> >> # Testing >> Tier 1-3+ >> (also checked for new regression test failure before the change) > > Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: > > - Update test/hotspot/jtreg/compiler/loopopts/TestValidTypeInOverflowProtection.java > > Co-authored-by: Manuel Hässig > - JDK-8373525: remove test requires Thanks for addressing my comments. Looks good to me. ------------- Marked as reviewed by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/28920#pullrequestreview-3599078161 From chagedorn at openjdk.org Fri Dec 19 16:21:10 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 19 Dec 2025 16:21:10 GMT Subject: RFR: 8373525: C2: assert(_base == Long) failed: Not a Long [v2] In-Reply-To: References: Message-ID: On Fri, 19 Dec 2025 15:11:49 GMT, Damon Fenacci wrote: >> # Issue >> Olivier's fuzzer found a test that makes C2 crash while running the optimization that collapses the addition with overflow-protection (`fold_subI_no_underflow_pattern`). >> >> # Causes >> The crash happens because during `fold_subI_no_underflow_pattern` the first input of the `AddL` node (see comment below) becomes top. >> https://github.com/openjdk/jdk/blob/82b04f01bc99e8155518b8b8600d180981a42fc5/src/hotspot/share/opto/addnode.cpp#L1525-L1533 >> >> This happens because of a whole `IfFalse` subgraph that dies and nodes are being removed. `AddL` is not removed immediately as it has another input which is still alive but it is put in the IGVN worklist instead.
>> >> [image] >> >> Unfortunately the `fold_subI_no_underflow_pattern` optimization runs before the next GVN pass and triggers the assert. >> >> # Fix >> `fold_subI_no_underflow_pattern` should actually take into account that we could have the graph in such a state and that `x` could be top. So, the sensible fix is not to presume `x` to be of type long and bailout if it is not. >> >> # Testing >> Tier 1-3+ >> (also checked for new regression test failure before the change) > > Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: > > - Update test/hotspot/jtreg/compiler/loopopts/TestValidTypeInOverflowProtection.java > > Co-authored-by: Manuel Hässig > - JDK-8373525: remove test requires Otherwise, looks good to me, too, thanks! src/hotspot/share/opto/addnode.cpp line 1603: > 1601: const TypeLong* x_long = phase->type(x)->isa_long(); > 1602: // Collapsed graph not equivalent if potential over/underflow -> bailing out (*) > 1603: if (x_long == nullptr || can_overflow(x_long, con1->get_long() + con2->get_long())) { I suggest adding a comment about when `x_long` is not a long, as described in the PR description. test/hotspot/jtreg/compiler/loopopts/TestValidTypeInOverflowProtection.java line 30: > 28: * @run main/othervm -Xbatch -XX:-TieredCompilation -XX:CompileCommand=compileonly,compiler.loopopts.TestValidTypeInOverflowProtection::test > 29: * ${test.main.class} > 30: * @run driver ${test.main.class} Should be `main` to allow passing flags in. Suggestion: * @run main ${test.main.class} ------------- Changes requested by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/28920#pullrequestreview-3599181977 PR Review Comment: https://git.openjdk.org/jdk/pull/28920#discussion_r2635589252 PR Review Comment: https://git.openjdk.org/jdk/pull/28920#discussion_r2635590558 From qamai at openjdk.org Fri Dec 19 16:56:01 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 19 Dec 2025 16:56:01 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v10] In-Reply-To: References: Message-ID: <71h5168GmX3c9kMMlizU_ueAqytUYmU1zwfmMsRCLEY=.96fa6ece-5804-40f2-83ef-5979650a449a@github.com> On Tue, 16 Dec 2025 04:47:42 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch is an alternative to #28764 but it does the analysis during IGVN instead. >> >> ## The current PR: >> >> The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. >> >> This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. Implementation-wise, when walking at `find_previous_store`, if we encounter a call or memory barrier, we start looking at all nodes that make the allocation escape. If all such nodes have a control input that is not a transitive control input of the call/barrier we are at, then we can decidedly say that the allocation has not escaped at that call/barrier, and walk past that call/barrier to find a corresponding store. >> >> I do not see a noticeable difference in C2 runtime with and without this patch. >> >> ## Future work: >> >> 1. Nested object: >> >> Consider this case: >> >> Holder h = new Holder(); >> Object o = new Object(); >> h.o = o; >> >> Currently, `o` will be considered escaped at `h.o = o`. 
However, it can be seen that `o` has not actually escaped because `h` has not escaped. Luckily, with the current approach, this can be easily achieved, notice how this loop is just "if anything escapes, consider `base` escapes", currently, the "anything" here includes `base` and its aliases. if we include the base of the object at which `o` is stored, then we can correctly determine if `o` has escaped. >> >> // Find all nodes that may escape alloc, and decide that it is provable that they must be >> // executed after ctl >> EscapeStatus res = NOT_ESCAPED; >> aliases.push(base); >> for (uint idx = 0; idx < aliases.size(); idx++) { >> Node* n = aliases.at(idx); >> >> 2. Fold a memory `Phi`. >> >> This is pretty straightforward. We need to create a value `Phi` for each memory `Phi` so that we can handle loop `Phi`s. >> >> 3. Fold a pointer `Phi`. >> >> This can be easy, just give up if we don't encounter a store into that `Phi`. However, we can do better. Consider this case: >> >> Point p1 = new Point; >> Point p2 = new Point; >> p1.x = v1; >> p2.x = v2; >> Point p = Phi(p1, p2); >> int a = p.x; >> >> Then, `a` sh... > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > Much more comments, refactor the data into a separate class I have added a section describing some future work based on this PR that I have come up with. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28812#issuecomment-3675800756 From kvn at openjdk.org Fri Dec 19 17:44:04 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 19 Dec 2025 17:44:04 GMT Subject: RFR: 8374078: C2_MacroAssembler::verify_int_in_range has incorrect early return condition In-Reply-To: References: Message-ID: On Fri, 19 Dec 2025 04:00:31 GMT, Quan Anh Mai wrote: > Hi, > > This PR fixes the incorrect early return condition in `C2_MacroAssembler::verify_int_in_range`. Previously, `lo == min_jint && hi == max_jint` is equivalent to `t == Type::INT`. 
But this is not true anymore. > > Please take a look and leave your reviews, thanks a lot. Can this be new Type's check functions with comments explaining why `t == Type::INT` is not true any more? ------------- PR Review: https://git.openjdk.org/jdk/pull/28916#pullrequestreview-3599488657 From qamai at openjdk.org Fri Dec 19 18:07:53 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 19 Dec 2025 18:07:53 GMT Subject: RFR: 8374078: C2_MacroAssembler::verify_int_in_range has incorrect early return condition In-Reply-To: References: Message-ID: On Fri, 19 Dec 2025 17:41:29 GMT, Vladimir Kozlov wrote: >> Hi, >> >> This PR fixes the incorrect early return condition in `C2_MacroAssembler::verify_int_in_range`. Previously, `lo == min_jint && hi == max_jint` is equivalent to `t == Type::INT`. But this is not true anymore. >> >> Please take a look and leave your reviews, thanks a lot. > > Can this be new Type's check functions with comments explaining why `t == Type::INT` is not true any more? @vnkozlov I don't think it is a good idea. The check is like that specifically because it only cares about `_lo` and `_hi` at the moment. So it seems not to be a property of the `Type`. Furthermore, these tests should take into consideration and emit code to verify other properties of the `Type` in the future anyway. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28916#issuecomment-3676018973 From kvn at openjdk.org Fri Dec 19 18:15:51 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 19 Dec 2025 18:15:51 GMT Subject: RFR: 8374078: C2_MacroAssembler::verify_int_in_range has incorrect early return condition In-Reply-To: References: Message-ID: On Fri, 19 Dec 2025 18:04:08 GMT, Quan Anh Mai wrote: >> Can this be new Type's check functions with comments explaining why `t == Type::INT` is not true any more? > > @vnkozlov I don't think it is a good idea. The check is like that specifically because it only cares about `_lo` and `_hi` at the moment. 
So it seems not to be a property of the `Type`. Furthermore, these tests should take into consideration and emit code to verify other properties of the `Type` in the future anyway. @merykitty Then you need to add comment to each place you modified. Otherwise someone later could try to revert it back without understanding it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28916#issuecomment-3676043298 From qamai at openjdk.org Fri Dec 19 18:32:14 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 19 Dec 2025 18:32:14 GMT Subject: RFR: 8374078: C2_MacroAssembler::verify_int_in_range has incorrect early return condition In-Reply-To: References: Message-ID: On Fri, 19 Dec 2025 18:12:59 GMT, Vladimir Kozlov wrote: >> @vnkozlov I don't think it is a good idea. The check is like that specifically because it only cares about `_lo` and `_hi` at the moment. So it seems not to be a property of the `Type`. Furthermore, these tests should take into consideration and emit code to verify other properties of the `Type` in the future anyway. > > @merykitty Then you need to add comment to each place you modified. Otherwise someone later could try to revert it back without understanding it. @vnkozlov If you insist, then I'll do it, but reverting it back makes no sense given a `TypeInt` has 6 properties and this test only checks for 2 of them. 
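The point that a full signed range does not by itself identify the unconstrained int type can be illustrated with a minimal Java sketch. This is a hypothetical model for illustration only, not HotSpot's `TypeInt`: the classes `IntTypeModel` and `FullRangeDemo`, their fields, and the choice of modelling just a signed and an unsigned range are all assumptions. It shows one value set (every int except 0) whose signed bounds are still `Integer.MIN_VALUE` and `Integer.MAX_VALUE`.

```java
// Hypothetical model for illustration only; this is NOT HotSpot's TypeInt.
// A type is described by a signed range [lo, hi] and an unsigned range [ulo, uhi].
final class IntTypeModel {
    final int lo, hi;    // signed bounds
    final int ulo, uhi;  // unsigned bounds, stored as int bit patterns

    IntTypeModel(int lo, int hi, int ulo, int uhi) {
        this.lo = lo;
        this.hi = hi;
        this.ulo = ulo;
        this.uhi = uhi;
    }

    // The unconstrained type: every int value is allowed.
    static final IntTypeModel INT =
        new IntTypeModel(Integer.MIN_VALUE, Integer.MAX_VALUE, 0, -1);

    // A value belongs to the type only if it satisfies both ranges.
    boolean contains(int v) {
        return lo <= v && v <= hi
            && Integer.compareUnsigned(ulo, v) <= 0
            && Integer.compareUnsigned(v, uhi) <= 0;
    }

    // The check that looks only at the signed bounds.
    boolean hasFullSignedRange() {
        return lo == Integer.MIN_VALUE && hi == Integer.MAX_VALUE;
    }

    // The stricter check that also looks at the unsigned bounds.
    boolean isUnconstrained() {
        return hasFullSignedRange() && ulo == 0 && uhi == -1;
    }
}

final class FullRangeDemo {
    // Unsigned range [1, 0xFFFFFFFF]: every int except 0.
    static final IntTypeModel NON_ZERO =
        new IntTypeModel(Integer.MIN_VALUE, Integer.MAX_VALUE, 1, -1);

    public static void main(String[] args) {
        System.out.println(NON_ZERO.hasFullSignedRange()); // prints true
        System.out.println(NON_ZERO.isUnconstrained());    // prints false
        System.out.println(NON_ZERO.contains(0));          // prints false
    }
}
```

In this model, an early return keyed only on the signed bounds would skip a type that still has something left to verify, which is the kind of mismatch the thread is discussing.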
------------- PR Comment: https://git.openjdk.org/jdk/pull/28916#issuecomment-3676089805 From swesonga at openjdk.org Fri Dec 19 19:31:34 2025 From: swesonga at openjdk.org (Saint Wesonga) Date: Fri, 19 Dec 2025 19:31:34 GMT Subject: [jdk26] RFR: 8373630: r18_tls should not be modified on Windows AArch64 Message-ID: <9ni--xTIR5XFH45ukPLDEiCn_XT_XZ02EmMUddoxAMI=.5a115f6c-93e8-4276-a099-f2de589fe8d5@github.com> This pull request is a backport of commit [a0dd66f9](https://github.com/openjdk/jdk/commit/a0dd66f92d7f8400b9800847e36d036315628afb) ------------- Commit messages: - Backport a0dd66f92d7f8400b9800847e36d036315628afb Changes: https://git.openjdk.org/jdk/pull/28933/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28933&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373630 Stats: 23 lines in 1 file changed: 23 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28933.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28933/head:pull/28933 PR: https://git.openjdk.org/jdk/pull/28933 From dlong at openjdk.org Fri Dec 19 22:51:00 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 19 Dec 2025 22:51:00 GMT Subject: RFR: 8374078: C2_MacroAssembler::verify_int_in_range has incorrect early return condition In-Reply-To: References: Message-ID: <7rdk9HOrSqPUhdeVKWdWclTuVd0IbqaGwhA1nYMxcGo=.f25abade-4c2e-4135-b7e7-1d0dbcc0e7d0@github.com> On Fri, 19 Dec 2025 04:00:31 GMT, Quan Anh Mai wrote: > Hi, > > This PR fixes the incorrect early return condition in `C2_MacroAssembler::verify_int_in_range`. Previously, `lo == min_jint && hi == max_jint` is equivalent to `t == Type::INT`. But this is not true anymore. > > Please take a look and leave your reviews, thanks a lot. What about leaving the early return for TypeInt::INT only (no work to be done), but add known bits checks. Then there will still be something to check when there are no range checks to do (lo == min_jlong && hi == max_jlong)? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28916#issuecomment-3676923140 From psandoz at openjdk.org Fri Dec 19 22:52:56 2025 From: psandoz at openjdk.org (Paul Sandoz) Date: Fri, 19 Dec 2025 22:52:56 GMT Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v9] In-Reply-To: References: <8hStIcvp252Ik7raxZL5BvFKKkXTflorjyOD9Cyakvc=.c5d1b302-5c49-46b1-91ba-2feda2e6a746@github.com> Message-ID: <8Z84JAkAC6yVFA_1j82FXuoqn1Gu5qQLBlgbcVDAuLQ=.ec5f98c0-42de-4395-a46e-bb2b0be3c12a@github.com> On Fri, 19 Dec 2025 06:26:47 GMT, Jatin Bhateja wrote: >>> > The basic type codes are declared and shared across Java and HotSpot - it's used in `LaneType`. Can we pass a single argument that is the basic type instead of two arguments. HotSpot should know from the basic type what the carrier class and also what the operation type without it being explicitly told, since presumably it knew the inverse - the basic type from the element class. >>> >>> Hi @PaulSandoz, T_HALFFLOAT used in LaneType is mainly used for differentiation of various cache keys used by conversion operation lookups. In principle, we can extend VM to acknowledge this new custom basic type on the lines of T_METADATA / T_ADDRESS; its scope for now will be restricted to VectorSupport. We can gradually expose this to C2 type, such that TypeVect for all Float16 VectorIR uses T_HALFFLOAT as its basic type; currently, we use T_SHORT as the lane type. Let me know if this looks reasonable >> >> I am proposing something simpler, really as a temporary step until `Float16` becomes part of the `java.base` module. IIUC from the basic type we can reliably determine what the two arguments we currently passing are e.g., T_HALFFLOAT = { short.class, VECTOR_TYPE_FP16 }. So we don't need to pass two arguments, we can just pass one, the intrinsic can lookup the class and operation type kind. > > Hi @PaulSandoz , your comments have been addressed. 
Please let us know if there are other comments. > Hi @eme64 , Kindly share your comments. > @jatin-bhateja Thanks for the ping! I'll put this on the list for review early in 2026 :) Same here! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28002#issuecomment-3676923873 From dlong at openjdk.org Fri Dec 19 23:44:54 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 19 Dec 2025 23:44:54 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses [v3] In-Reply-To: References: Message-ID: <0jsb7vKbh77MzBuNJMhfyYKQjXtSSQS2aS6uA_BiEWk=.4a3f5fa0-64ee-46d5-9f5d-759f226356b1@github.com> On Fri, 19 Dec 2025 13:15:38 GMT, Roland Westrelin wrote: >> The base input of `AddP` is expected to only be set for heap accesses >> but I noticed some inconsistencies so I added an assert in the `AddP` >> constructor and fixed issues that it caught. AFAICT, the >> inconsistencies shouldn't create issues. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Let me run testing again before I approve the final version. 2nd round of testing passed, but it was on an earlier version.
------------- PR Review: https://git.openjdk.org/jdk/pull/28769#pullrequestreview-3600464325 From dlong at openjdk.org Fri Dec 19 23:44:56 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 19 Dec 2025 23:44:56 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses [v2] In-Reply-To: References: <_x8OV9zV3TaggMMwjNpjaeBtQRIxXbmq-VvV10SmTHg=.b7a98f3f-19c6-442b-b71b-a86a5b6f685a@github.com> Message-ID: <5-o26FUaB5buQhV_DKDKzF0oFJO8nS2AGOtahIfHVOY=.f961b6aa-efc2-4786-8016-249635c93d0f@github.com> On Fri, 19 Dec 2025 13:16:21 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/memnode.cpp line 2570: >> >>> 2568: assert(tkls2->offset() == 0, "not a load of java_mirror"); >>> 2569: #endif >>> 2570: return adr2->in(AddPNode::Address); >> >> What should the value of adr2->in(AddPNode::Offset) be at this point? 0 or java_mirror_offset()? Do we need to check it? > > I added a couple asserts here. If adr2->in(AddPNode::Address) is also an AddP, it's actually adr2->in(AddPNode::Address)->in(AddPNode::Offset) I'm concerned about, but maybe checking it with something like Ideal_base_and_offset() is probably overkill given the TypeKlassPtr::offset() should be derived from that info. In other words, the offset from the type should combine any non-zero offsets from Address and Offset edges. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28769#discussion_r2636591633 From dlong at openjdk.org Sat Dec 20 00:13:00 2025 From: dlong at openjdk.org (Dean Long) Date: Sat, 20 Dec 2025 00:13:00 GMT Subject: RFR: 8374078: C2_MacroAssembler::verify_int_in_range has incorrect early return condition In-Reply-To: References: Message-ID: <4I5OyCFNOcUpeRTpzH-_iUI88ux636JTaXznNTtoFPI=.4af5639b-e129-4352-bc1d-1f0415fb3416@github.com> On Fri, 19 Dec 2025 04:00:31 GMT, Quan Anh Mai wrote: > Hi, > > This PR fixes the incorrect early return condition in `C2_MacroAssembler::verify_int_in_range`. 
Previously, `lo == min_jint && hi == max_jint` is equivalent to `t == Type::INT`. But this is not true anymore. > > Please take a look and leave your reviews, thanks a lot. The simplest solution is to just return instead of ShouldNotReachHere() when this condition is detected, because as you said, these functions do not have complete coverage of TypeInt properties. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28916#issuecomment-3677057230 From dlong at openjdk.org Sat Dec 20 01:42:56 2025 From: dlong at openjdk.org (Dean Long) Date: Sat, 20 Dec 2025 01:42:56 GMT Subject: RFR: 8373508: C2: sinking CreateEx out of loop breaks the graph [v2] In-Reply-To: References: Message-ID: <72DTyUKDKlyEKHeDIYF5GUV6u__CGKT5LDAKglL0s6M=.4eaf8397-02d7-48b0-9261-276ffeb236ed@github.com> On Wed, 17 Dec 2025 14:22:57 GMT, Roland Westrelin wrote: >> A `CreateEx` gets sunk out of loop by >> `PhaseIdealLoop::try_sink_out_of_loop()` and, as a consequence, the >> following logic: >> >> >> return (in(0)->is_CatchProj() && in(0)->in(0)->is_Catch() && >> in(0)->in(0)->in(1) == in(1)) ? this : call->in(TypeFunc::Parms); >> >> >> in `CreateExNode::Identity()` triggers which leads to the crash >> because `call->in(TypeFunc::Parms)` is not even an object in this >> particular case. >> >> It's actually not clear to me what that logic in >> `CreateExNode::Identity()` is expected to do and I wonder if it's >> still needed. >> >> Anyway, the fix I propose is to skip `CreateEx` in >> `PhaseIdealLoop::try_sink_out_of_loop()`. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/loopopts/TestCreateExSunkOutOfLoop.java > > Co-authored-by: Christian Hagedorn I'll run Oracle testing before approving.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28842#issuecomment-3677195281 From dlong at openjdk.org Sat Dec 20 01:42:53 2025 From: dlong at openjdk.org (Dean Long) Date: Sat, 20 Dec 2025 01:42:53 GMT Subject: RFR: 8373508: C2: sinking CreateEx out of loop breaks the graph In-Reply-To: <4eUiFfzqfjvPqmmbFqVtX3LCTKhePamkSWzj9WaCjKM=.d0de25a9-50b6-4c30-bceb-5f6152938903@github.com> References: <4eUiFfzqfjvPqmmbFqVtX3LCTKhePamkSWzj9WaCjKM=.d0de25a9-50b6-4c30-bceb-5f6152938903@github.com> Message-ID: On Fri, 19 Dec 2025 13:21:16 GMT, Roland Westrelin wrote: > `CreateEx` doesn't do anything. So it seems to me the risk of breaking something by accident in some uncommon case is not worth the risk and I would go with a conservative fix. Actually, it does a little something. It makes sure the exception object is associated with a specific register, RAX for x64, but I see your point about being conservative with the fix. We wouldn't want any instructions scheduled between it and the exception handler that could kill RAX. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28842#issuecomment-3677194373 From qamai at openjdk.org Sat Dec 20 02:54:49 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 20 Dec 2025 02:54:49 GMT Subject: RFR: 8374078: C2_MacroAssembler::verify_int_in_range has incorrect early return condition In-Reply-To: <7rdk9HOrSqPUhdeVKWdWclTuVd0IbqaGwhA1nYMxcGo=.f25abade-4c2e-4135-b7e7-1d0dbcc0e7d0@github.com> References: <7rdk9HOrSqPUhdeVKWdWclTuVd0IbqaGwhA1nYMxcGo=.f25abade-4c2e-4135-b7e7-1d0dbcc0e7d0@github.com> Message-ID: On Fri, 19 Dec 2025 22:48:30 GMT, Dean Long wrote: > What about leaving the early return for TypeInt::INT only (no work to be done), but add known bits checks. Then there will still be something to check when there are no range checks to do (lo == min_jlong && hi == max_jlong)? Yes, the intention is to add verification for other properties next. In the current PR, I want to solve the potential crash first. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28916#issuecomment-3677298785 From qamai at openjdk.org Sat Dec 20 03:11:40 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 20 Dec 2025 03:11:40 GMT Subject: RFR: 8374078: C2_MacroAssembler::verify_int_in_range has incorrect early return condition [v2] In-Reply-To: References: Message-ID: > Hi, > > This PR fixes the incorrect early return condition in `C2_MacroAssembler::verify_int_in_range`. Previously, `lo == min_jint && hi == max_jint` is equivalent to `t == Type::INT`. But this is not true anymore. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: refactor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28916/files - new: https://git.openjdk.org/jdk/pull/28916/files/a0e1c699..e7aee2c1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28916&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28916&range=00-01 Stats: 61 lines in 2 files changed: 17 ins; 26 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/28916.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28916/head:pull/28916 PR: https://git.openjdk.org/jdk/pull/28916 From qamai at openjdk.org Sat Dec 20 03:11:41 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 20 Dec 2025 03:11:41 GMT Subject: RFR: 8374078: C2_MacroAssembler::verify_int_in_range has incorrect early return condition In-Reply-To: References: Message-ID: On Fri, 19 Dec 2025 04:00:31 GMT, Quan Anh Mai wrote: > Hi, > > This PR fixes the incorrect early return condition in `C2_MacroAssembler::verify_int_in_range`. Previously, `lo == min_jint && hi == max_jint` is equivalent to `t == Type::INT`. But this is not true anymore. > > Please take a look and leave your reviews, thanks a lot. I have reworked the code a little bit so that it avoids crashing, and it also allows smooth addition of other properties. 
What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28916#issuecomment-3677309611 From dlong at openjdk.org Sat Dec 20 03:33:51 2025 From: dlong at openjdk.org (Dean Long) Date: Sat, 20 Dec 2025 03:33:51 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses [v3] In-Reply-To: References: Message-ID: <-GY9oBe-WRSh16Yi90rp79Xxi784nRtvdqBlMh4TiMs=.ba972df0-9298-4f94-8699-597601bd39ba@github.com> On Fri, 19 Dec 2025 13:15:38 GMT, Roland Westrelin wrote: >> The base input of `AddP` is expected to only be set for heap accesses >> but I noticed some inconsistencies so I added an assert in the `AddP` >> constructor and fixed issues that it caught. AFAICT, the >> inconsistencies shouldn't create issues. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Tests passed. ------------- Marked as reviewed by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28769#pullrequestreview-3600682440 From dlong at openjdk.org Sat Dec 20 03:43:58 2025 From: dlong at openjdk.org (Dean Long) Date: Sat, 20 Dec 2025 03:43:58 GMT Subject: RFR: 8374078: C2_MacroAssembler::verify_int_in_range has incorrect early return condition [v2] In-Reply-To: References: Message-ID: On Sat, 20 Dec 2025 03:11:40 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR fixes the incorrect early return condition in `C2_MacroAssembler::verify_int_in_range`. Previously, `lo == min_jint && hi == max_jint` is equivalent to `t == Type::INT`. But this is not true anymore. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > refactor Looks good. I'll run testing.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28916#issuecomment-3677333567 From dlong at openjdk.org Sat Dec 20 08:01:26 2025 From: dlong at openjdk.org (Dean Long) Date: Sat, 20 Dec 2025 08:01:26 GMT Subject: RFR: 8373508: C2: sinking CreateEx out of loop breaks the graph [v2] In-Reply-To: References: Message-ID: On Wed, 17 Dec 2025 14:22:57 GMT, Roland Westrelin wrote: >> A `CreateEx` gets sunk out of loop by >> `PhaseIdealLoop::try_sink_out_of_loop()` and, as a consequence, the >> following logic: >> >> >> return (in(0)->is_CatchProj() && in(0)->in(0)->is_Catch() && >> in(0)->in(0)->in(1) == in(1)) ? this : call->in(TypeFunc::Parms); >> >> >> in `CreateExNode::Identity()` triggers which leads to the crash >> because `call->in(TypeFunc::Parms)` is not even an object in this >> particular case. >> >> It's actually not clear to me what that logic in >> `CreateExNode::Identity()` is expected to do and I wonder if it's >> still needed. >> >> Anyway, the fix I propose is to skip `CreateEx` in >> `PhaseIdealLoop::try_sink_out_of_loop()`. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/loopopts/TestCreateExSunkOutOfLoop.java > > Co-authored-by: Christian Hagedorn Testing passed. ------------- Marked as reviewed by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28842#pullrequestreview-3600849875 From dlong at openjdk.org Sat Dec 20 08:03:00 2025 From: dlong at openjdk.org (Dean Long) Date: Sat, 20 Dec 2025 08:03:00 GMT Subject: RFR: 8374078: C2_MacroAssembler::verify_int_in_range has incorrect early return condition [v2] In-Reply-To: References: Message-ID: On Sat, 20 Dec 2025 03:11:40 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR fixes the incorrect early return condition in `C2_MacroAssembler::verify_int_in_range`. Previously, `lo == min_jint && hi == max_jint` is equivalent to `t == Type::INT`. But this is not true anymore. 
>> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > refactor Testing passed. ------------- Marked as reviewed by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28916#pullrequestreview-3600850518 From aph at openjdk.org Sat Dec 20 09:54:50 2025 From: aph at openjdk.org (Andrew Haley) Date: Sat, 20 Dec 2025 09:54:50 GMT Subject: [jdk26] RFR: 8373630: r18_tls should not be modified on Windows AArch64 In-Reply-To: <9ni--xTIR5XFH45ukPLDEiCn_XT_XZ02EmMUddoxAMI=.5a115f6c-93e8-4276-a099-f2de589fe8d5@github.com> References: <9ni--xTIR5XFH45ukPLDEiCn_XT_XZ02EmMUddoxAMI=.5a115f6c-93e8-4276-a099-f2de589fe8d5@github.com> Message-ID: On Fri, 19 Dec 2025 19:23:09 GMT, Saint Wesonga wrote: > This pull request is a backport of commit [a0dd66f9](https://github.com/openjdk/jdk/commit/a0dd66f92d7f8400b9800847e36d036315628afb) Marked as reviewed by aph (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28933#pullrequestreview-3600940979 From duke at openjdk.org Sat Dec 20 16:43:55 2025 From: duke at openjdk.org (duke) Date: Sat, 20 Dec 2025 16:43:55 GMT Subject: [jdk26] RFR: 8373630: r18_tls should not be modified on Windows AArch64 In-Reply-To: <9ni--xTIR5XFH45ukPLDEiCn_XT_XZ02EmMUddoxAMI=.5a115f6c-93e8-4276-a099-f2de589fe8d5@github.com> References: <9ni--xTIR5XFH45ukPLDEiCn_XT_XZ02EmMUddoxAMI=.5a115f6c-93e8-4276-a099-f2de589fe8d5@github.com> Message-ID: On Fri, 19 Dec 2025 19:23:09 GMT, Saint Wesonga wrote: > This pull request is a backport of commit [a0dd66f9](https://github.com/openjdk/jdk/commit/a0dd66f92d7f8400b9800847e36d036315628afb) @swesonga Your change (at version a23ee3671faad633f004c86f19f6aceb8dd5a021) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28933#issuecomment-3677946496 From swesonga at openjdk.org Sat Dec 20 18:18:59 2025 From: swesonga at openjdk.org (Saint Wesonga) Date: Sat, 20 Dec 2025 18:18:59 GMT Subject: [jdk26] Integrated: 8373630: r18_tls should not be modified on Windows AArch64 In-Reply-To: <9ni--xTIR5XFH45ukPLDEiCn_XT_XZ02EmMUddoxAMI=.5a115f6c-93e8-4276-a099-f2de589fe8d5@github.com> References: <9ni--xTIR5XFH45ukPLDEiCn_XT_XZ02EmMUddoxAMI=.5a115f6c-93e8-4276-a099-f2de589fe8d5@github.com> Message-ID: On Fri, 19 Dec 2025 19:23:09 GMT, Saint Wesonga wrote: > This pull request is a backport of commit [a0dd66f9](https://github.com/openjdk/jdk/commit/a0dd66f92d7f8400b9800847e36d036315628afb) This pull request has now been integrated. Changeset: 1ec4ff54 Author: Saint Wesonga Committer: Andrew Haley URL: https://git.openjdk.org/jdk/commit/1ec4ff54ae2b041f058af9d3a9e927dfc27d71bc Stats: 23 lines in 1 file changed: 23 ins; 0 del; 0 mod 8373630: r18_tls should not be modified on Windows AArch64 Reviewed-by: aph Backport-of: a0dd66f92d7f8400b9800847e36d036315628afb ------------- PR: https://git.openjdk.org/jdk/pull/28933 From kvn at openjdk.org Sun Dec 21 01:57:14 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 21 Dec 2025 01:57:14 GMT Subject: RFR: 8374078: C2_MacroAssembler::verify_int_in_range has incorrect early return condition [v2] In-Reply-To: References: Message-ID: On Sat, 20 Dec 2025 03:11:40 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR fixes the incorrect early return condition in `C2_MacroAssembler::verify_int_in_range`. Previously, `lo == min_jint && hi == max_jint` is equivalent to `t == Type::INT`. But this is not true anymore. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > refactor I prefer this last version. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28916#pullrequestreview-3601415870 From qamai at openjdk.org Sun Dec 21 15:19:05 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 21 Dec 2025 15:19:05 GMT Subject: RFR: 8374078: C2_MacroAssembler::verify_int_in_range has incorrect early return condition [v2] In-Reply-To: References: Message-ID: On Sat, 20 Dec 2025 03:11:40 GMT, Quan Anh Mai wrote: >> Hi, >> >> This PR fixes the incorrect early return condition in `C2_MacroAssembler::verify_int_in_range`. Previously, `lo == min_jint && hi == max_jint` is equivalent to `t == Type::INT`. But this is not true anymore. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > refactor Thanks a lot for your reviews and testing ------------- PR Comment: https://git.openjdk.org/jdk/pull/28916#issuecomment-3678880542 From qamai at openjdk.org Sun Dec 21 15:19:06 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 21 Dec 2025 15:19:06 GMT Subject: Integrated: 8374078: C2_MacroAssembler::verify_int_in_range has incorrect early return condition In-Reply-To: References: Message-ID: <3APboMW_xShegO4K6sLH-cTbH3TgrzFiHe2NMmN7rb0=.c6d2d44a-cec4-4fc3-b758-4eff4565d603@github.com> On Fri, 19 Dec 2025 04:00:31 GMT, Quan Anh Mai wrote: > Hi, > > This PR fixes the incorrect early return condition in `C2_MacroAssembler::verify_int_in_range`. Previously, `lo == min_jint && hi == max_jint` is equivalent to `t == Type::INT`. But this is not true anymore. > > Please take a look and leave your reviews, thanks a lot. This pull request has now been integrated. 
Changeset: 8ab7d3b8 Author: Quan Anh Mai URL: https://git.openjdk.org/jdk/commit/8ab7d3b89f656e5c2882e19065f01fcc434161d2 Stats: 47 lines in 2 files changed: 11 ins; 22 del; 14 mod 8374078: C2_MacroAssembler::verify_int_in_range has incorrect early return condition Reviewed-by: kvn, dlong ------------- PR: https://git.openjdk.org/jdk/pull/28916 From qamai at openjdk.org Sun Dec 21 15:28:37 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 21 Dec 2025 15:28:37 GMT Subject: RFR: 8373999: C2: apply KnownBits and unsigned bounds to Add / Sub operations [v3] In-Reply-To: References: Message-ID: <7w_PKSUGvDqVsVCTdv35mHvPhRtFID24Pp0lK33R8Ts=.5472ad81-2547-4c93-b04e-523a6968a534@github.com> > Hi, > > This PR improves the implementation of `AddNode/SubNode::Value` by taking advantage of the additional information in `TypeInt`. The implementation has some pretty non-trivial logic. Fortunately, the test infrastructure is already there. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge branch 'master' into addsub - include order - Improve Add/SubNode::Value with unsigned bounds and known bits ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28897/files - new: https://git.openjdk.org/jdk/pull/28897/files/f910e70b..a0ff1f67 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28897&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28897&range=01-02 Stats: 3134 lines in 117 files changed: 1954 ins; 560 del; 620 mod Patch: https://git.openjdk.org/jdk/pull/28897.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28897/head:pull/28897 PR: https://git.openjdk.org/jdk/pull/28897
From galder at openjdk.org Mon Dec 22 06:01:52 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Mon, 22 Dec 2025 06:01:52 GMT Subject: RFR: 8373480: Optimize multiplication by constant multiplier using LEA instructions [v4] In-Reply-To: References: Message-ID: On Tue, 16 Dec 2025 11:56:40 GMT, Jatin Bhateja wrote:
>> Emulate the multiplier using the LEA addressing scheme, where effective address = BASE + INDEX * SCALE + OFFSET.
>> Refer to section "3.5.1.2 Using LEA" of Intel's optimization manual for details regarding slow vs. fast LEA instructions.
>> Given that the latency of IMUL with register operands is 3 cycles, a combination of two fast LEAs, each with 1-cycle latency, to emulate the multiplier is performant.
>>
>> Consider X as the multiplicand. By varying the scale of the first LEA instruction we can generate 4 inputs, i.e.
>>
>> X + X * 1 = 2X
>> X + X * 2 = 3X
>> X + X * 4 = 5X
>> X + X * 8 = 9X
>>
>> The following table lists the multiplier combinations obtained by using the output of the first LEA as BASE and/or INDEX and varying the
>> scale of the second fast LEA instruction. We will only handle the cases which cannot be handled by just shift + add.
>>
>> BASE  INDEX  SCALE  MULTIPLIER
>> X     X      1      2  (Terminal)
>> X     X      2      3  (Terminal)
>> X     X      4      5  (Terminal)
>> X     X      8      9  (Terminal)
>> 3X    3X     1      6
>> X     3X     2      7
>> 5X    5X     1      10
>> X     5X     2      11
>> X     3X     4      13
>> 5X    5X     2      15
>> X     2X     8      17
>> 9X    9X     1      18
>> X     9X     2      19
>> X     5X     4      21
>> 5X    5X     4      25
>> 9X    9X     2      27
>> X     9X     4      37
>> X     5X     8      41
>> 9X    9X     4      45
>> X     9X     8      73
>> 9X    9X     8      81
>>
>> All the non-unity inputs tied to BASE / INDEX are derived from the terminal cases, which represent the first fast LEA. Thus, all the multipliers can be computed using just two LEA instructions.
>>
>> Performance numbers for the new micro benchmark included with this patch show around **5-50% improvements** on the latest x86 servers.
>>
>> System: INTEL(R) XEON(R) PLATINUM 8581C CPU @ 2.10GHz Emerald Rapids:-
>> Baseline:-
>> Benchmark                                            Mode   Cnt  Score    Error  Units
>> ConstantMultiplierOptimization.testConstMultiplierI  thrpt  2    189.690         ops/min
>> ConstantMultiplierOptimization.testConstMultiplierL  thrpt  2    196.283         ops/min
>>
>> Withopt:-
>> Benchmark Mode Cnt Score Error Units
>> Constant...
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
> Minor cleanup in Template-Framework test

Nice improvement @jatin-bhateja, just some small comments

Matt did an [Advent of Compiler Optimizations video](https://youtu.be/1X88od0miHs?si=wlYCsbZ1vmJA_rVf) precisely on this topic recently :)

test/hotspot/jtreg/compiler/c2/TestConstantMultiplier.java line 62:

> 60:
> 61: // Add a java source file.
> 62: comp.addJavaSourceCode("c2.compilerr.ConstantMultiplierTest", generate(comp));

Suggestion: comp.addJavaSourceCode("c2.compiler.ConstantMultiplierTest", generate(comp));

Package name spelling mistake.

test/micro/org/openjdk/bench/vm/compiler/ConstantMultiplierOptimization.java line 46:

> 44: public class ConstantMultiplierOptimization {
> 45:
> 46: public static int mul_by_25_I(int a) {

Should this and other methods be annotated with `@ForceInline` just in case?
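
[Editorial illustration] The two-LEA scheme quoted above can be modelled in plain Java arithmetic. This is only a sketch of the idea, not the C2 implementation in the PR: the class `LeaMultiplierSketch` and its helpers `lea`, `mulBy25` and `reachableCoefficients` are hypothetical names introduced here, and the model assumes an LEA without displacement computes `base + index * scale` with scale in {1, 2, 4, 8}.

```java
import java.util.TreeSet;

// Toy model of "multiply by a constant using at most two LEAs".
final class LeaMultiplierSketch {
    static final int[] SCALES = {1, 2, 4, 8};

    // Models one LEA without displacement: base + index * scale.
    static int lea(int base, int index, int scale) {
        return base + index * scale;
    }

    // 25 * x as two LEAs: t = x + x*4 (5x), then t + t*4 (25x).
    // Java int arithmetic wraps, so this matches 25 * x even on overflow.
    static int mulBy25(int x) {
        int t = lea(x, x, 4);
        return lea(t, t, 4);
    }

    // Enumerates coefficients k such that k*x is computable with a first
    // LEA t = x + x*s1 and an optional second LEA whose INDEX is t and
    // whose BASE is either x or t (the shapes listed in the quoted table).
    static TreeSet<Integer> reachableCoefficients() {
        TreeSet<Integer> result = new TreeSet<>();
        for (int s1 : SCALES) {
            int t = lea(1, 1, s1);          // 2, 3, 5 or 9 (terminal cases)
            result.add(t);
            for (int s2 : SCALES) {
                result.add(lea(1, t, s2));  // BASE = X,  INDEX = t
                result.add(lea(t, t, s2));  // BASE = t,  INDEX = t
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(mulBy25(7));
        System.out.println(reachableCoefficients());
    }
}
```

The enumeration also produces coefficients such as 4, which the quoted description leaves to plain shift + add; the sketch does not attempt to model that filtering or any cost heuristics.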
------------- Changes requested by galder (Author). PR Review: https://git.openjdk.org/jdk/pull/28759#pullrequestreview-3602769564 PR Review Comment: https://git.openjdk.org/jdk/pull/28759#discussion_r2638758416 PR Review Comment: https://git.openjdk.org/jdk/pull/28759#discussion_r2638761774 From chagedorn at openjdk.org Mon Dec 22 06:42:00 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 22 Dec 2025 06:42:00 GMT Subject: RFR: 8373508: C2: sinking CreateEx out of loop breaks the graph [v2] In-Reply-To: References: Message-ID: On Wed, 17 Dec 2025 14:22:57 GMT, Roland Westrelin wrote: >> A `CreateEx` gets sunk out of loop by >> `PhaseIdealLoop::try_sink_out_of_loop()` and, as a consequence, the >> following logic: >> >> >> return (in(0)->is_CatchProj() && in(0)->in(0)->is_Catch() && >> in(0)->in(0)->in(1) == in(1)) ? this : call->in(TypeFunc::Parms); >> >> >> in `CreateExNode::Identity()` triggers which leads to the crash >> because `call->in(TypeFunc::Parms)` is not even an object in this >> particular case. >> >> It's actually not clear to me what that logic in >> `CreateExNode::Identity()` is expected to do and I wonder if it's >> still needed. >> >> Anyway, the fix I propose is to skip `CreateEx` in >> `PhaseIdealLoop::try_sink_out_of_loop()`. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/loopopts/TestCreateExSunkOutOfLoop.java > > Co-authored-by: Christian Hagedorn Marked as reviewed by chagedorn (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/28842#pullrequestreview-3602846373
From jbhateja at openjdk.org Mon Dec 22 07:23:51 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 22 Dec 2025 07:23:51 GMT Subject: RFR: 8373480: Optimize multiplication by constant multiplier using LEA instructions [v5] In-Reply-To: References: Message-ID:
> Emulate the multiplier using the LEA addressing scheme, where effective address = BASE + INDEX * SCALE + OFFSET.
> Refer to section "3.5.1.2 Using LEA" of Intel's optimization manual for details regarding slow vs. fast LEA instructions.
> Given that the latency of IMUL with register operands is 3 cycles, a combination of two fast LEAs, each with 1-cycle latency, to emulate the multiplier is performant.
>
> Consider X as the multiplicand. By varying the scale of the first LEA instruction we can generate 4 inputs, i.e.
>
> X + X * 1 = 2X
> X + X * 2 = 3X
> X + X * 4 = 5X
> X + X * 8 = 9X
>
> The following table lists the multiplier combinations obtained by using the output of the first LEA as BASE and/or INDEX and varying the
> scale of the second fast LEA instruction. We will only handle the cases which cannot be handled by just shift + add.
>
> BASE  INDEX  SCALE  MULTIPLIER
> X     X      1      2  (Terminal)
> X     X      2      3  (Terminal)
> X     X      4      5  (Terminal)
> X     X      8      9  (Terminal)
> 3X    3X     1      6
> X     3X     2      7
> 5X    5X     1      10
> X     5X     2      11
> X     3X     4      13
> 5X    5X     2      15
> X     2X     8      17
> 9X    9X     1      18
> X     9X     2      19
> X     5X     4      21
> 5X    5X     4      25
> 9X    9X     2      27
> X     9X     4      37
> X     5X     8      41
> 9X    9X     4      45
> X     9X     8      73
> 9X    9X     8      81
>
> All the non-unity inputs tied to BASE / INDEX are derived from the terminal cases, which represent the first fast LEA. Thus, all the multipliers can be computed using just two LEA instructions.
>
> Performance numbers for the new micro benchmark included with this patch show around **5-50% improvements** on the latest x86 servers.
>
> System: INTEL(R) XEON(R) PLATINUM 8581C CPU @ 2.10GHz Emerald Rapids:-
> Baseline:-
> Benchmark                                            Mode   Cnt  Score    Error  Units
> ConstantMultiplierOptimization.testConstMultiplierI  thrpt  2    189.690         ops/min
> ConstantMultiplierOptimization.testConstMultiplierL  thrpt  2    196.283         ops/min
>
> Withopt:-
> Benchmark                                            Mode   Cnt  Score    Error  Units
> ConstantMultiplierOptimization.testConstMultiplierI  thrpt  2    283.827         ops/min
> ConstantMultiplierOptimization...

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28759/files - new: https://git.openjdk.org/jdk/pull/28759/files/66a28502..8e42a466 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28759&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28759&range=03-04 Stats: 17 lines in 2 files changed: 13 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28759.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28759/head:pull/28759 PR: https://git.openjdk.org/jdk/pull/28759
From jbhateja at openjdk.org Mon Dec 22 07:28:06 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 22 Dec 2025 07:28:06 GMT Subject: RFR: 8373480: Optimize multiplication by constant multiplier using LEA instructions [v4] In-Reply-To: References: Message-ID: On Mon, 22 Dec 2025 05:59:14 GMT, Galder Zamarreño wrote:
> Nice improvement @jatin-bhateja, just some small comments
>
> Matt did an [Advent of Compiler Optimizations video](https://youtu.be/1X88od0miHs?si=wlYCsbZ1vmJA_rVf) precisely on this topic recently :)

Hi @galderz, Exactly, the JBS description clearly mentions that.

> test/micro/org/openjdk/bench/vm/compiler/ConstantMultiplierOptimization.java line 46:
>
>> 44: public class ConstantMultiplierOptimization {
>> 45:
>> 46: public static int mul_by_25_I(int a) {
>
> Should this and other methods be annotated with `@ForceInline` just in case?
The kernels are small enough that they should be inlined anyway, but it's better to force inlining. I made that change; benchmark performance remains the same. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28759#issuecomment-3680823623 PR Review Comment: https://git.openjdk.org/jdk/pull/28759#discussion_r2638921837 From chagedorn at openjdk.org Mon Dec 22 07:33:57 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 22 Dec 2025 07:33:57 GMT Subject: RFR: 8373343: C2: verify AddP base input only set for heap addresses [v3] In-Reply-To: References: Message-ID: On Fri, 19 Dec 2025 13:15:38 GMT, Roland Westrelin wrote: >> The base input of `AddP` is expected to only be set for heap accesses >> but I noticed some inconsistencies so I added an assert in the `AddP` >> constructor and fixed issues that it caught. AFAICT, the >> inconsistencies shouldn't create issues. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Good improvement! I have two more suggestions but these could also be done separately. src/hotspot/share/opto/graphKit.cpp line 2737: > 2735: > 2736: // First load the super-klass's check-offset > 2737: Node *p1 = gvn.transform(new AddPNode(C->top(), superklass, gvn.MakeConX(in_bytes(Klass::super_check_offset_offset())))); Would it make sense to have 2 constructors or even better 2 `make()` methods with dedicated names instead of passing in top each time?
For example: make_with_base(Node* base, Node* ptr, Node* offset) make_off_heap(Node* ptr, Node* offset) `make_off_heap()` can then call `make_with_base(Compile::current()->top, ptr, offset)` src/hotspot/share/opto/graphKit.cpp line 2755: > 2753: chk_off_X = gvn.transform(new ConvI2LNode(chk_off_X)); > 2754: #endif > 2755: Node *p2 = gvn.transform(new AddPNode(C->top(),subklass,chk_off_X)); Suggestion: Node* p2 = gvn.transform(new AddPNode(C->top(), subklass, chk_off_X)); src/hotspot/share/opto/graphKit.cpp line 3590: > 3588: } > 3589: constant_value = Klass::_lh_neutral_value; // put in a known value > 3590: Node* lhp = basic_plus_adr(top(), klass_node, in_bytes(Klass::layout_helper_offset())); Same thought here: could we have a separate `off_heap_plus_addr()` or something like that instead of passing in `top()` on each call site? This and the other suggestion could also be done separately. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28769#pullrequestreview-3602937227 PR Review Comment: https://git.openjdk.org/jdk/pull/28769#discussion_r2638913566 PR Review Comment: https://git.openjdk.org/jdk/pull/28769#discussion_r2638921027 PR Review Comment: https://git.openjdk.org/jdk/pull/28769#discussion_r2638920830 From dfenacci at openjdk.org Mon Dec 22 08:02:46 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 22 Dec 2025 08:02:46 GMT Subject: RFR: 8373525: C2: assert(_base == Long) failed: Not a Long [v2] In-Reply-To: References: Message-ID: On Fri, 19 Dec 2025 16:17:58 GMT, Christian Hagedorn wrote: > Otherwise, looks good to me, too, thanks! Thanks for your suggestions @chhagedorn. I've just addressed them. 
> src/hotspot/share/opto/addnode.cpp line 1603: > >> 1601: const TypeLong* x_long = phase->type(x)->isa_long(); >> 1602: // Collapsed graph not equivalent if potential over/underflow -> bailing out (*) >> 1603: if (x_long == nullptr || can_overflow(x_long, con1->get_long() + con2->get_long())) { > > I suggest to to add a comment about when `x_long` is not a long as described in the PR description. I added a comment a few lines above. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28920#issuecomment-3680912227 PR Review Comment: https://git.openjdk.org/jdk/pull/28920#discussion_r2638990557 From dfenacci at openjdk.org Mon Dec 22 08:02:43 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 22 Dec 2025 08:02:43 GMT Subject: RFR: 8373525: C2: assert(_base == Long) failed: Not a Long [v3] In-Reply-To: References: Message-ID: <7gVCe2XWzP26QSSvLEEkJONV5yx1EOmpxGGIf-BLI_o=.09403eb0-02de-4ca7-a13f-c01acb768d65@github.com> > # Issue > Olivier's fuzzer found a test that makes C2 crash while running the optimization that collapses the addition with overflow-protection (`fold_subI_no_underflow_pattern`). > > # Causes > The crash happens because during `fold_subI_no_underflow_pattern` the first input of the `AddL` node (see comment below) becomes top. > https://github.com/openjdk/jdk/blob/82b04f01bc99e8155518b8b8600d180981a42fc5/src/hotspot/share/opto/addnode.cpp#L1525-L1533 > > This happens because of a whole `IfFalse` subgraph that dies and nodes are being removed. `AddL` is not removed immediately as it has another input which is still alive but it is put in the IGVN worklist instead. > > image > > Unfortunately the `fold_subI_no_underflow_pattern` optimization runs before the next GVN pass and triggers the assert. > > # Fix > `fold_subI_no_underflow_pattern` should actually take into account that we could have the graph in such a state and that `x` could be top. So, the sensible fix is not to presume `x` to be of type long and bailout if it is not. 
> > # Testing > Tier 1-3+ > (also checked for new regression test failure before the change) Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: - Apply suggestion from @chhagedorn Co-authored-by: Christian Hagedorn - JDK-8373525: add comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28920/files - new: https://git.openjdk.org/jdk/pull/28920/files/3219ce4f..57f15854 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28920&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28920&range=01-02 Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28920.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28920/head:pull/28920 PR: https://git.openjdk.org/jdk/pull/28920 From chagedorn at openjdk.org Mon Dec 22 09:09:53 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 22 Dec 2025 09:09:53 GMT Subject: RFR: 8373525: C2: assert(_base == Long) failed: Not a Long [v3] In-Reply-To: <7gVCe2XWzP26QSSvLEEkJONV5yx1EOmpxGGIf-BLI_o=.09403eb0-02de-4ca7-a13f-c01acb768d65@github.com> References: <7gVCe2XWzP26QSSvLEEkJONV5yx1EOmpxGGIf-BLI_o=.09403eb0-02de-4ca7-a13f-c01acb768d65@github.com> Message-ID: On Mon, 22 Dec 2025 08:02:43 GMT, Damon Fenacci wrote: >> # Issue >> Olivier's fuzzer found a test that makes C2 crash while running the optimization that collapses the addition with overflow-protection (`fold_subI_no_underflow_pattern`). >> >> # Causes >> The crash happens because during `fold_subI_no_underflow_pattern` the first input of the `AddL` node (see comment below) becomes top. >> https://github.com/openjdk/jdk/blob/82b04f01bc99e8155518b8b8600d180981a42fc5/src/hotspot/share/opto/addnode.cpp#L1525-L1533 >> >> This happens because of a whole `IfFalse` subgraph that dies and nodes are being removed. `AddL` is not removed immediately as it has another input which is still alive but it is put in the IGVN worklist instead. 
>> >> image >> >> Unfortunately the `fold_subI_no_underflow_pattern` optimization runs before the next GVN pass and triggers the assert. >> >> # Fix >> `fold_subI_no_underflow_pattern` should actually take into account that we could have the graph in such a state and that `x` could be top. So, the sensible fix is not to presume `x` to be of type long and bailout if it is not. >> >> # Testing >> Tier 1-3+ >> (also checked for new regression test failure before the change) > > Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: > > - Apply suggestion from @chhagedorn > > Co-authored-by: Christian Hagedorn > - JDK-8373525: add comment src/hotspot/share/opto/addnode.cpp line 1601: > 1599: Node* con2 = add2->in(2); > 1600: if (is_sub_con(con2)) { > 1601: // The graph could be in a dirty state. So, we need to check for the type of x I suggest to be more explicit: Suggestion: // The graph could be dying (i.e. x is top) in which case type(x) is not a long. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28920#discussion_r2639167385 From dfenacci at openjdk.org Mon Dec 22 09:42:48 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 22 Dec 2025 09:42:48 GMT Subject: RFR: 8373525: C2: assert(_base == Long) failed: Not a Long [v4] In-Reply-To: References: Message-ID: > # Issue > Olivier's fuzzer found a test that makes C2 crash while running the optimization that collapses the addition with overflow-protection (`fold_subI_no_underflow_pattern`). > > # Causes > The crash happens because during `fold_subI_no_underflow_pattern` the first input of the `AddL` node (see comment below) becomes top. > https://github.com/openjdk/jdk/blob/82b04f01bc99e8155518b8b8600d180981a42fc5/src/hotspot/share/opto/addnode.cpp#L1525-L1533 > > This happens because of a whole `IfFalse` subgraph that dies and nodes are being removed. 
`AddL` is not removed immediately as it has another input which is still alive but it is put in the IGVN worklist instead. > > image > > Unfortunately the `fold_subI_no_underflow_pattern` optimization runs before the next GVN pass and triggers the assert. > > # Fix > `fold_subI_no_underflow_pattern` should actually take into account that we could have the graph in such a state and that `x` could be top. So, the sensible fix is not to presume `x` to be of type long and bailout if it is not. > > # Testing > Tier 1-3+ > (also checked for new regression test failure before the change) Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/addnode.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28920/files - new: https://git.openjdk.org/jdk/pull/28920/files/57f15854..a5e8eb6d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28920&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28920&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28920.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28920/head:pull/28920 PR: https://git.openjdk.org/jdk/pull/28920 From galder at openjdk.org Mon Dec 22 09:43:02 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Mon, 22 Dec 2025 09:43:02 GMT Subject: RFR: 8373344: Add support for min/max reduction operations for Float16 In-Reply-To: <7qXqIBuLDFEKPNje6TALxZUATujnOY5hoODC30zJNFM=.07d8d157-1ae5-4751-befd-d6291370fb9c@github.com> References: <7qXqIBuLDFEKPNje6TALxZUATujnOY5hoODC30zJNFM=.07d8d157-1ae5-4751-befd-d6291370fb9c@github.com> Message-ID: On Mon, 15 Dec 2025 15:51:32 GMT, Yi Wu wrote: > This patch adds mid-end support for vectorized min/max reduction operations for half floats. It also includes backend AArch64 support for these operations. 
> Both floating point min/max reductions don't require strict order, because they are associative.
>
> It will generate NEON fminv/fmaxv reduction instructions when max vector length is 8B or 16B. On SVE supporting machines with vector lengths > 16B, it will generate the SVE fminv/fmaxv instructions.
> The patch also adds support for partial min/max reductions on SVE machines using fminv/fmaxv.
>
> Ratio of throughput(ops/ms) > 1 indicates the performance with this patch is better than the mainline.
>
> Neoverse N1 (UseSVE = 0, max vector length = 16B):
>
> Benchmark         vectorDim  Mode   Cnt  8B    16B
> ReductionMaxFP16  256        thrpt  9    3.69  6.44
> ReductionMaxFP16  512        thrpt  9    3.71  7.62
> ReductionMaxFP16  1024       thrpt  9    4.16  8.64
> ReductionMaxFP16  2048       thrpt  9    4.44  9.12
> ReductionMinFP16  256        thrpt  9    3.69  6.43
> ReductionMinFP16  512        thrpt  9    3.70  7.62
> ReductionMinFP16  1024       thrpt  9    4.16  8.64
> ReductionMinFP16  2048       thrpt  9    4.44  9.10
>
> Neoverse V1 (UseSVE = 1, max vector length = 32B):
>
> Benchmark         vectorDim  Mode   Cnt  8B    16B   32B
> ReductionMaxFP16  256        thrpt  9    3.96  8.62  8.02
> ReductionMaxFP16  512        thrpt  9    3.54  9.25  11.71
> ReductionMaxFP16  1024       thrpt  9    3.77  8.71  14.07
> ReductionMaxFP16  2048       thrpt  9    3.88  8.44  14.69
> ReductionMinFP16  256        thrpt  9    3.96  8.61  8.03
> ReductionMinFP16  512        thrpt  9    3.54  9.28  11.69
> ReductionMinFP16  1024       thrpt  9    3.76  8.70  14.12
> ReductionMinFP16  2048       thrpt  9    3.87  8.45  14.70
>
> Neoverse V2 (UseSVE = 2, max vector length = 16B):
>
> Benchmark         vectorDim  Mode   Cnt  8B    16B
> ReductionMaxFP16  256        thrpt  9    4.78  10.00
> ReductionMaxFP16  512        thrpt  9    3.74  11.33
> ReductionMaxFP16  1024       thrpt  9    3.86  9.59
> ReductionMaxFP16  2048       thrpt  9    3.94  8.71
> ReductionMinFP16  256        thrpt  9    4.78  10.00
> ReductionMinFP16  512        thrpt  9    3.74  11.29
> ReductionMinFP16  1024       thrpt  9    3.86  9.58
> ReductionMinFP16  2048       thrpt  9    3.94  8.71
>
> Testing:
> hotspot_all, jdk (tier1-3) and langtools (tier1) all pass on Neoverse N1/V1/V2.
Thanks @yiwu0b11, some superficial comments test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java line 486: > 484: @Test > 485: @Warmup(500) > 486: @IR(counts = {"reduce_minHF_masked", " >0 "}, Could you add IRNode constants for `reduce_minHF_masked`? Also for the max version below test/micro/org/openjdk/bench/jdk/incubator/vector/Float16OperationsBenchmark.java line 319: > 317: > 318: @Benchmark > 319: public short ReductionMinFP16() { Suggestion: public short reductionMinFP16() { test/micro/org/openjdk/bench/jdk/incubator/vector/Float16OperationsBenchmark.java line 328: > 326: > 327: @Benchmark > 328: public short ReductionMaxFP16() { Suggestion: public short reductionMaxFP16() { ------------- Changes requested by galder (Author). PR Review: https://git.openjdk.org/jdk/pull/28828#pullrequestreview-3603354237 PR Review Comment: https://git.openjdk.org/jdk/pull/28828#discussion_r2639273162 PR Review Comment: https://git.openjdk.org/jdk/pull/28828#discussion_r2639270984 PR Review Comment: https://git.openjdk.org/jdk/pull/28828#discussion_r2639271426 From chagedorn at openjdk.org Mon Dec 22 10:05:51 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 22 Dec 2025 10:05:51 GMT Subject: RFR: 8373525: C2: assert(_base == Long) failed: Not a Long [v4] In-Reply-To: References: Message-ID: On Mon, 22 Dec 2025 09:42:48 GMT, Damon Fenacci wrote: >> # Issue >> Olivier's fuzzer found a test that makes C2 crash while running the optimization that collapses the addition with overflow-protection (`fold_subI_no_underflow_pattern`). >> >> # Causes >> The crash happens because during `fold_subI_no_underflow_pattern` the first input of the `AddL` node (see comment below) becomes top. >> https://github.com/openjdk/jdk/blob/82b04f01bc99e8155518b8b8600d180981a42fc5/src/hotspot/share/opto/addnode.cpp#L1525-L1533 >> >> This happens because of a whole `IfFalse` subgraph that dies and nodes are being removed. 
`AddL` is not removed immediately as it has another input which is still alive but it is put in the IGVN worklist instead. >> >> image >> >> Unfortunately the `fold_subI_no_underflow_pattern` optimization runs before the next GVN pass and triggers the assert. >> >> # Fix >> `fold_subI_no_underflow_pattern` should actually take into account that we could have the graph in such a state and that `x` could be top. So, the sensible fix is not to presume `x` to be of type long and bailout if it is not. >> >> # Testing >> Tier 1-3+ >> (also checked for new regression test failure before the change) > > Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/addnode.cpp > > Co-authored-by: Christian Hagedorn Thanks for the update! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28920#pullrequestreview-3603432523 From chagedorn at openjdk.org Mon Dec 22 11:22:55 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 22 Dec 2025 11:22:55 GMT Subject: [jdk26] RFR: 8373502: C2 SuperWord: speculative check uses VPointer variable was pinned after speculative check, leading to bad graph In-Reply-To: <_qsak5U1E4TBrhC0V15Lm5TmaDjIXFNwA93zyXxkXNI=.6234364a-a7fe-46af-a3b7-798c4ab45496@github.com> References: <_qsak5U1E4TBrhC0V15Lm5TmaDjIXFNwA93zyXxkXNI=.6234364a-a7fe-46af-a3b7-798c4ab45496@github.com> Message-ID: On Fri, 19 Dec 2025 14:55:33 GMT, Emanuel Peter wrote: > Clean backport of https://github.com/openjdk/jdk/pull/28783 to JDK26. Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28929#pullrequestreview-3603675523
From jbhateja at openjdk.org Mon Dec 22 12:09:01 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 22 Dec 2025 12:09:01 GMT Subject: RFR: 8373480: Optimize multiplication by constant multiplier using LEA instructions [v6] In-Reply-To: References: Message-ID:
> Emulate multiplier using LEA addressing scheme, where effective address = BASE + INDEX * SCALE + OFFSET
> Refer to section "3.5.1.2 Using LEA" of Intel's optimization manual for details regarding slow vs fast LEA instructions.
> Given that the latency of IMUL with register operands is 3 cycles, a combination of two fast LEAs, each with 1 cycle latency, to emulate the multiplier is performant.
>
> Consider X as the multiplicand. By varying the scale of the first LEA instruction we can generate 4 inputs, i.e.
>
>
> X + X * 1 = 2X
> X + X * 2 = 3X
> X + X * 4 = 5X
> X + X * 8 = 9X
>
>
> The following table lists the multiplier combinations obtained by feeding the output of the first LEA into BASE and/or INDEX and varying the scale of the second fast LEA instruction. We will only handle the cases which cannot be handled by just shift + add.
>
>
> BASE  INDEX  SCALE  MULTIPLIER
> X     X      1      2 (Terminal)
> X     X      2      3 (Terminal)
> X     X      4      5 (Terminal)
> X     X      8      9 (Terminal)
> 3X    3X     1      6
> X     3X     2      7
> 5X    5X     1      10
> X     5X     2      11
> X     3X     4      13
> 5X    5X     2      15
> X     2X     8      17
> 9X    9X     1      18
> X     9X     2      19
> X     5X     4      21
> 5X    5X     4      25
> 9X    9X     2      27
> X     9X     4      37
> X     5X     8      41
> 9X    9X     4      45
> X     9X     8      73
> 9X    9X     8      81
>
>
> All the non-unity inputs tied to BASE / INDEX are derived from the terminal cases, which represent the first fast LEA. Thus, all the multipliers can be computed using just two LEA instructions.
>
> Performance numbers for the new microbenchmark included with this patch show around **5-50% improvements** on the latest x86 servers.
> > > System: INTEL(R) XEON(R) PLATINUM 8581C CPU @ 2.10GHz Emerald Rapids:- > Baseline:- > Benchmark Mode Cnt Score Error Units > ConstantMultiplierOptimization.testConstMultiplierI thrpt 2 189.690 ops/min > ConstantMultiplierOptimization.testConstMultiplierL thrpt 2 196.283 ops/min > > > Withopt:- > Benchmark Mode Cnt Score Error Units > ConstantMultiplierOptimization.testConstMultiplierI thrpt 2 283.827 ops/min > ConstantMultiplierOptimization... Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Extending micro and jtreg tests for memory patterns ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28759/files - new: https://git.openjdk.org/jdk/pull/28759/files/8e42a466..b7756730 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28759&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28759&range=04-05 Stats: 154 lines in 5 files changed: 142 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/28759.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28759/head:pull/28759 PR: https://git.openjdk.org/jdk/pull/28759 From qamai at openjdk.org Mon Dec 22 12:26:12 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 22 Dec 2025 12:26:12 GMT Subject: RFR: 8374180: C2 crash in PhaseCCP::verify_type - fatal error: Not monotonic Message-ID: <8Ocrk1zJxWfzFaFo_ohWCL76KAhe44SKoRuqdBjxQ6Q=.89969b47-d407-4d82-bc44-b326d78ba880@github.com> Hi, The issue here is the inconsistency in computing the `_widen` field of the `TypeInt`. At the first step, the types of the operands are: t1 = int:0 t2 = int:-2..3, widen = 3 Since the type of the first operand is a constant zero, `AddNode::Value` returns the type of the second operand directly, as `x ^ 0 == x for all x`. In the second step, `t1` is widened to `0..2`. This triggers the real computation of the result. The algorithm then splits `t2` into `t21 = int:-2..-1` and `t22 = int:0..3`. 
The `Xor`s of these with `t1` are `r1 = int:-4..-1` and `r2 = int:0..3`. As both have `_hi - _lo <= SMALL_TYPEINT_THRESHOLD == 3`, their `_widen`s are normalized to `0`. As a result, their `meet` also has `_widen == 0`. This value is smaller than the one from the previous step, which was `3`, and that leads to the failure. The root cause is that the `_widen` value of a node should be computed and normalized on the whole range of the node, not on its subranges; normalizing on subranges may set it to `0` in more cases than expected. As a result, my proposed solution is to ignore the `_widen` value of the subranges, and pass the expected `_widen` value when composing the final result. Please take a look and leave your reviews, thanks a lot. ------------- Commit messages: - RangeInference::infer should ensure correct value of _widen Changes: https://git.openjdk.org/jdk/pull/28952/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28952&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374180 Stats: 79 lines in 4 files changed: 65 ins; 1 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/28952.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28952/head:pull/28952 PR: https://git.openjdk.org/jdk/pull/28952
From dfenacci at openjdk.org Mon Dec 22 12:53:23 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 22 Dec 2025 12:53:23 GMT Subject: RFR: 8373525: C2: assert(_base == Long) failed: Not a Long [v2] In-Reply-To: References: Message-ID: On Fri, 19 Dec 2025 15:44:31 GMT, Manuel Hässig wrote: >> Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update test/hotspot/jtreg/compiler/loopopts/TestValidTypeInOverflowProtection.java >> >> Co-authored-by: Manuel Hässig >> - JDK-8373525: remove test requires > > Thanks for addressing my comments. Looks good to me. Thank you for your reviews @mhaessig @chhagedorn.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28920#issuecomment-3681970852 From dfenacci at openjdk.org Mon Dec 22 12:53:24 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 22 Dec 2025 12:53:24 GMT Subject: Integrated: 8373525: C2: assert(_base == Long) failed: Not a Long In-Reply-To: References: Message-ID: On Fri, 19 Dec 2025 10:22:58 GMT, Damon Fenacci wrote: > # Issue > Olivier's fuzzer found a test that makes C2 crash while running the optimization that collapses the addition with overflow-protection (`fold_subI_no_underflow_pattern`). > > # Causes > The crash happens because during `fold_subI_no_underflow_pattern` the first input of the `AddL` node (see comment below) becomes top. > https://github.com/openjdk/jdk/blob/82b04f01bc99e8155518b8b8600d180981a42fc5/src/hotspot/share/opto/addnode.cpp#L1525-L1533 > > This happens because of a whole `IfFalse` subgraph that dies and nodes are being removed. `AddL` is not removed immediately as it has another input which is still alive but it is put in the IGVN worklist instead. > > image > > Unfortunately the `fold_subI_no_underflow_pattern` optimization runs before the next GVN pass and triggers the assert. > > # Fix > `fold_subI_no_underflow_pattern` should actually take into account that we could have the graph in such a state and that `x` could be top. So, the sensible fix is not to presume `x` to be of type long and bailout if it is not. > > # Testing > Tier 1-3+ > (also checked for new regression test failure before the change) This pull request has now been integrated. 
Changeset: a61a1d32 Author: Damon Fenacci URL: https://git.openjdk.org/jdk/commit/a61a1d32a2bbf227081b9da6d101071ceb73076a Stats: 117 lines in 2 files changed: 116 ins; 0 del; 1 mod 8373525: C2: assert(_base == Long) failed: Not a Long Reviewed-by: chagedorn, mhaessig ------------- PR: https://git.openjdk.org/jdk/pull/28920 From rcastanedalo at openjdk.org Mon Dec 22 13:03:56 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 22 Dec 2025 13:03:56 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v10] In-Reply-To: <71h5168GmX3c9kMMlizU_ueAqytUYmU1zwfmMsRCLEY=.96fa6ece-5804-40f2-83ef-5979650a449a@github.com> References: <71h5168GmX3c9kMMlizU_ueAqytUYmU1zwfmMsRCLEY=.96fa6ece-5804-40f2-83ef-5979650a449a@github.com> Message-ID: On Fri, 19 Dec 2025 16:53:31 GMT, Quan Anh Mai wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> Much more comments, refactor the data into a separate class > > I have added a section describing some future work based on this PR that I have come up with. @merykitty would it be possible to guard the logic added by this patch with a new diagnostic flag, to facilitate reviewing and experimenting? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28812#issuecomment-3682010714 From qamai at openjdk.org Mon Dec 22 13:14:33 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 22 Dec 2025 13:14:33 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v11] In-Reply-To: References: Message-ID: > Hi, > > This patch is an alternative to #28764 but it does the analysis during IGVN instead. > > ## The current PR: > > The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. 
> > This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. Implementation-wise, when walking at `find_previous_store`, if we encounter a call or memory barrier, we start looking at all nodes that make the allocation escape. If all such nodes have a control input that is not a transitive control input of the call/barrier we are at, then we can decidedly say that the allocation has not escaped at that call/barrier, and walk past that call/barrier to find a corresponding store. > > I do not see a noticeable difference in C2 runtime with and without this patch. > > ## Future work: > > 1. Nested object: > > Consider this case: > > Holder h = new Holder(); > Object o = new Object(); > h.o = o; > > Currently, `o` will be considered escaped at `h.o = o`. However, it can be seen that `o` has not actually escaped because `h` has not escaped. Luckily, with the current approach, this can be easily achieved, notice how this loop is just "if anything escapes, consider `base` escapes", currently, the "anything" here includes `base` and its aliases. if we include the base of the object at which `o` is stored, then we can correctly determine if `o` has escaped. > > // Find all nodes that may escape alloc, and decide that it is provable that they must be > // executed after ctl > EscapeStatus res = NOT_ESCAPED; > aliases.push(base); > for (uint idx = 0; idx < aliases.size(); idx++) { > Node* n = aliases.at(idx); > > 2. Fold a memory `Phi`. > > This is pretty straightforward. We need to create a value `Phi` for each memory `Phi` so that we can handle loop `Phi`s. > > 3. Fold a pointer `Phi`. > > This can be easy, just give up if we don't encounter a store into that `Phi`. However, we can do better. 
Consider this case:
>
> Point p1 = new Point;
> Point p2 = new Point;
> p1.x = v1;
> p2.x = v2;
> Point p = Phi(p1, p2);
> int a = p.x;
>
> Then, `a` should be able to be folded to `Phi(v1, v2)` if `p1` and `p2` are known not to alias.
>
> Another i...
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Add a flag to turn off the feature ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28812/files - new: https://git.openjdk.org/jdk/pull/28812/files/0eb6e9fb..440b459c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=09-10 Stats: 5 lines in 2 files changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28812.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28812/head:pull/28812 PR: https://git.openjdk.org/jdk/pull/28812
From qamai at openjdk.org Mon Dec 22 13:14:35 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 22 Dec 2025 13:14:35 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v10] In-Reply-To: References: <71h5168GmX3c9kMMlizU_ueAqytUYmU1zwfmMsRCLEY=.96fa6ece-5804-40f2-83ef-5979650a449a@github.com> Message-ID: On Mon, 22 Dec 2025 13:01:13 GMT, Roberto Castañeda Lozano wrote: >> I have added a section describing some future work based on this PR that I have come up with. > > @merykitty would it be possible to guard the logic added by this patch with a new diagnostic flag, to facilitate reviewing and experimenting? @robcasloz Done, is it good for you?
------------- PR Comment: https://git.openjdk.org/jdk/pull/28812#issuecomment-3682044605 From dfenacci at openjdk.org Mon Dec 22 13:26:27 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 22 Dec 2025 13:26:27 GMT Subject: [jdk26] RFR: 8373525: C2: assert(_base == Long) failed: Not a Long Message-ID: Hi all, This pull request contains a backport of commit [a61a1d32](https://github.com/openjdk/jdk/commit/a61a1d32a2bbf227081b9da6d101071ceb73076a) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Damon Fenacci on 22 Dec 2025 and was reviewed by Christian Hagedorn and Manuel Hässig. Thanks! ------------- Commit messages: - Backport a61a1d32a2bbf227081b9da6d101071ceb73076a Changes: https://git.openjdk.org/jdk/pull/28953/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28953&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373525 Stats: 117 lines in 2 files changed: 116 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28953.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28953/head:pull/28953 PR: https://git.openjdk.org/jdk/pull/28953 From chagedorn at openjdk.org Mon Dec 22 14:27:03 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 22 Dec 2025 14:27:03 GMT Subject: [jdk26] RFR: 8373525: C2: assert(_base == Long) failed: Not a Long In-Reply-To: References: Message-ID: On Mon, 22 Dec 2025 13:19:04 GMT, Damon Fenacci wrote: > Hi all, > > This pull request contains a backport of commit [a61a1d32](https://github.com/openjdk/jdk/commit/a61a1d32a2bbf227081b9da6d101071ceb73076a) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Damon Fenacci on 22 Dec 2025 and was reviewed by Christian Hagedorn and Manuel Hässig. > > Thanks! Looks good! ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/28953#pullrequestreview-3604278238 From qamai at openjdk.org Mon Dec 22 14:37:43 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 22 Dec 2025 14:37:43 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v12] In-Reply-To: References: Message-ID: <1DZ6XLI2hNgtTUS-A5m4bQaroCjZ5wBJZ7nKN1w9abo=.edbf4907-422a-4362-968f-ef1319d6b30a@github.com> > Hi, > > This patch is an alternative to #28764 but it does the analysis during IGVN instead. > > ## The current PR: > > The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. > > This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. Implementation-wise, when walking at `find_previous_store`, if we encounter a call or memory barrier, we start looking at all nodes that make the allocation escape. If all such nodes have a control input that is not a transitive control input of the call/barrier we are at, then we can decidedly say that the allocation has not escaped at that call/barrier, and walk past that call/barrier to find a corresponding store. > > I do not see a noticeable difference in C2 runtime with and without this patch. > > ## Future work: > > 1. Nested object: > > Consider this case: > > Holder h = new Holder(); > Object o = new Object(); > h.o = o; > > Currently, `o` will be considered escaped at `h.o = o`. However, it can be seen that `o` has not actually escaped because `h` has not escaped. 
Luckily, with the current approach, this can be easily achieved, notice how this loop is just "if anything escapes, consider `base` escapes", currently, the "anything" here includes `base` and its aliases. if we include the base of the object at which `o` is stored, then we can correctly determine if `o` has escaped. > > // Find all nodes that may escape alloc, and decide that it is provable that they must be > // executed after ctl > EscapeStatus res = NOT_ESCAPED; > aliases.push(base); > for (uint idx = 0; idx < aliases.size(); idx++) { > Node* n = aliases.at(idx); > > 2. Fold a memory `Phi`. > > This is pretty straightforward. We need to create a value `Phi` for each memory `Phi` so that we can handle loop `Phi`s. > > 3. Fold a pointer `Phi`. > > This can be easy, just give up if we don't encounter a store into that `Phi`. However, we can do better. Consider this case: > > Point p1 = new Point; > Point p2 = new Point; > p1.x = v1; > p2.x = v2; > Point p = Phi(p1, p2); > int a = p.x; > > Then, `a` should be able to be folded to `Phi(v1, v2)` if `p1` and `p2` are known not to alias. > > Another i... 
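The "escapes only at return" situation described in the quoted text can be sketched at the Java source level. This is a hypothetical example, not a test from the PR: the names `EscapeAtReturnDemo`, `Point`, `opaqueCall` and `make` are made up, and whether C2 actually folds the load depends on the patch, its flag, and inlining decisions.

```java
// Illustrative sketch only: all names are hypothetical, and this shows the
// source-level shape of the pattern, not what C2 is guaranteed to do with it.
final class EscapeAtReturnDemo {

    static final class Point {
        int x;
    }

    static int sideEffects;

    // A call that never receives the Point; it stands in for the
    // "call or memory barrier" that the load must be walked past.
    static void opaqueCall() {
        sideEffects++;
    }

    static Point make(int v) {
        Point p = new Point(); // allocation
        p.x = v;               // the store that the later load observes
        opaqueCall();          // p has not escaped at this point
        int a = p.x;           // load: a candidate for folding to v
        p.x = a + 1;
        return p;              // p escapes only here
    }

    public static void main(String[] args) {
        System.out.println(make(41).x);
    }
}
```

Under an all-or-nothing escape analysis, `p` counts as escaped because it is returned, so the load after `opaqueCall()` cannot be matched with the earlier store; the approach described above aims to recognise that `p` has not yet escaped at the call.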
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Add test scenarios ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28812/files - new: https://git.openjdk.org/jdk/pull/28812/files/440b459c..74064ab8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=10-11 Stats: 14 lines in 2 files changed: 4 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/28812.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28812/head:pull/28812 PR: https://git.openjdk.org/jdk/pull/28812 From bkilambi at openjdk.org Mon Dec 22 16:28:45 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 22 Dec 2025 16:28:45 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v4] In-Reply-To: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: > This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. > > Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. The following is how each of these reductions is implemented for different aarch64 targets - > > **For AddReduction :** > On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. 
> > On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order. > > **For MulReduction :** > Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported. > > Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` - > > Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. > Ratio > 1 indicates the performance with this patch is better than the master branch. > > **N1 (UseSVE = 0, max vector length = 16B):** > > Benchmark vectorDim Mode Cnt 8B 16B > ReductionAddFP16 256 thrpt 9 1.41 1.40 > ReductionAddFP16 512 thrpt 9 1.41 1.41 > ReductionAddFP16 1024 thrpt 9 1.43 1.40 > ReductionAddFP16 2048 thrpt 9 1.43 1.40 > ReductionMulFP16 256 thrpt 9 1.22 1.22 > ReductionMulFP16 512 thrpt 9 1.21 1.23 > ReductionMulFP16 1024 thrpt 9 1.21 1.22 > ReductionMulFP16 2048 thrpt 9 1.20 1.22 > > > On N1, the scalarized sequence of `fadd/fmul` are generated for both `MaxVectorSize` of 8B and 16B for add reduction ... 
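Why an add reduction has to stay strictly ordered can be seen with a small Java sketch. It is only an illustration under stated assumptions: plain `float` stands in for Float16, and the class and method names (`StrictOrderAddDemo`, `sumForward`, `sumBackward`) are hypothetical and unrelated to the patch.

```java
// Illustrative sketch only: plain float stands in for Float16, and all names
// are hypothetical. It shows that floating point addition is not associative.
final class StrictOrderAddDemo {

    // Left-to-right sum: the order a strictly ordered reduction preserves.
    static float sumForward(float[] a) {
        float s = 0.0f;
        for (int i = 0; i < a.length; i++) {
            s += a[i];
        }
        return s;
    }

    // The same elements summed right-to-left: a reordering that a strictly
    // ordered reduction is not allowed to perform.
    static float sumBackward(float[] a) {
        float s = 0.0f;
        for (int i = a.length - 1; i >= 0; i--) {
            s += a[i];
        }
        return s;
    }

    public static void main(String[] args) {
        // 1.0e8f is exactly representable, but floats near it are 8 apart,
        // so adding 1.0f to -1.0e8f rounds back to -1.0e8f.
        float[] data = {1.0e8f, -1.0e8f, 1.0f};
        System.out.println(sumForward(data));  // 1.0
        System.out.println(sumBackward(data)); // 0.0
    }
}
```

The two orders give different answers for the same inputs, which is the reason a reordering (for example a pairwise vector reduction) cannot be substituted for the scalar loop here.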
Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: Address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27526/files - new: https://git.openjdk.org/jdk/pull/27526/files/620b422a..21ad1c93 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27526&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27526&range=02-03 Stats: 43 lines in 3 files changed: 16 ins; 5 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/27526.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27526/head:pull/27526 PR: https://git.openjdk.org/jdk/pull/27526 From bkilambi at openjdk.org Mon Dec 22 16:28:48 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 22 Dec 2025 16:28:48 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v4] In-Reply-To: References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: On Fri, 19 Dec 2025 08:47:47 GMT, Xiaohong Gong wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments > > src/hotspot/cpu/aarch64/aarch64_vector.ad line 3457: > >> 3455: format %{ "reduce_addD_sve $dst_src1, $dst_src1, $src2" %} >> 3456: ins_encode %{ >> 3457: assert(UseSVE > 0, "must be sve"); > > Why do you remove this assertion? It was by mistake and it went past my notice. Thanks. I reverted this change. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2640448217 From bkilambi at openjdk.org Mon Dec 22 16:28:55 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 22 Dec 2025 16:28:55 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v2] In-Reply-To: References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: On Fri, 19 Dec 2025 08:50:59 GMT, Xiaohong Gong wrote: >> Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Address review comments >> - Merge 'master' >> - 8366444: Add support for add/mul reduction operations for Float16 >> >> This patch adds mid-end support for vectorized add/mul reduction >> operations for half floats. It also includes backend aarch64 support for >> these operations. Only vectorization support through autovectorization >> is added as VectorAPI currently does not support Float16 vector species. >> >> Both add and mul reduction vectorized through autovectorization mandate >> the implementation to be strictly ordered. The following is how each of >> these reductions is implemented for different aarch64 targets - >> >> For AddReduction : >> On Neon only targets (UseSVE = 0): Generates scalarized additions >> using the scalar "fadd" instruction for both 8B and 16B vector lengths. >> This is because Neon does not provide a direct instruction for computing >> strictly ordered floating point add reduction. >> >> On SVE targets (UseSVE > 0): Generates the "fadda" instruction which >> computes add reduction for floating point in strict order. >> >> For MulReduction : >> Both Neon and SVE do not provide a direct instruction for computing >> strictly ordered floating point multiply reduction. 
For vector lengths >> of 8B and 16B, a scalarized sequence of scalar "fmul" instructions is >> generated and multiply reduction for vector lengths > 16B is not >> supported. >> >> Below is the performance of the two newly added microbenchmarks in >> Float16OperationsBenchmark.java tested on three different aarch64 >> machines and with varying MaxVectorSize - >> >> Note: On all machines, the score (ops/ms) is compared with the master >> branch without this patch which generates a sequence of loads ("ldrsh") >> to load the FP16 value into an FPR and a scalar "fadd/fmul" to >> add/multiply the loaded value to the running sum/product. The ratios >> given below are the ratios between the throughput with this patch and >> the throughput without this patch. >> Ratio > 1 indicates the performance with this patch is better than the >> master branch. >> >> N1 (UseSVE = 0, max vector length = 16B): >> Benchmark vecto... > > src/hotspot/cpu/aarch64/aarch64_vector.ad line 3490: > >> 3488: %} >> 3489: ins_pipe(pipe_slow); >> 3490: %} > > Could you please float this rule above `reduce_addF_sve` and below `reduce_addHF`? Better to rename `reduce_addHF` to `reduce_addHF_neon` ? 
Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2640448612 From bkilambi at openjdk.org Mon Dec 22 16:37:56 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 22 Dec 2025 16:37:56 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v3] In-Reply-To: <58NIPjOC6PTzn0H5BwY5FUkNfpe_qHuHLyIPCLiZ1QI=.0bf7f024-e772-459c-bd96-01981446beda@github.com> References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> <58NIPjOC6PTzn0H5BwY5FUkNfpe_qHuHLyIPCLiZ1QI=.0bf7f024-e772-459c-bd96-01981446beda@github.com> Message-ID: On Fri, 19 Dec 2025 08:55:18 GMT, Xiaohong Gong wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix build failures on Mac > > src/hotspot/cpu/aarch64/aarch64_vector.ad line 259: > >> 257: // implements strictly ordered floating point add reduction which does not require >> 258: // the FEAT_FP16 and ASIMDHP checks as SVE supports half-precision floats by default. >> 259: case Op_AddReductionVHF: > > Does it need to check `length_in_bytes < 8` for add reduction? Yes, it would be better to add it. Although when I tested with `MaxVectorSize=4`, it defaults to 8B instead but with the stress flags above, it could possibly fail although it wasn't easy to reproduce. I have still added this check for `AddReductionVHF` instead. > src/hotspot/cpu/aarch64/aarch64_vector.ad line 392: > >> 390: case Op_StoreVectorScatter: >> 391: case Op_AddReductionVF: >> 392: case Op_AddReductionVHF: > > Suggestion: > > case Op_AddReductionVHF: > case Op_AddReductionVF: Done > src/hotspot/share/opto/vectornode.hpp line 323: > >> 321: // is generated through VectorAPI as VectorAPI does not impose any such rules on ordering. 
>> 322: const bool _requires_strict_order; >> 323: public: > > Suggestion: > > const bool _requires_strict_order; > > public: Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2640473140 PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2640474548 PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2640473528 From xgong at openjdk.org Tue Dec 23 04:53:57 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 23 Dec 2025 04:53:57 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v4] In-Reply-To: References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: On Mon, 22 Dec 2025 16:28:45 GMT, Bhavana Kilambi wrote: >> This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. >> >> Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. The following is how each of these reductions is implemented for different aarch64 targets - >> >> **For AddReduction :** >> On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. >> >> On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order. >> >> **For MulReduction :** >> Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. 
For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported. >> >> Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` - >> >> Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. >> Ratio > 1 indicates the performance with this patch is better than the master branch. >> >> **N1 (UseSVE = 0, max vector length = 16B):** >> >> Benchmark vectorDim Mode Cnt 8B 16B >> ReductionAddFP16 256 thrpt 9 1.41 1.40 >> ReductionAddFP16 512 thrpt 9 1.41 1.41 >> ReductionAddFP16 1024 thrpt 9 1.43 1.40 >> ReductionAddFP16 2048 thrpt 9 1.43 1.40 >> ReductionMulFP16 256 thrpt 9 1.22 1.22 >> ReductionMulFP16 512 thrpt 9 1.21 1.23 >> ReductionMulFP16 1024 thrpt 9 1.21 1.22 >> ReductionMulFP16 2048 thrpt 9 1.20 1.22 >> >> >> On N1, the scalarized sequence of `fadd/fmul` are gener... > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments Thanks for your updating! I will trigger a test with kinds of SVE machines. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/27526#issuecomment-3685096477 From galder at openjdk.org Tue Dec 23 05:45:54 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Tue, 23 Dec 2025 05:45:54 GMT Subject: RFR: 8373480: Optimize multiplication by constant multiplier using LEA instructions [v6] In-Reply-To: References: Message-ID: <1gb39c5Ai2AKZqk7UCJWyI5mh2nOxTWhomefhWQTeXA=.6286b980-2eab-485d-8c1d-005fd4add5a0@github.com> On Mon, 22 Dec 2025 12:09:01 GMT, Jatin Bhateja wrote: >> Emulate multiplier using LEA addressing scheme, where effective address = BASE + INDEX * SCALE + OFFSET >> Refer to section "3.5.1.2 Using LEA" of Intel's optimization manual for details regarding slow vs fast lea instructions. >> Given that latency of IMUL with register operands is 3 cycles, a combination of two fast LEAs each with 1 cycle latency to emulate multiplier is performant. >> >> Consider X as the multiplicand, by varying the scale of first LEA instruction we can generate 4 inputs i.e. >> >> >> X + X * 1 = 2X >> X + X * 2 = 3X >> X + X * 4 = 5X >> X + X * 8 = 9X >> >> >> Following table lists various multiplier combinations for output of first LEA at BASE and/or INDEX by varying the >> scale of second fast LEA instruction. We will only handle the cases which cannot be handled by just shift + add. >> >> >> BASE INDEX SCALE MULTIPLIER >> X X 1 2 (Terminal) >> X X 2 3 (Terminal) >> X X 4 5 (Terminal) >> X X 8 9 (Terminal) >> 3X 3X 1 6 >> X 3X 2 7 >> 5X 5X 1 10 >> X 5X 2 11 >> X 3X 4 13 >> 5X 5X 2 15 >> X 2X 8 17 >> 9X 9X 1 18 >> X 9X 2 19 >> X 5X 4 21 >> 5X 5X 4 25 >> 9X 9X 2 27 >> X 9X 4 37 >> X 5X 8 41 >> 9X 9X 4 45 >> X 9X 8 73 >> 9X 9X 8 81 >> >> >> All the non-unity inputs tied to BASE / INDEX are derived out of terminal cases which represent first FAST LEA. Thus, all the multipliers can be computed using just two LEA instructions.
>> >> Performance numbers for new micro benchmark included with this patch show around **5-50% improvements** on latest x86 servers. >> >> >> System: INTEL(R) XEON(R) PLATINUM 8581C CPU @ 2.10GHz Emerald Rapids:- >> Baseline:- >> Benchmark Mode Cnt Score Error Units >> ConstantMultiplierOptimization.testConstMultiplierI thrpt 2 189.690 ops/min >> ConstantMultiplierOptimization.testConstMultiplierL thrpt 2 196.283 ops/min >> >> >> Withopt:- >> Benchmark Mode Cnt Score Error Units >> Constant... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Extending micro and jtreg tests for memory patterns Thanks for the changes! Sorry I missed the reference in the JBS :) ------------- Marked as reviewed by galder (Author). PR Review: https://git.openjdk.org/jdk/pull/28759#pullrequestreview-3606628257 From jbhateja at openjdk.org Tue Dec 23 06:25:53 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 23 Dec 2025 06:25:53 GMT Subject: RFR: 8373480: Optimize multiplication by constant multiplier using LEA instructions [v6] In-Reply-To: References: Message-ID: On Mon, 22 Dec 2025 12:09:01 GMT, Jatin Bhateja wrote: >> Emulate multiplier using LEA addressing scheme, where effective address = BASE + INDEX * SCALE + OFFSET >> Refer to section "3.5.1.2 Using LEA" of Intel's optimization manual for details regarding slow vs fast lea instructions. >> Given that latency of IMUL with register operands is 3 cycles, a combination of two fast LEAs each with 1 cycle latency to emulate multiplier is performant. >> >> Consider X as the multiplicand, by varying the scale of first LEA instruction we can generate 4 inputs i.e. >> >> >> X + X * 1 = 2X >> X + X * 2 = 3X >> X + X * 4 = 5X >> X + X * 8 = 9X >> >> >> Following table lists various multiplier combinations for output of first LEA at BASE and/or INDEX by varying the >> scale of second fast LEA instruction.
We will only handle the cases which cannot be handled by just shift + add. >> >> >> BASE INDEX SCALE MULTIPLIER >> X X 1 2 (Terminal) >> X X 2 3 (Terminal) >> X X 4 5 (Terminal) >> X X 8 9 (Terminal) >> 3X 3X 1 6 >> X 3X 2 7 >> 5X 5X 1 10 >> X 5X 2 11 >> X 3X 4 13 >> 5X 5X 2 15 >> X 2X 8 17 >> 9X 9X 1 18 >> X 9X 2 19 >> X 5X 4 21 >> 5X 5X 4 25 >> 9X 9X 2 27 >> X 9X 4 37 >> X 5X 8 41 >> 9X 9X 4 45 >> X 9X 8 73 >> 9X 9X 8 81 >> >> >> All the non-unity inputs tied to BASE / INDEX are derived out of terminal cases which represent first FAST LEA. Thus, all the multipliers can be computed using just two LEA instructions. >> >> Performance numbers for new micro benchmark included with this patch show around **5-50% improvements** on latest x86 servers. >> >> >> System: INTEL(R) XEON(R) PLATINUM 8581C CPU @ 2.10GHz Emerald Rapids:- >> Baseline:- >> Benchmark Mode Cnt Score Error Units >> ConstantMultiplierOptimization.testConstMultiplierI thrpt 2 189.690 ops/min >> ConstantMultiplierOptimization.testConstMultiplierL thrpt 2 196.283 ops/min >> >> >> Withopt:- >> Benchmark Mode Cnt Score Error Units >> Constant... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Extending micro and jtreg tests for memory patterns Hi @eme64, @sviswa7, can you also have a look?
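For readers following the table quoted above, here is a minimal Java sketch of the two-LEA scheme. It only models the arithmetic (BASE + INDEX * SCALE, with no OFFSET); the class and method names are made up for illustration and are not part of the patch, and the sketch says nothing about how C2 actually selects or emits the instructions.

```java
public final class LeaMultiplySketch {
    // Models the value computed by one "fast" LEA: base + index * scale.
    // Hypothetical helper; x86 addressing only allows a scale of 1, 2, 4 or 8.
    static long lea(long base, long index, int scale) {
        if (scale != 1 && scale != 2 && scale != 4 && scale != 8) {
            throw new IllegalArgumentException("scale must be 1, 2, 4 or 8");
        }
        return base + index * scale;
    }

    // 7X: the first LEA produces 3X (X + X*2), the second X + 3X*2.
    static long mulBy7(long x) {
        long t = lea(x, x, 2);
        return lea(x, t, 2);
    }

    // 37X: the first LEA produces 9X (X + X*8), the second X + 9X*4.
    static long mulBy37(long x) {
        long t = lea(x, x, 8);
        return lea(x, t, 4);
    }

    // 45X: the first LEA produces 9X, the second 9X + 9X*4.
    static long mulBy45(long x) {
        long t = lea(x, x, 8);
        return lea(t, t, 4);
    }

    public static void main(String[] args) {
        long x = 123;
        System.out.println(mulBy7(x) == 7 * x);
        System.out.println(mulBy37(x) == 37 * x);
        System.out.println(mulBy45(x) == 45 * x);
    }
}
```

The three helpers correspond to the rows (X, 3X, 2), (X, 9X, 4) and (9X, 9X, 4) of the table; the other rows follow the same pattern with a different terminal value and scale.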
------------- PR Comment: https://git.openjdk.org/jdk/pull/28759#issuecomment-3685324809 From xgong at openjdk.org Tue Dec 23 06:52:36 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 23 Dec 2025 06:52:36 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently Message-ID: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> The test fails intermittently with the following error: Caused by: java.lang.RuntimeException: assertEqualsWithTolerance: expected 0.0 but was 1.1754945E-38 (tolerance: 1.4E-44, diff: 1.1754945E-38) at compiler.vectorapi.TestVectorOperationsWithPartialSize.verifyAddReductionFloat(TestVectorOperationsWithPartialSize.java:231) at compiler.vectorapi.TestVectorOperationsWithPartialSize.testAddReductionFloat(TestVectorOperationsWithPartialSize.java:260) The root cause is that the Vector API `reduceLanes()` does not guarantee a specific calculation order for floating-point reduction operations [1]. When the array contains extreme values, this can produce results outside the tolerance range compared to sequential scalar addition. For example, given array elements: [0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE] Sequential scalar addition produces: 0.0f + Float.MIN_NORMAL + Float.MAX_VALUE - Float.MAX_VALUE = 0.0f However, `reduceLanes()` might compute: (0.0f + Float.MIN_NORMAL) + (Float.MAX_VALUE - Float.MAX_VALUE) = Float.MIN_NORMAL The difference between the two results is `Float.MIN_NORMAL` (1.1754945E-38), which exceeds the tolerance of `Math.ulp(0.0f) * 10.0f = 1.4E-44`. Even with a 10x rounding error factor, the tolerance is insufficient for such edge cases. Since `reduceLanes()` does not require a specific calculation order, differences from scalar results can be significantly larger when special or extreme maximum/minimum values are present. Using a fixed tolerance is inappropriate for such corner cases.
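The order dependence described above can be reproduced with plain scalar Java, without the Vector API. The sketch below is illustrative only: the class and method names are hypothetical, and the pairwise grouping is just one order that `reduceLanes()` is permitted to use, not necessarily the one any particular backend picks.

```java
public final class ReductionOrderDemo {
    // Left-to-right addition, as a scalar reference loop would do it.
    static float sequentialSum(float[] a) {
        float sum = 0.0f;
        for (float v : a) {
            sum += v;
        }
        return sum;
    }

    // One possible non-sequential order: (a[0] + a[1]) + (a[2] + a[3]).
    static float pairwiseSum4(float[] a) {
        return (a[0] + a[1]) + (a[2] + a[3]);
    }

    public static void main(String[] args) {
        float[] a = {0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE};
        float seq = sequentialSum(a);
        float pair = pairwiseSum4(a);
        // Tolerance formula as quoted in the failure message above.
        float tolerance = Math.ulp(seq) * 10.0f;
        System.out.println("sequential = " + seq);
        System.out.println("pairwise   = " + pair);
        System.out.println("within tolerance: " + (Math.abs(pair - seq) <= tolerance));
    }
}
```

In the sequential order `Float.MIN_NORMAL` is absorbed when it is added to `Float.MAX_VALUE`, while in the pairwise order it survives because the two large values cancel first.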
This patch fixes the issue by initializing the float array in the test with random normal values within a specified range, ensuring the result gap stays within the defined tolerance. Tested locally on my AArch64 and X86_64 machines 500 times, and I didn't observe the failure again. [1] https://docs.oracle.com/en/java/javase/25/docs/api/jdk.incubator.vector/jdk/incubator/vector/FloatVector.html#reduceLanes(jdk.incubator.vector.VectorOperators.Associative) ------------- Commit messages: - 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently Changes: https://git.openjdk.org/jdk/pull/28960/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28960&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373722 Stats: 5 lines in 1 file changed: 0 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28960.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28960/head:pull/28960 PR: https://git.openjdk.org/jdk/pull/28960 From dfenacci at openjdk.org Tue Dec 23 07:33:03 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 23 Dec 2025 07:33:03 GMT Subject: [jdk26] RFR: 8373525: C2: assert(_base == Long) failed: Not a Long In-Reply-To: References: Message-ID: On Mon, 22 Dec 2025 14:24:08 GMT, Christian Hagedorn wrote: >> Hi all, >> >> This pull request contains a backport of commit [a61a1d32](https://github.com/openjdk/jdk/commit/a61a1d32a2bbf227081b9da6d101071ceb73076a) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. >> >> The commit being backported was authored by Damon Fenacci on 22 Dec 2025 and was reviewed by Christian Hagedorn and Manuel Hässig. >> >> Thanks! > > Looks good! Thanks for the review @chhagedorn.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28953#issuecomment-3685519171 From dfenacci at openjdk.org Tue Dec 23 07:37:16 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 23 Dec 2025 07:37:16 GMT Subject: [jdk26] Integrated: 8373525: C2: assert(_base == Long) failed: Not a Long In-Reply-To: References: Message-ID: On Mon, 22 Dec 2025 13:19:04 GMT, Damon Fenacci wrote: > Hi all, > > This pull request contains a backport of commit [a61a1d32](https://github.com/openjdk/jdk/commit/a61a1d32a2bbf227081b9da6d101071ceb73076a) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Damon Fenacci on 22 Dec 2025 and was reviewed by Christian Hagedorn and Manuel Hässig. > > Thanks! This pull request has now been integrated. Changeset: 8e0d736b Author: Damon Fenacci URL: https://git.openjdk.org/jdk/commit/8e0d736b139421d9fc01397690d8403c15b1b4bb Stats: 117 lines in 2 files changed: 116 ins; 0 del; 1 mod 8373525: C2: assert(_base == Long) failed: Not a Long Reviewed-by: chagedorn Backport-of: a61a1d32a2bbf227081b9da6d101071ceb73076a ------------- PR: https://git.openjdk.org/jdk/pull/28953 From jiefu at openjdk.org Tue Dec 23 08:40:52 2025 From: jiefu at openjdk.org (Jie Fu) Date: Tue, 23 Dec 2025 08:40:52 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently In-Reply-To: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Tue, 23 Dec 2025 06:45:46 GMT, Xiaohong Gong wrote: > The test fails intermittently with the following error: > > > Caused by: java.lang.RuntimeException: assertEqualsWithTolerance: expected 0.0 but was 1.1754945E-38 (tolerance: 1.4E-44, diff: 1.1754945E-38) > at
compiler.vectorapi.TestVectorOperationsWithPartialSize.verifyAddReductionFloat(TestVectorOperationsWithPartialSize.java:231) > at compiler.vectorapi.TestVectorOperationsWithPartialSize.testAddReductionFloat(TestVectorOperationsWithPartialSize.java:260) > > > The root cause is that the Vector API `reduceLanes()` does not guarantee a specific calculation order for floating-point reduction operations [1]. When the array contains extreme values, this can produce results outside the tolerance range compared to sequential scalar addition. > > For example, given array elements: > > [0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE] > > > Sequential scalar addition produces: > > 0.0f + Float.MIN_NORMAL + Float.MAX_VALUE - Float.MAX_VALUE = 0.0f > > > However, `reduceLanes()` might compute: > > (0.0f + Float.MIN_NORMAL) + (Float.MAX_VALUE - Float.MAX_VALUE) = Float.MIN_NORMAL > > > The difference between the two results is `Float.MIN_NORMAL` (1.1754945E-38), which exceeds the tolerance of `Math.ulp(0.0f) * 10.0f = 1.4E-44`. Even with a 10x rounding error factor, the tolerance is insufficient for such edge cases. > > Since `reduceLanes()` does not require a specific calculation order, differences from scalar results can be significantly larger when special or extreme maximum/minimum values are present. Using a fixed tolerance is inappropriate for such corner cases. > > This patch fixes the issue by initializing the float array in the test with random normal values within a specified range, ensuring the result gap stays within the defined tolerance. > > Tested locally on my AArch64 and X86_64 machines 500 times, and I didn't observe the failure again. > > [1] https://docs.oracle.com/en/java/javase/25/docs/api/jdk.incubator.vector/jdk/incubator/vector/FloatVector.html#reduceLanes(jdk.incubator.vector.VectorOperators.Associative) Shall we change the calculation of tolerance? e.g. max_abs = max(abs(arr[0]), abs(arr[1]), ...)
tolerance = Math.ulp(max_abs) * vlen ------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3685735747 From xgong at openjdk.org Tue Dec 23 09:30:50 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 23 Dec 2025 09:30:50 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Tue, 23 Dec 2025 08:37:48 GMT, Jie Fu wrote: > Shall we change the calculation of tolerance? > > e.g. > > ``` > max_abs = max(abs(arr[0]), abs(arr[1]), ...) > tolerance = Math.ulp(max_abs) * vlen > ``` Thanks for looking at this issue. It's really a good question about the definition of a more reasonable `tolerance`, which is fundamentally a numerical analysis problem per my understanding. For a floating-point reduction test, the goal is to check that the API is implemented correctly. And what we really care about is how close the reduction result is to the mathematically expected value. In other words, the tolerance should be derived from the expected sum itself. For example, if the float array is `[1.0f, -1.0f, 2.0f, -2.0f]`, we genuinely expect a result very close to `0.0f`, not something near `2.0f`. Using `max_abs` to set the tolerance risks inflating the admissible error range (since `max_abs` here would be 2.0f), which I'm afraid might make the test much less effective. WDYT?
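To make the two candidate tolerances concrete, here is a small sketch that computes both for a given array. As far as I can tell from the thread, `sumBasedTolerance` mirrors the `Math.ulp(expected) * 10.0f` formula behind the failure message, and `maxAbsBasedTolerance` follows the `Math.ulp(max_abs) * vlen` suggestion; the class and method names, and passing `vlen` as a plain parameter, are assumptions for illustration and not code from the test.

```java
public final class ToleranceSketch {
    // Tolerance derived from the sequential (expected) sum, with a 10x factor.
    static float sumBasedTolerance(float[] a) {
        float sum = 0.0f;
        for (float v : a) {
            sum += v;
        }
        return Math.ulp(sum) * 10.0f;
    }

    // Tolerance derived from the largest magnitude in the array, scaled by
    // the number of lanes (vlen is a hypothetical parameter here).
    static float maxAbsBasedTolerance(float[] a, int vlen) {
        float maxAbs = 0.0f;
        for (float v : a) {
            maxAbs = Math.max(maxAbs, Math.abs(v));
        }
        return Math.ulp(maxAbs) * vlen;
    }

    public static void main(String[] args) {
        float[] a = {1.0f, -1.0f, 2.0f, -2.0f};
        System.out.println("sum-based     = " + sumBasedTolerance(a));
        System.out.println("max_abs-based = " + maxAbsBasedTolerance(a, a.length));
    }
}
```

For `[1.0f, -1.0f, 2.0f, -2.0f]` the max_abs-based value is many orders of magnitude larger than the sum-based one, which is the concern raised above.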
------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3685912527 From jiefu at openjdk.org Tue Dec 23 09:52:02 2025 From: jiefu at openjdk.org (Jie Fu) Date: Tue, 23 Dec 2025 09:52:02 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Tue, 23 Dec 2025 09:28:00 GMT, Xiaohong Gong wrote: > And what we really care about is how close the reduction result is to the mathematically expected value. Understood. However, (1.0, 5.0) the test range is really too small. If we expand the range, the current tolerance may still become not big enough. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3685979894 From bkilambi at openjdk.org Tue Dec 23 10:05:55 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 23 Dec 2025 10:05:55 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 In-Reply-To: <_QEYCQm138PWv2vGjMFvEJ6kfMjGEn_vsuEZ_EPaRxQ=.b42967e5-cc22-4c98-a454-6698ce0a70cf@github.com> References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> <_QEYCQm138PWv2vGjMFvEJ6kfMjGEn_vsuEZ_EPaRxQ=.b42967e5-cc22-4c98-a454-6698ce0a70cf@github.com> Message-ID: On Thu, 11 Dec 2025 12:06:49 GMT, Marc Chevalier wrote: >> This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. >> >> Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. 
The following is how each of these reductions is implemented for different aarch64 targets - >> >> **For AddReduction :** >> On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. >> >> On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order. >> >> **For MulReduction :** >> Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported. >> >> Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` - >> >> Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. >> Ratio > 1 indicates the performance with this patch is better than the master branch. >> >> **N1 (UseSVE = 0, max vector length = 16B):** >> >> Benchmark vectorDim Mode Cnt 8B 16B >> ReductionAddFP16 256 thrpt 9 1.41 1.40 >> ReductionAddFP16 512 thrpt 9 1.41 1.41 >> ReductionAddFP16 1024 thrpt 9 1.43 1.40 >> ReductionAddFP16 2048 thrpt 9 1.43 1.40 >> ReductionMulFP16 256 thrpt 9 1.22 1.22 >> ReductionMulFP16 512 thrpt 9 1.21 1.23 >> ReductionMulFP16 1024 thrpt 9 1.21 1.22 >> ReductionMulFP16 2048 thrpt 9 1.20 1.22 >> >> >> On N1, the scalarized sequence of `fadd/fmul` are gener... 
> > As for the IR verification failure, I've looked a bit and couldn't find such an issue already. Since it reproduces on master, I suggest you file a ticket, indeed. Thanks! Hello @marc-chevalier Could I request you to test the new patchset again please? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27526#issuecomment-3686020585 From epeter at openjdk.org Tue Dec 23 11:46:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 23 Dec 2025 11:46:08 GMT Subject: [jdk26] RFR: 8373502: C2 SuperWord: speculative check uses VPointer variable was pinned after speculative check, leading to bad graph In-Reply-To: References: <_qsak5U1E4TBrhC0V15Lm5TmaDjIXFNwA93zyXxkXNI=.6234364a-a7fe-46af-a3b7-798c4ab45496@github.com> Message-ID: On Mon, 22 Dec 2025 11:19:43 GMT, Christian Hagedorn wrote: >> Clean backport of https://github.com/openjdk/jdk/pull/28783 to JDK26. > > Looks good! @chhagedorn @mhaessig Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28929#issuecomment-3686349692 From epeter at openjdk.org Tue Dec 23 11:46:11 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 23 Dec 2025 11:46:11 GMT Subject: [jdk26] Integrated: 8373502: C2 SuperWord: speculative check uses VPointer variable was pinned after speculative check, leading to bad graph In-Reply-To: <_qsak5U1E4TBrhC0V15Lm5TmaDjIXFNwA93zyXxkXNI=.6234364a-a7fe-46af-a3b7-798c4ab45496@github.com> References: <_qsak5U1E4TBrhC0V15Lm5TmaDjIXFNwA93zyXxkXNI=.6234364a-a7fe-46af-a3b7-798c4ab45496@github.com> Message-ID: <_wKtC3exf78bD2_pQ_hvVVTEI_C04NiK6ZB2ArMpDU8=.0d7177b7-98ce-4b40-ab56-75f8a1778893@github.com> On Fri, 19 Dec 2025 14:55:33 GMT, Emanuel Peter wrote: > Clean backport of https://github.com/openjdk/jdk/pull/28783 to JDK26. This pull request has now been integrated. 
Changeset: 3db9a5af Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/3db9a5affed36e4a35893f0608e153a762794cdf Stats: 112 lines in 3 files changed: 112 ins; 0 del; 0 mod 8373502: C2 SuperWord: speculative check uses VPointer variable was pinned after speculative check, leading to bad graph Reviewed-by: mhaessig, chagedorn Backport-of: 00050f84d44f3ec23e9c6da52bffd68770010749 ------------- PR: https://git.openjdk.org/jdk/pull/28929 From epeter at openjdk.org Tue Dec 23 11:49:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 23 Dec 2025 11:49:52 GMT Subject: RFR: 8373480: Optimize multiplication by constant multiplier using LEA instructions [v6] In-Reply-To: References: Message-ID: On Tue, 23 Dec 2025 06:23:19 GMT, Jatin Bhateja wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Extending micro and jtreg tests for memory patterns > > Hi @eme64 , @sviswa7, Can you also have a look. @jatin-bhateja That looks very exciting, I definitely want to have a look at this after the Christmas/New Year break. I'll be especially excited to see the use of the Template Framework in action here :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28759#issuecomment-3686361935 From jiefu at openjdk.org Tue Dec 23 12:18:53 2025 From: jiefu at openjdk.org (Jie Fu) Date: Tue, 23 Dec 2025 12:18:53 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Tue, 23 Dec 2025 09:28:00 GMT, Xiaohong Gong wrote: > For example, if the float array is `[1.0f, -1.0f, 2.0f, -2.0f]`, we genuinely expect a result very close to `0.0f`, not something near `2.0f`.
> > Using `max_abs` to set the tolerance risks inflating the admissible error range (since `max_abs` here would be 2.0f), which I'm afraid might make the test much less effective. FYI: the current sum based tolerance may be also bigger than max_abs based. For example, if the float array is `[1.0f, 1.0f, 2.0f, 2.0f]`. The sum would be `6.0f`, the max_abs would be `2.0`. What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3686438527 From dskantz at openjdk.org Tue Dec 23 14:27:03 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Tue, 23 Dec 2025 14:27:03 GMT Subject: RFR: 8362117: C2: compiler/stringopts/TestStackedConcatsAppendUncommonTrap.java fails with a wrong result due to invalidated liveness assumptions for data phis [v2] In-Reply-To: <11lcsXkMGpKMQr60NCKofzldqpnJka1XZtrGRrUai3o=.c2201234-bbf2-465a-b237-cd9fe8505491@github.com> References: <11lcsXkMGpKMQr60NCKofzldqpnJka1XZtrGRrUai3o=.c2201234-bbf2-465a-b237-cd9fe8505491@github.com> Message-ID: On Wed, 3 Sep 2025 08:02:04 GMT, Daniel Skantz wrote: >> This PR addresses a wrong compilation during string optimizations. >> >> During stacked string concatenation of two StringBuilder links SB1 and SB2, the pattern "append -> Phi -> Region -> (True, False) -> If -> Bool -> CmpP -> Proj (Result) -> toString" may be observed, where toString is the end of SB1, and the simple diamond is part of SB2. >> >> After JDK-8291775, the Bool test to the diamond If is set to a constant zero to allow for folding the simple diamond away during IGVN, while not letting the top() value from the result projection of SB1 propagate through the graph too quickly. The assumption was that any data Phi of the Region would go away during PhaseRemoveUseless as they are no longer live -- I think that in the case of JDK-8291775, the user of phi was the constructor of SB2. 
However, in the attached test case, the Phi stays live as it's a parameter (input to an append) of SB2 and will be used during the transformation in `copy_string`. When the diamond region is later folded, the Phi's user picks up the wrong input corresponding to the false branch. >> >> The proposed solution is to disable the stacked concatenation optimization for this specific pattern. This might be pragmatic as it's an edge case and there's already a bug tail: JDK-8271341-> JDK-8291775 -> JDK-8362117. >> >> Testing: T1-3 (aed5952). >> >> Extra testing: ran T1-3 on Linux with an instrumented build and verified that the pattern I am excluding in this PR is not seen during any other compilation than that of the proposed regression test. > > Daniel Skantz has updated the pull request incrementally with two additional commits since the last revision: > > - store intermediate calculations > - direction convention A comment to keep this PR active. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27028#issuecomment-3686834493 From rcastanedalo at openjdk.org Tue Dec 23 14:35:01 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 23 Dec 2025 14:35:01 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v10] In-Reply-To: References: <71h5168GmX3c9kMMlizU_ueAqytUYmU1zwfmMsRCLEY=.96fa6ece-5804-40f2-83ef-5979650a449a@github.com> Message-ID: On Mon, 22 Dec 2025 13:01:13 GMT, Roberto Castañeda Lozano wrote: >> I have added a section describing some future work based on this PR that I have come up with. > > @merykitty would it be possible to guard the logic added by this patch with a new diagnostic flag, to facilitate reviewing and experimenting? > @robcasloz Done, is it good for you? Thanks!
Besides disabling load folding, it would be great if `DoLocalEscapeAnalysis` could also prevent any additional work introduced by this changeset (data structure allocation etc.), so that we can use it to evaluate its impact on compilation speed. I'll continue looking at the patch when I'm back from vacation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28812#issuecomment-3686857617 From mdoerr at openjdk.org Tue Dec 23 16:59:34 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 23 Dec 2025 16:59:34 GMT Subject: RFR: 8374195: TestReplaceNarrowPhiWithBottomPhi fails on ppc64 platforms in (fast)debug Message-ID: This test makes assumptions about the C2 ideal graph which are not true for PPC64. We need to get tests green also in jdk26 where the new test has landed in the meantime, so simply disabling it for the platform. Test improvements can be done later if needed. ------------- Commit messages: - 8374195: TestReplaceNarrowPhiWithBottomPhi fails on ppc64 platforms in (fast)debug Changes: https://git.openjdk.org/jdk/pull/28964/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28964&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374195 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28964.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28964/head:pull/28964 PR: https://git.openjdk.org/jdk/pull/28964 From qamai at openjdk.org Tue Dec 23 17:46:31 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 23 Dec 2025 17:46:31 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v13] In-Reply-To: References: Message-ID: > Hi, > > This patch is an alternative to #28764 but it does the analysis during IGVN instead. > > ## The current PR: > > The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. 
If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. > > This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. Implementation-wise, when walking at `find_previous_store`, if we encounter a call or memory barrier, we start looking at all nodes that make the allocation escape. If all such nodes have a control input that is not a transitive control input of the call/barrier we are at, then we can decidedly say that the allocation has not escaped at that call/barrier, and walk past that call/barrier to find a corresponding store. > > I do not see a noticeable difference in C2 runtime with and without this patch. > > ## Future work: > > 1. Nested object: > > Consider this case: > > Holder h = new Holder(); > Object o = new Object(); > h.o = o; > > Currently, `o` will be considered escaped at `h.o = o`. However, it can be seen that `o` has not actually escaped because `h` has not escaped. Luckily, with the current approach, this can be easily achieved, notice how this loop is just "if anything escapes, consider `base` escapes", currently, the "anything" here includes `base` and its aliases. if we include the base of the object at which `o` is stored, then we can correctly determine if `o` has escaped. > > // Find all nodes that may escape alloc, and decide that it is provable that they must be > // executed after ctl > EscapeStatus res = NOT_ESCAPED; > aliases.push(base); > for (uint idx = 0; idx < aliases.size(); idx++) { > Node* n = aliases.at(idx); > > 2. Fold a memory `Phi`. > > This is pretty straightforward. We need to create a value `Phi` for each memory `Phi` so that we can handle loop `Phi`s. > > 3. Fold a pointer `Phi`. 
> > This can be easy, just give up if we don't encounter a store into that `Phi`. However, we can do better. Consider this case: > > Point p1 = new Point; > Point p2 = new Point; > p1.x = v1; > p2.x = v2; > Point p = Phi(p1, p2); > int a = p.x; > > Then, `a` should be able to be folded to `Phi(v1, v2)` if `p1` and `p2` are known not to alias. > > Another i... Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: ea of phis and nested objects ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28812/files - new: https://git.openjdk.org/jdk/pull/28812/files/74064ab8..c546d216 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=11-12 Stats: 451 lines in 3 files changed: 331 ins; 27 del; 93 mod Patch: https://git.openjdk.org/jdk/pull/28812.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28812/head:pull/28812 PR: https://git.openjdk.org/jdk/pull/28812 From qamai at openjdk.org Tue Dec 23 18:11:53 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 23 Dec 2025 18:11:53 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v13] In-Reply-To: References: Message-ID: On Tue, 23 Dec 2025 17:46:31 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch is an alternative to #28764 but it does the analysis during IGVN instead. >> >> ## The current PR: >> >> The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. >> >> This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. 
Implementation-wise, when walking at `find_previous_store`, if we encounter a call or memory barrier, we start looking at all nodes that make the allocation escape. If all such nodes have a control input that is not a transitive control input of the call/barrier we are at, then we can decidedly say that the allocation has not escaped at that call/barrier, and walk past that call/barrier to find a corresponding store. >> >> I do not see a noticeable difference in C2 runtime with and without this patch. >> >> ## Future work: >> >> 1. Nested object: >> >> Consider this case: >> >> Holder h = new Holder(); >> Object o = new Object(); >> h.o = o; >> >> Currently, `o` will be considered escaped at `h.o = o`. However, it can be seen that `o` has not actually escaped because `h` has not escaped. Luckily, with the current approach, this can be easily achieved, notice how this loop is just "if anything escapes, consider `base` escapes", currently, the "anything" here includes `base` and its aliases. if we include the base of the object at which `o` is stored, then we can correctly determine if `o` has escaped. >> >> // Find all nodes that may escape alloc, and decide that it is provable that they must be >> // executed after ctl >> EscapeStatus res = NOT_ESCAPED; >> aliases.push(base); >> for (uint idx = 0; idx < aliases.size(); idx++) { >> Node* n = aliases.at(idx); >> >> 2. Fold a memory `Phi`. >> >> This is pretty straightforward. We need to create a value `Phi` for each memory `Phi` so that we can handle loop `Phi`s. >> >> 3. Fold a pointer `Phi`. >> >> This can be easy, just give up if we don't encounter a store into that `Phi`. However, we can do better. Consider this case: >> >> Point p1 = new Point; >> Point p2 = new Point; >> p1.x = v1; >> p2.x = v2; >> Point p = Phi(p1, p2); >> int a = p.x; >> >> Then, `a` sh... 
> > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > ea of phis and nested objects I have added analysis of `Phi`s and nested objects to this PR as it also aligns well with my refactor to make `LocalEA` more encapsulated and the additional complexity is minimal. @robcasloz Currently, the extra work is minimal, I can also remove the allocation of extra data structures, but there are only the default constructions of 2 `Unique_Node_List`. I can solve this by either making `Unique_Node_List` default constructor non-allocating, or making the field a `union` and only constructing the `Unique_Node_List` when `DoLocalEscapeAnalysis` like this: union { char _dummy; Unique_Node_List _list; } if (DoLocalEscapeAnalysis) { ::new(&_list) Unique_Node_List(); } What do you think? Is leaving those default constructions fine, or which is the more preferable solution? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28812#issuecomment-3687520710 From bulasevich at openjdk.org Tue Dec 23 18:24:27 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Tue, 23 Dec 2025 18:24:27 GMT Subject: RFR: 8374307: Fix deoptimization storm caused by Action_none in GraphKit::uncommon_trap Message-ID: We observed a deoptimization storm caused by GraphKit::uncommon_trap generator logic. GraphKit::uncommon_trap considers the too_many_recompiles metric. If the threshold is overflowed, it replaces Deoptimization::Action_reinterpret with Deoptimization::Action_none (see code snippet below). This replacement changes the uncommon_trap logic: once execution hits a trap, the VM performs deoptimization but does not recompile the method anymore. In an "unlucky" case, when the code part calling this uncommon_trap becomes frequent, a deoptimization storm occurs (thousands of deoptimizations per second) causing a significant performance drop. 
The original problematic method, which triggered repeated recompilations, is a high-performance compressed binary serialization algorithm with heavy use of conditional branches driven by bitmasks. See the standalone [synthetic benchmark](https://bugs.openjdk.org/secure/attachment/118045/UnstableIf.java) to reproduce the issue.

The issue arises when the method exceeds the global recompilation threshold before the specific trap counters stabilize.

Current thresholds:
- Recompilation Limit (too_many_recompiles):
  Condition: decompile_count() >= (PerMethodRecompilationCutoff / 2) + 1
  Default: 201 (derived from default PerMethodRecompilationCutoff = 400).
- Specific Trap Limits (too_many_traps):
  Checks if the trap count for a specific reason exceeds:
  PerMethodTrapLimit (Default: 100) - for Reason_unstable_if, Reason_unstable_fused_if, etc.
  PerMethodSpecTrapLimit (Default: 5000) - for Reason_speculate_class_check, Reason_speculate_null_check, etc.

With the given defaults, if the only reason for the method recompilation is unstable_if, the system stabilizes after 100 traps (PerMethodTrapLimit). However, if the method experiences traps and recompilations for different reasons, the total number of recompilations can exceed 200 before hitting the limit for unstable_if traps. This triggers Action_none and causes the deopt storm.

The proposal is a minimal change in GraphKit::uncommon_trap: apply the same `too_many_recompiles` threshold inside `Parse::path_is_suitable_for_uncommon_trap` - this ensures that on the final recompilation C2 gets a hint not to speculate on untaken branches anymore.

As an alternative solution, we can revisit GraphKit::uncommon_trap. This "Temporary fix" has persisted in the codebase for 17 years, so it is probably time to change it as well.
Any comments are welcome

    case Deoptimization::Action_reinterpret:
      // Temporary fix for 6529811 to allow virtual calls to be sure they
      // get the chance to go from mono->bi->mega
      if (!keep_exact_action &&
          Deoptimization::trap_request_index(trap_request) < 0 &&
          too_many_recompiles(reason)) {
        // This BCI is causing too many recompilations.
        if (C->log() != nullptr) {
          C->log()->elem("observe that='trap_action_change' reason='%s' from='%s' to='none'",
                  Deoptimization::trap_reason_name(reason),
                  Deoptimization::trap_action_name(action));
        }
        action = Deoptimization::Action_none;
        trap_request = Deoptimization::make_trap_request(reason, action);
      } else {
        C->set_trap_can_recompile(true);
      }

-------------

Commit messages:
 - 8374307: Fix deoptimization storm caused by Action_none in GraphKit::uncommon_trap

Changes: https://git.openjdk.org/jdk/pull/28966/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28966&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8374307
  Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/28966.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28966/head:pull/28966

PR: https://git.openjdk.org/jdk/pull/28966

From qamai at openjdk.org Tue Dec 23 18:44:58 2025
From: qamai at openjdk.org (Quan Anh Mai)
Date: Tue, 23 Dec 2025 18:44:58 GMT
Subject: RFR: 8374307: Fix deoptimization storm caused by Action_none in GraphKit::uncommon_trap
In-Reply-To:
References:
Message-ID:

On Tue, 23 Dec 2025 18:16:33 GMT, Boris Ulasevich wrote:

> We observed a deoptimization storm caused by GraphKit::uncommon_trap generator logic. GraphKit::uncommon_trap considers the too_many_recompiles metric. If the threshold is overflowed, it replaces Deoptimization::Action_reinterpret with Deoptimization::Action_none (see code snippet below).
>
> This replacement changes the uncommon_trap logic: once execution hits a trap, the VM performs deoptimization but does not recompile the method anymore.
In an "unlucky" case, when the code part calling this uncommon_trap becomes frequent, a deoptimization storm occurs (thousands of deoptimizations per second) causing a significant performance drop. > > The original problematic method, which triggered repeated recompilations, is a high-performance compressed binary serialization algorithm with heavy use of conditional branches driven by bitmasks. See a standalone [synthetic benchmark](https://bugs.openjdk.org/secure/attachment/118045/UnstableIf.java) to reproduce the issue. > > The issue arises when the method overcomes a global recompilation threshold before stabilizing specific trap counters. > > Current thresholds: > - Recompilation Limit (too_many_recompiles): > Condition: decompile_count() >= (PerMethodRecompilationCutoff / 2) + 1 > Default: 201 (derived from default PerMethodRecompilationCutoff = 400). > - Specific Trap Limits (too_many_traps): > Checks if the trap count for a specific reason exceeds: > PerMethodTrapLimit (Default: 100) - for Reason_unstable_if, Reason_unstable_fused_if, etc. > PerMethodSpecTrapLimit (Default: 5000) - for Reason_speculate_class_check, Reason_speculate_null_check, etc. > > With the gived defaults, if the only reason for the method recompilation is unstable_if, the system stabilizes after 100 traps (PerMethodTrapLimit). However, if the method experiences traps and recompilations for different reasons, the total number of recompilations can exceed 200 before hitting the limit for unstable_if traps. This triggers Action_none and causes the deopt storm. > > The proposal is a minimal change in GraphKit::uncommon_trap: apply the same `too_many_recompiles` threshold inside `Parse::path_is_suitable_for_uncommon_trap` - this ensures that on the final recompilation C2 gets a hint not to speculate on untaken branches anymore. > > As an alternative solution, we can revisit GraphKit::uncommon_trap. 
This "Temporary fix" has persisted in the codebase for 17 years, so it is probably time to change it as well. Any comments are welcome > > case Deoptimization::Action_reinter... src/hotspot/share/opto/parse2.cpp line 1621: > 1619: return seems_never_taken(prob) && > 1620: // Skip optimization if recompile limit is exceeded to avoid deopts without recompilation. > 1621: !C->too_many_recompiles(method(), bci(), Deoptimization::Reason_unstable_if) && You can use `Compile::too_many_traps_or_recompile` here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28966#discussion_r2644002566 From xgong at openjdk.org Wed Dec 24 01:49:57 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 24 Dec 2025 01:49:57 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v4] In-Reply-To: References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: On Mon, 22 Dec 2025 16:28:45 GMT, Bhavana Kilambi wrote: >> This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. >> >> Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. The following is how each of these reductions is implemented for different aarch64 targets - >> >> **For AddReduction :** >> On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. >> >> On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order. 
>> >> **For MulReduction :** >> Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported. >> >> Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` - >> >> Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. >> Ratio > 1 indicates the performance with this patch is better than the master branch. >> >> **N1 (UseSVE = 0, max vector length = 16B):** >> >> Benchmark vectorDim Mode Cnt 8B 16B >> ReductionAddFP16 256 thrpt 9 1.41 1.40 >> ReductionAddFP16 512 thrpt 9 1.41 1.41 >> ReductionAddFP16 1024 thrpt 9 1.43 1.40 >> ReductionAddFP16 2048 thrpt 9 1.43 1.40 >> ReductionMulFP16 256 thrpt 9 1.22 1.22 >> ReductionMulFP16 512 thrpt 9 1.21 1.23 >> ReductionMulFP16 1024 thrpt 9 1.21 1.22 >> ReductionMulFP16 2048 thrpt 9 1.20 1.22 >> >> >> On N1, the scalarized sequence of `fadd/fmul` are gener... > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments Tests passed. LGTM! ------------- Marked as reviewed by xgong (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/27526#pullrequestreview-3609692140 From xgong at openjdk.org Wed Dec 24 01:57:53 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 24 Dec 2025 01:57:53 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Tue, 23 Dec 2025 12:16:01 GMT, Jie Fu wrote: > > For example, if the float array is `[1.0f, -1.0f, 2.0f, -2.0f]`, we genuinely expect a result very close to `0.0f`, not something near `2.0f`. > > Using `max_abs` to set the tolerance risks inflating the admissible error range (since `max_abs` here would be 2.0f), which I'm afraid might make the test much less effective. > > FYI: the current sum based tolerance may be also bigger than max_abs based. For example, if the float array is `[1.0f, 1.0f, 2.0f, 2.0f]`. The sum would be `6.0f`, the max_abs would be `2.0`. What do you think? We used the `Math.ulp(sum) * 10` to calculate the tolerance, which means the difference of expected and actual value is inside of 10 ULP at the value of `sum`. Note that `Math.ulp(f)` is the positive distance between `f` and the next representable float value larger in magnitude (1 ulp around value `f`). Hence, it is reasonable based on the reference value. 
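To make the `Math.ulp(sum) * 10` tolerance concrete, here is a minimal sketch of how it scales with the reference value. The class and method names are hypothetical and not taken from the test; only `Math.ulp(float)` is a real JDK call.

```java
// Hypothetical helper for illustration; not the test's actual code.
final class UlpTolerance {
    // Tolerance of 10 ULPs, measured at the reference value.
    static float tolerance(float reference) {
        return Math.ulp(reference) * 10.0f;
    }

    public static void main(String[] args) {
        // The array from the discussion above: sum is 6.0f, max_abs is 2.0f.
        float[] a = {1.0f, 1.0f, 2.0f, 2.0f};
        float sum = 0.0f;
        for (float v : a) {
            sum += v;
        }
        System.out.println("sum               = " + sum);
        System.out.println("tolerance at sum  = " + tolerance(sum));
        // When the reference value is 0.0f, the tolerance collapses to
        // 10 * Float.MIN_VALUE, which is far below Float.MIN_NORMAL.
        System.out.println("tolerance at 0.0f = " + tolerance(0.0f));
        System.out.println("Float.MIN_NORMAL  = " + Float.MIN_NORMAL);
    }
}
```

The point of the sketch is only that the tolerance tracks the magnitude of the reference value, so it is tight for small sums and vanishingly small when the reference is zero.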
-------------

PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3688389893

From xgong at openjdk.org Wed Dec 24 02:09:01 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Wed, 24 Dec 2025 02:09:01 GMT
Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently
In-Reply-To:
References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com>
Message-ID:

On Tue, 23 Dec 2025 09:49:26 GMT, Jie Fu wrote:

> > And what we really care about is how close the reduction result is to the mathematically expected value.
>
> Understood.
>
> However, (1.0, 5.0) the test range is really too small. If we expand the range, the current tolerance may still become not big enough.

What I really want to avoid in this case is generating values that are near the extreme maximum or minimum of float. Expanding the value range would probably still be acceptable with the current tolerance, which is derived **not only** from the cross-lane sum **but also** includes a rounding-error factor of 10.

BTW, considering that there are already enough API tests under `test/jdk/jdk/incubator/vector`, this test deliberately uses a narrower range, because its primary goal is to **verify the expected IR** generated on SVE, rather than to stress all numerical edge cases.
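As a rough illustration of why a narrow, positive input range keeps an unordered reduction inside the 10 ULP window, the following sketch compares a sequential sum with one possible pairwise order for four lanes. Everything here is an assumption for illustration: the class, the four-lane shape, the seed, and the use of `java.util.Random` instead of the test library's generators.

```java
import java.util.Random;

// Hypothetical sketch; names are illustrative and not taken from the test.
final class RangeLimitedReduction {
    // Strictly ordered left-to-right sum, as a scalar loop would compute it.
    static float sequentialSum(float[] a) {
        float sum = 0.0f;
        for (float v : a) {
            sum += v;
        }
        return sum;
    }

    // One possible unordered evaluation for four lanes: a pairwise (tree) sum.
    static float pairwiseSum4(float[] a) {
        return (a[0] + a[1]) + (a[2] + a[3]);
    }

    // Same check shape as discussed in the thread: 10 ULPs at the expected value.
    static boolean withinTolerance(float expected, float actual) {
        return Math.abs(expected - actual) <= Math.ulp(expected) * 10.0f;
    }

    // Checks `trials` random 4-element arrays with values of roughly 1.0 to 5.0.
    static boolean allWithinTolerance(long seed, int trials) {
        Random random = new Random(seed);
        float[] a = new float[4];
        for (int t = 0; t < trials; t++) {
            for (int i = 0; i < a.length; i++) {
                a[i] = 1.0f + 4.0f * random.nextFloat();
            }
            if (!withinTolerance(sequentialSum(a), pairwiseSum4(a))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allWithinTolerance(42L, 100_000));
    }
}
```

With four positive lanes each evaluation order performs only three rounded additions, so the two results can differ by a few ULPs at most. The sketch says nothing about longer vectors or mixed-sign inputs, where cancellation can make the gap much larger.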
------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3688409647 From jiefu at openjdk.org Wed Dec 24 03:09:07 2025 From: jiefu at openjdk.org (Jie Fu) Date: Wed, 24 Dec 2025 03:09:07 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Wed, 24 Dec 2025 01:54:02 GMT, Xiaohong Gong wrote: > We used the `Math.ulp(sum) * 10` to calculate the tolerance, which means the difference of expected and actual value is inside of 10 ULP around the value of `sum`. The question here is what do you mean by `expected value`? Do you mean the sequential scalar floating point add with rounding errors will produce the expected value? Or do you mean the golden value in math? Just consider the following case 1000.0f + Float.MAX_VALUE - Float.MAX_VALUE The sequential scalar floating add will result 0.0f. However, the golden value in math should be 1000.0f. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3688511017 From xgong at openjdk.org Wed Dec 24 05:04:08 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 24 Dec 2025 05:04:08 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Wed, 24 Dec 2025 03:06:35 GMT, Jie Fu wrote: > > We used the `Math.ulp(sum) * 10` to calculate the tolerance, which means the difference of expected and actual value is inside of 10 ULP around the value of `sum`. > > The question here is what do you mean by `expected value`? Do you mean the sequential scalar floating point add with rounding errors will produce the expected value? Or do you mean the golden value in math? 
> > Just consider the following case > > ``` > 1000.0f + Float.MAX_VALUE - Float.MAX_VALUE > ``` > > The sequential scalar floating add will result 0.0f. However, the golden value in math should be 1000.0f. I think we should refer to the java ref spec, that calculates the values in sequential order, right? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3688687630 From jiefu at openjdk.org Wed Dec 24 06:17:01 2025 From: jiefu at openjdk.org (Jie Fu) Date: Wed, 24 Dec 2025 06:17:01 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Wed, 24 Dec 2025 05:01:11 GMT, Xiaohong Gong wrote: > I think we should refer to the java ref spec, that calculates the values in sequential order, right? I'm not sure. Did you find related description in the spec? If that is true, the current sum-based `Math.ulp(0.0f) * 10` is far from `1000.0f`, which seems unreasonable. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3688798787 From erfang at openjdk.org Wed Dec 24 07:15:58 2025 From: erfang at openjdk.org (Eric Fang) Date: Wed, 24 Dec 2025 07:15:58 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v5] In-Reply-To: References: Message-ID: <113AdKQ15cNGhCreLhNcnBjkOMh8riqUR8TnUCDKBPM=.7ec93b7a-6433-461c-8b61-93d360f7d712@github.com> On Fri, 12 Dec 2025 15:13:35 GMT, Emanuel Peter wrote: >> Eric Fang has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains eight commits: >> >> - Merge branch 'master' into JDK-8370863-mask-cast-opt >> - Merge branch 'master' into JDK-8370863-mask-cast-opt >> - Add MaxVectorSize IR test condition for VectorStoreMaskIdentityTest.java >> - Refine the test code and comments >> - Merge branch 'master' into JDK-8370863-mask-cast-opt >> - Don't read and write the same memory in the JMH benchmarks >> - Merge branch 'master' into JDK-8370863-mask-cast-opt >> - 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns >> >> `VectorMaskCastNode` is used to cast a vector mask from one type to >> another type. The cast may be generated by calling the vector API `cast` >> or generated by the compiler. For example, some vector mask operations >> like `trueCount` require the input mask to be integer types, so for >> floating point type masks, the compiler will cast the mask to the >> corresponding integer type mask automatically before doing the mask >> operation. This kind of cast is very common. >> >> If the vector element size is not changed, the `VectorMaskCastNode` >> don't generate code, otherwise code will be generated to extend or narrow >> the mask. This IR node is not free no matter it generates code or not >> because it may block some optimizations. For example: >> 1. `(VectorStoremask (VectorMaskCast (VectorLoadMask x)))` >> The middle `VectorMaskCast` prevented the following optimization: >> `(VectorStoremask (VectorLoadMask x)) => (x)` >> 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which >> blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`. >> >> In these IR patterns, the value of the input `x` is not changed, so we >> can safely do the optimization. But if the input value is changed, we >> can't eliminate the cast. 
>>
>> The general idea of this PR is introducing an `uncast_mask` helper
>> function, which can be used to uncast a chain of `VectorMaskCastNode`,
>> like the existing `Node::uncast(bool)` function. The function returns
>> the first non `VectorMaskCastNode`.
>>
>> The intended use case is when the IR pattern to be optimized may
>> contain one or more consecutive `VectorMaskCastNode` and this does not
>> affect the correctness of the optimization. Then this function can be
>> called to eliminate the `VectorMaskCastNode` ch...
>
> src/hotspot/share/opto/vectornode.cpp line 1492:
>
>> 1490: // vector[n]{bool} => vector[n]{t} => vector[n]{bool}
>> 1491: Node* in1 = VectorNode::uncast_mask(in(1));
>> 1492: if (in1->Opcode() == Op_VectorLoadMask && length() == in1->as_Vector()->length()) {
>
> Can there be a mismatch with the length? Can you give me an example?

Hi @eme64, I'd really appreciate hearing your thoughts on this when you have a moment.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2644986216

From jbhateja at openjdk.org Wed Dec 24 08:14:53 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Wed, 24 Dec 2025 08:14:53 GMT
Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently
In-Reply-To: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com>
References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com>
Message-ID: <4QGWqFTXLV_V8NaIe8QvEfexn_di7GX8sVyFtQjDCx8=.79de235e-8c88-4b38-b893-bf704451a66a@github.com>

On Tue, 23 Dec 2025 06:45:46 GMT, Xiaohong Gong wrote:

> The test fails intermittently with the following error:
>
>
> Caused by: java.lang.RuntimeException: assertEqualsWithTolerance: expected 0.0 but was 1.1754945E-38 (tolerance: 1.4E-44, diff: 1.1754945E-38)
> at
compiler.vectorapi.TestVectorOperationsWithPartialSize.verifyAddReductionFloat(TestVectorOperationsWithPartialSize.java:231) > at compiler.vectorapi.TestVectorOperationsWithPartialSize.testAddReductionFloat(TestVectorOperationsWithPartialSize.java:260) > > > The root cause is that the Vector API `reduceLanes()` does not guarantee a specific calculation order for floating-point reduction operations [1]. When the array contains extreme values, this can produce results outside the tolerance range compared to sequential scalar addition. > > For example, given array elements: > > [0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE] > > > Sequential scalar addition produces: > > 0.0f + Float.MIN_NORMAL + Float.MAX_VALUE - Float.MAX_VALUE = 0.0f > > > However, `reduceLanes()` might compute: > > (0.0f + Float.MIN_NORMAL) + (Float.MAX_VALUE - Float.MAX_VALUE) = Float.MIN_NORMAL > > > The difference of the two times of calculation is `Float.MIN_NORMAL` (1.1754945E-38), which exceeds the tolerance of `Math.ulp(0.0f) * 10.0f = 1.4E-44`. Even with a 10x rounding error factor, the tolerance is insufficient for such edge cases. > > Since `reduceLanes()` does not require a specific calculation order, differences from scalar results can be significantly larger when special or extreme maximum/minimum values are present. Using a fixed tolerance is inappropriate for such corner cases. > > This patch fixes the issue by initializing the float array in test with random normal values within a specified range, ensuring the result gap stays within the defined tolerance. > > Tested locally on my AArch64 and X86_64 machines 500 times, and I didn't observe the failure again. 
>
> [1] https://docs.oracle.com/en/java/javase/25/docs/api/jdk.incubator.vector/jdk/incubator/vector/FloatVector.html#reduceLanes(jdk.incubator.vector.VectorOperators.Associative)

test/hotspot/jtreg/compiler/vectorapi/TestVectorOperationsWithPartialSize.java line 80:

> 78:         random.fill(random.longs(), la);
> 79:         random.fill(random.uniformFloats(1.0f, 5.0f), fa);
> 80:         random.fill(random.uniformDoubles(1.0, 5.0), da);

Ideally our tolerance window should be narrow, and widening the tolerance range to accommodate outliers, as you mentioned in your issue description, may defeat the purpose.

Unlike auto-vectorization, which adheres to the strict-ordering JLS semantics, the Vector API relaxes the reduction order to give backends leeway to use a parallel reduction that does not strictly follow the sequential order.

There are multiple considerations involved: the fallback implementation performs the reduction sequentially, the inline expander always relaxes the strict ordering, and intrinsification of Add/Mul reductions is only supported on AArch64, x86 and RISC-V.

Computing the expected value using a parallel reduction could be another alternative, but then we may face similar problems on targets which do not intrinsify unordered reductions.

Tolerance modeling is a complex topic and involves both relative and absolute error. The current 10 ULP absolute limit is not generic enough to handle the entire spectrum of values; what you have enforced now is a range-based tolerance. Did you try widening the input value range to confirm that the 10 ULP tolerance limit is sufficient?
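The order sensitivity described in the quoted issue text can be reproduced with plain scalar code. The following is a minimal sketch, assuming a four-lane pairwise order as one possible unordered evaluation; the class and method names are hypothetical, and the real order chosen by a backend may differ.

```java
// Hypothetical sketch of two evaluation orders for the same four floats.
final class ReductionOrder {
    // Strictly ordered left-to-right sum (the scalar reference in the test).
    static float sequential(float[] a) {
        float sum = 0.0f;
        for (float v : a) {
            sum += v;
        }
        return sum;
    }

    // One unordered alternative: add neighbouring pairs first, then combine.
    static float pairwise(float[] a) {
        return (a[0] + a[1]) + (a[2] + a[3]);
    }

    public static void main(String[] args) {
        float[] a = {0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE};
        float expected = sequential(a);   // MIN_NORMAL is absorbed by MAX_VALUE
        float actual = pairwise(a);       // MAX_VALUE cancels first, MIN_NORMAL survives
        float tolerance = Math.ulp(expected) * 10.0f;
        System.out.println("sequential = " + expected);
        System.out.println("pairwise   = " + actual);
        System.out.println("diff       = " + Math.abs(expected - actual));
        System.out.println("tolerance  = " + tolerance);
    }
}
```

Both results are valid outcomes of a floating-point add reduction, yet their difference is `Float.MIN_NORMAL`, many orders of magnitude above a tolerance computed as 10 ULPs at `0.0f`.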
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28960#discussion_r2645002228 From jbhateja at openjdk.org Wed Dec 24 09:17:57 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 24 Dec 2025 09:17:57 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v4] In-Reply-To: References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: <4o-mRvCoV4nHqDouamLFsjYVVHhSuAOurJipQmy3xo8=.08cbadfa-ebdf-4c6b-a7f3-efe808f82b92@github.com> On Mon, 22 Dec 2025 16:28:45 GMT, Bhavana Kilambi wrote: >> This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. >> >> Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. The following is how each of these reductions is implemented for different aarch64 targets - >> >> **For AddReduction :** >> On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. >> >> On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order. >> >> **For MulReduction :** >> Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported. 
>> >> Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` - >> >> Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. >> Ratio > 1 indicates the performance with this patch is better than the master branch. >> >> **N1 (UseSVE = 0, max vector length = 16B):** >> >> Benchmark vectorDim Mode Cnt 8B 16B >> ReductionAddFP16 256 thrpt 9 1.41 1.40 >> ReductionAddFP16 512 thrpt 9 1.41 1.41 >> ReductionAddFP16 1024 thrpt 9 1.43 1.40 >> ReductionAddFP16 2048 thrpt 9 1.43 1.40 >> ReductionMulFP16 256 thrpt 9 1.22 1.22 >> ReductionMulFP16 512 thrpt 9 1.21 1.23 >> ReductionMulFP16 1024 thrpt 9 1.21 1.22 >> ReductionMulFP16 2048 thrpt 9 1.20 1.22 >> >> >> On N1, the scalarized sequence of `fadd/fmul` are gener... > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments Common IR changes looks good to me, adding some minor comments. 
test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java line 81: > 79: if(!expected_fp16.equals(actual_fp16)) { > 80: throw new AssertionError("Result Mismatch!, reduction type = " + reductionFunc + " actual = " + actual_fp16 + " expected = " + expected_fp16); > 81: } Please use Verify.checkEQ, Float16 support has been added to [Verifier](https://github.com/openjdk/jdk/pull/28095) recently test/micro/org/openjdk/bench/jdk/incubator/vector/Float16OperationsBenchmark.java line 323: > 321: short result = (short) 0; > 322: for (int i = 0; i < vectorDim; i++) { > 323: result = float16ToRawShortBits(Float16.add(Float16.shortBitsToFloat16(result), Float16.shortBitsToFloat16(vector1[i]))); Float16 class is statically imported, we don't need fully qualified names here. Suggestion: result = float16ToRawShortBits(Float16.add(shortBitsToFloat16(result), shortBitsToFloat16(vector1[i]))); test/micro/org/openjdk/bench/jdk/incubator/vector/Float16OperationsBenchmark.java line 332: > 330: short result = floatToFloat16(1.0f); > 331: for (int i = 0; i < vectorDim; i++) { > 332: result = float16ToRawShortBits(Float16.multiply(Float16.shortBitsToFloat16(result), Float16.shortBitsToFloat16(vector1[i]))); Suggestion: result = float16ToRawShortBits(Float16.multiply(shortBitsToFloat16(result), shortBitsToFloat16(vector1[i]))); ------------- PR Review: https://git.openjdk.org/jdk/pull/27526#pullrequestreview-3610393442 PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2645201339 PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2645214816 PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2645216202 From xgong at openjdk.org Wed Dec 24 09:23:59 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 24 Dec 2025 09:23:59 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently In-Reply-To: 
<4QGWqFTXLV_V8NaIe8QvEfexn_di7GX8sVyFtQjDCx8=.79de235e-8c88-4b38-b893-bf704451a66a@github.com> References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> <4QGWqFTXLV_V8NaIe8QvEfexn_di7GX8sVyFtQjDCx8=.79de235e-8c88-4b38-b893-bf704451a66a@github.com> Message-ID: <_PbD2tRBjM1l1QO3px71zhIP9w6mJOkHgCt5gHnjm8E=.07e928e4-3f55-4656-8fd5-4cace3149b85@github.com> On Wed, 24 Dec 2025 07:22:50 GMT, Jatin Bhateja wrote: >> The test fails intermittently with the following error: >> >> >> Caused by: java.lang.RuntimeException: assertEqualsWithTolerance: expected 0.0 but was 1.1754945E-38 (tolerance: 1.4E-44, diff: 1.1754945E-38) >> at compiler.vectorapi.TestVectorOperationsWithPartialSize.verifyAddReductionFloat(TestVectorOperationsWithPartialSize.java:231) >> at compiler.vectorapi.TestVectorOperationsWithPartialSize.testAddReductionFloat(TestVectorOperationsWithPartialSize.java:260) >> >> >> The root cause is that the Vector API `reduceLanes()` does not guarantee a specific calculation order for floating-point reduction operations [1]. When the array contains extreme values, this can produce results outside the tolerance range compared to sequential scalar addition. >> >> For example, given array elements: >> >> [0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE] >> >> >> Sequential scalar addition produces: >> >> 0.0f + Float.MIN_NORMAL + Float.MAX_VALUE - Float.MAX_VALUE = 0.0f >> >> >> However, `reduceLanes()` might compute: >> >> (0.0f + Float.MIN_NORMAL) + (Float.MAX_VALUE - Float.MAX_VALUE) = Float.MIN_NORMAL >> >> >> The difference of the two times of calculation is `Float.MIN_NORMAL` (1.1754945E-38), which exceeds the tolerance of `Math.ulp(0.0f) * 10.0f = 1.4E-44`. Even with a 10x rounding error factor, the tolerance is insufficient for such edge cases. 
>> >> Since `reduceLanes()` does not require a specific calculation order, differences from scalar results can be significantly larger when special or extreme maximum/minimum values are present. Using a fixed tolerance is inappropriate for such corner cases. >> >> This patch fixes the issue by initializing the float array in test with random normal values within a specified range, ensuring the result gap stays within the defined tolerance. >> >> Tested locally on my AArch64 and X86_64 machines 500 times, and I didn't observe the failure again. >> >> [1] https://docs.oracle.com/en/java/javase/25/docs/api/jdk.incubator.vector/jdk/incubator/vector/FloatVector.html#reduceLanes(jdk.incubator.vector.VectorOperators.Associative) > > test/hotspot/jtreg/compiler/vectorapi/TestVectorOperationsWithPartialSize.java line 80: > >> 78: random.fill(random.longs(), la); >> 79: random.fill(random.uniformFloats(1.0f, 5.0f), fa); >> 80: random.fill(random.uniformDoubles(1.0, 5.0), da); > > Ideally our tolerance window should be narrow, and increasing the tolerance range to accommodate outliers, as you mentioned in your issue description, may defeat the purpose. > > Unlike auto-vectorization, which adheres to the strict ordering semantics of the JLS, the Vector API relaxes the reduction order to give backends leeway to use a parallel reduction that does not strictly follow the sequential order. > > There are multiple considerations involved: the fallback implementation performs the reduction sequentially, the inline expander always relaxes the strict ordering, and intrinsification of Add/Mul reductions is only supported on AArch64, X86 and RISC-V. > > Computing the expected value using a parallel reduction could be another alternative, but then we may face similar problems on targets which do not intrinsify unordered reductions.
> > Tolerance modeling is a complex topic and involves both relative and absolute error. The current 10 ULP absolute limit is not generic enough to handle the entire spectrum of values. What you have enforced now is a range-based tolerance; did you try widening the input value range to confirm whether the 10 ULP tolerance limit is sufficient? Yeah, I'm trying to extend the value range to `1~3000`. The tests are still running... Since the result largely depends on the random values, I run this test `500` times on SVE/NEON/X86 machines respectively (**1500** times totally), and have not observed failure now. Is that fine to you? I will update the test once all tests pass. Thanks for looking at this change! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28960#discussion_r2645230189 From xgong at openjdk.org Wed Dec 24 09:32:02 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 24 Dec 2025 09:32:02 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Wed, 24 Dec 2025 06:13:53 GMT, Jie Fu wrote: > > I think we should refer to the java ref spec, that calculates the values in sequential order, right? > > I'm not sure. Did you find related description in the spec? > > If that is true, the current sum-based `Math.ulp(0.0f) * 10` is far from `1000.0f`, which seems unreasonable. I used the Java Playground, and see the result: image So that's just the issue that I want to avoid. We didn't have a more reasonable golden value as we do not know the calculation order in Vector API, right?
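The order dependence at the heart of this thread can be reproduced with plain scalar code. The following is a minimal sketch, assuming a pairwise grouping as one order that `reduceLanes()` might use; the class `ReductionOrderDemo` and its methods are hypothetical names for illustration, and the grouping is not taken from any actual backend.

```java
// Hypothetical demo class; names are illustrative and not from the JDK test.
public class ReductionOrderDemo {
    // Strict left-to-right float addition, as a scalar loop would perform it.
    static float sequentialSum(float[] a) {
        float sum = 0.0f;
        for (float v : a) {
            sum += v;
        }
        return sum;
    }

    // One possible unordered grouping: add adjacent pairs first, then combine.
    // Assumes a.length == 4; a real backend may group lanes differently.
    static float pairwiseSum(float[] a) {
        return (a[0] + a[1]) + (a[2] + a[3]);
    }

    public static void main(String[] args) {
        // The input array from the PR description.
        float[] a = {0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE};

        float expected = sequentialSum(a); // MIN_NORMAL is absorbed by MAX_VALUE
        float actual = pairwiseSum(a);     // MIN_NORMAL survives

        // Sum-based tolerance with the 10x factor discussed in the thread.
        float tolerance = Math.ulp(expected) * 10.0f;

        System.out.println("sequential = " + expected);
        System.out.println("pairwise   = " + actual);
        System.out.println("tolerance  = " + tolerance);
        System.out.println("out of tolerance: "
                + (Math.abs(expected - actual) > tolerance));
    }
}
```

Both results are valid under a relaxed reduction order, which is why neither can serve as the single golden value for a tight tolerance check.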
------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3689239304 From jiefu at openjdk.org Wed Dec 24 09:49:50 2025 From: jiefu at openjdk.org (Jie Fu) Date: Wed, 24 Dec 2025 09:49:50 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Wed, 24 Dec 2025 09:29:17 GMT, Xiaohong Gong wrote: > So that's just the issue that I want to avoid. We didn't have a more reasonable golden value as we do not know the calculation order in Vector API, right? Agreed. I would suggest the testing range also covers negative floats, not only positives. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3689274977 From jiefu at openjdk.org Wed Dec 24 14:07:58 2025 From: jiefu at openjdk.org (Jie Fu) Date: Wed, 24 Dec 2025 14:07:58 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently In-Reply-To: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Tue, 23 Dec 2025 06:45:46 GMT, Xiaohong Gong wrote: > The test fails intermittently with the following error: > > > Caused by: java.lang.RuntimeException: assertEqualsWithTolerance: expected 0.0 but was 1.1754945E-38 (tolerance: 1.4E-44, diff: 1.1754945E-38) > at compiler.vectorapi.TestVectorOperationsWithPartialSize.verifyAddReductionFloat(TestVectorOperationsWithPartialSize.java:231) > at compiler.vectorapi.TestVectorOperationsWithPartialSize.testAddReductionFloat(TestVectorOperationsWithPartialSize.java:260) > > > The root cause is that the Vector API `reduceLanes()` does not guarantee a specific calculation order for floating-point reduction operations [1]. 
When the array contains extreme values, this can produce results outside the tolerance range compared to sequential scalar addition. > > For example, given array elements: > > [0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE] > > > Sequential scalar addition produces: > > 0.0f + Float.MIN_NORMAL + Float.MAX_VALUE - Float.MAX_VALUE = 0.0f > > > However, `reduceLanes()` might compute: > > (0.0f + Float.MIN_NORMAL) + (Float.MAX_VALUE - Float.MAX_VALUE) = Float.MIN_NORMAL > > > The difference of the two times of calculation is `Float.MIN_NORMAL` (1.1754945E-38), which exceeds the tolerance of `Math.ulp(0.0f) * 10.0f = 1.4E-44`. Even with a 10x rounding error factor, the tolerance is insufficient for such edge cases. > > Since `reduceLanes()` does not require a specific calculation order, differences from scalar results can be significantly larger when special or extreme maximum/minimum values are present. Using a fixed tolerance is inappropriate for such corner cases. > > This patch fixes the issue by initializing the float array in test with random normal values within a specified range, ensuring the result gap stays within the defined tolerance. > > Tested locally on my AArch64 and X86_64 machines 500 times, and I didn't observe the failure again. > > [1] https://docs.oracle.com/en/java/javase/25/docs/api/jdk.incubator.vector/jdk/incubator/vector/FloatVector.html#reduceLanes(jdk.incubator.vector.VectorOperators.Associative) Here is an example which shows that the tolerance may be still not big enough even with `ROUNDING_ERROR_FACTOR_ADD = 10000000.0f`. Note: the test range of `a` and `b` is only `[0.0f, 1.0f]`. 
class T {
    public static void main(String[] args) {

        float ROUNDING_ERROR_FACTOR_ADD = 10000000.0f;

        Float a = 0.0f + (ROUNDING_ERROR_FACTOR_ADD + 1) * Math.ulp(0.0f);
        Float b = 1.0f;

        Float expected = a + b - b;
        Float actual = a + (b - b);

        float tolerance = Math.ulp(expected) * ROUNDING_ERROR_FACTOR_ADD;
        if (Math.abs(expected - actual) > tolerance) {
            System.out.println("Error: Out of tolerance!");
        }
    }
}

So I'm afraid the sum-based tolerance should be improved.

------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3689857047 From jiefu at openjdk.org Wed Dec 24 14:40:57 2025 From: jiefu at openjdk.org (Jie Fu) Date: Wed, 24 Dec 2025 14:40:57 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently In-Reply-To: <_PbD2tRBjM1l1QO3px71zhIP9w6mJOkHgCt5gHnjm8E=.07e928e4-3f55-4656-8fd5-4cace3149b85@github.com> References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> <4QGWqFTXLV_V8NaIe8QvEfexn_di7GX8sVyFtQjDCx8=.79de235e-8c88-4b38-b893-bf704451a66a@github.com> <_PbD2tRBjM1l1QO3px71zhIP9w6mJOkHgCt5gHnjm8E=.07e928e4-3f55-4656-8fd5-4cace3149b85@github.com> Message-ID: On Wed, 24 Dec 2025 09:19:55 GMT, Xiaohong Gong wrote:

> Yeah, I'm trying to extend the value range to `1~3000`. The tests are still running... Since the result largely depends on the random values, I run this test `500` times on SVE/NEON/X86 machines respectively (**1500** times totally), and have not observed failure now. Is that fine to you? I will update the test once all tests pass. Thanks for looking at this change!

As with range `1~3000`, we may still see failures even with 1000ULP according to the following program, right?
class T {
    public static void main(String[] args) {

        float ROUNDING_ERROR_FACTOR_ADD = 1000.0f;

        Float a = 1.0f + (ROUNDING_ERROR_FACTOR_ADD + 1) * Math.ulp(1.0f);
        Float b = 3000.0f;

        Float expected = a + b - b;
        Float actual = a + (b - b);

        float tolerance = Math.ulp(expected) * ROUNDING_ERROR_FACTOR_ADD;
        if (Math.abs(expected - actual) > tolerance) {
            System.out.println("Error: Out of tolerance!");
        }
    }
}

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28960#discussion_r2645836521 From jiefu at openjdk.org Wed Dec 24 14:56:50 2025 From: jiefu at openjdk.org (Jie Fu) Date: Wed, 24 Dec 2025 14:56:50 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> <4QGWqFTXLV_V8NaIe8QvEfexn_di7GX8sVyFtQjDCx8=.79de235e-8c88-4b38-b893-bf704451a66a@github.com> <_PbD2tRBjM1l1QO3px71zhIP9w6mJOkHgCt5gHnjm8E=.07e928e4-3f55-4656-8fd5-4cace3149b85@github.com> Message-ID: On Wed, 24 Dec 2025 14:38:37 GMT, Jie Fu wrote: >> Yeah, I'm trying to extend the value range to `1~3000`. The tests are still running... Since the result largely depends on the random values, I run this test `500` times on SVE/NEON/X86 machines respectively (**1500** times totally), and have not observed failure now. Is that fine to you? I will update the test once all tests pass. Thanks for looking at this change! > >> Yeah, I'm trying to extend the value range to `1~3000`. The tests are still running... Since the result largely depends on the random values, I run this test `500` times on SVE/NEON/X86 machines respectively (**1500** times totally), and have not observed failure now. Is that fine to you? I will update the test once all tests pass. Thanks for looking at this change! > > As with range `1~3000`, we may still see failures even with 1000ULP according to the following program, right?
> > > class T { > public static void main(String[] args) { > > float ROUNDING_ERROR_FACTOR_ADD = 1000.0f; > > Float a = 1.0f + (ROUNDING_ERROR_FACTOR_ADD + 1) * Math.ulp(1.0f); > Float b = 3000.0f; > > Float expected = a + b - b; > Float actual = a + (b - b); > > float tolerance = Math.ulp(expected) * ROUNDING_ERROR_FACTOR_ADD; > if (Math.abs(expected - actual) > tolerance) { > System.out.println("Error: Out of tolerance!"); > } > } > } > > Yeah, I'm trying to extend the value range to `1~3000`. The tests are still running... Since the result largely depends on the random values, I run this test `500` times on SVE/NEON/X86 machines respectively (**1500** times totally), and have not observed failure now. Is that fine to you? I will update the test once all tests pass. Thanks for looking at this change! > > As with range `1~3000`, we may still see failures even with 1000ULP according to the following program, right? > > ```java > class T { > public static void main(String[] args) { > > float ROUNDING_ERROR_FACTOR_ADD = 1000.0f; > > Float a = 1.0f + (ROUNDING_ERROR_FACTOR_ADD + 1) * Math.ulp(1.0f); > Float b = 3000.0f; > > Float expected = a + b - b; > Float actual = a + (b - b); > > float tolerance = Math.ulp(expected) * ROUNDING_ERROR_FACTOR_ADD; > if (Math.abs(expected - actual) > tolerance) { > System.out.println("Error: Out of tolerance!"); > } > } > } > ``` Oops, if the range is `1~3000`, there is no negative float, so the above program should not happen. Just ignore it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28960#discussion_r2645879485 From jiefu at openjdk.org Wed Dec 24 15:04:50 2025 From: jiefu at openjdk.org (Jie Fu) Date: Wed, 24 Dec 2025 15:04:50 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: <6pFQnZ8P3l0IFFsIfHY6W65FNfFElGTvlsF1gzOcN24=.87925608-59d9-46fe-bb2d-106a8b04e842@github.com> On Wed, 24 Dec 2025 14:05:52 GMT, Jie Fu wrote: > Note: the test range of `a` and `b` is only `[0.0f, 1.0f]`. The range is `[-1.0f, 1.0f]` actually. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3690038602 From xgong at openjdk.org Thu Dec 25 01:20:57 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 25 Dec 2025 01:20:57 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> <4QGWqFTXLV_V8NaIe8QvEfexn_di7GX8sVyFtQjDCx8=.79de235e-8c88-4b38-b893-bf704451a66a@github.com> <_PbD2tRBjM1l1QO3px71zhIP9w6mJOkHgCt5gHnjm8E=.07e928e4-3f55-4656-8fd5-4cace3149b85@github.com> Message-ID: <37BaoBn8v2bBU5eVVUjaftLWMob9n8fz5btV2y4qYHo=.780e72d6-cc42-4f34-b172-ffca4e3b6dab@github.com> On Wed, 24 Dec 2025 14:54:33 GMT, Jie Fu wrote: >>> Yeah, I'm trying to extend the value range to `1~3000`. The tests are still running... Since the result largely depends on the random values, I run this test `500` times on SVE/NEON/X86 machines respectively (**1500** times totally), and have not observed failure now. Is that fine to you? I will update the test once all tests pass. Thanks for looking at this change! >> >> As with range `1~3000`, we may still see failures even with 1000ULP according to the following program, right? 
>> >> >> class T { >> public static void main(String[] args) { >> >> float ROUNDING_ERROR_FACTOR_ADD = 1000.0f; >> >> Float a = 1.0f + (ROUNDING_ERROR_FACTOR_ADD + 1) * Math.ulp(1.0f); >> Float b = 3000.0f; >> >> Float expected = a + b - b; >> Float actual = a + (b - b); >> >> float tolerance = Math.ulp(expected) * ROUNDING_ERROR_FACTOR_ADD; >> if (Math.abs(expected - actual) > tolerance) { >> System.out.println("Error: Out of tolerance!"); >> } >> } >> } > >> > Yeah, I'm trying to extend the value range to `1~3000`. The tests are still running... Since the result largely depends on the random values, I run this test `500` times on SVE/NEON/X86 machines respectively (**1500** times totally), and have not observed failure now. Is that fine to you? I will update the test once all tests pass. Thanks for looking at this change! >> >> As with range `1~3000`, we may still see failures even with 1000ULP according to the following program, right? >> >> ```java >> class T { >> public static void main(String[] args) { >> >> float ROUNDING_ERROR_FACTOR_ADD = 1000.0f; >> >> Float a = 1.0f + (ROUNDING_ERROR_FACTOR_ADD + 1) * Math.ulp(1.0f); >> Float b = 3000.0f; >> >> Float expected = a + b - b; >> Float actual = a + (b - b); >> >> float tolerance = Math.ulp(expected) * ROUNDING_ERROR_FACTOR_ADD; >> if (Math.abs(expected - actual) > tolerance) { >> System.out.println("Error: Out of tolerance!"); >> } >> } >> } >> ``` > > Oops, if the range is `1~3000`, there is no negative float, so the above program should not happen. > Just ignore it. Yes, that's why I didn't use negative values in the tests. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28960#discussion_r2646402566 From xgong at openjdk.org Thu Dec 25 01:37:59 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 25 Dec 2025 01:37:59 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Wed, 24 Dec 2025 09:46:29 GMT, Jie Fu wrote: > > So that's just the issue that I want to avoid. We didn't have a more reasonable golden value as we do not know the calculation order in Vector API, right? > > Agreed. > > I would suggest the testing range also covers negative floats, not only positives. Extending the range to include both negative and positive floats would produce discrepancies like the ones you observed, and would force us to adopt a much larger tolerance, which is undesirable. As mentioned above, it is not realistic to use a single tolerance that works well for all possible input values in the Vector API. Since reduceLanes is already thoroughly tested under jdk/jdk/incubator/vector with a wide variety of float inputs, this particular test is intended to focus on verifying the generated IR rather than exhaustively validating numerical behavior. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3690699242 From jiefu at openjdk.org Thu Dec 25 02:05:52 2025 From: jiefu at openjdk.org (Jie Fu) Date: Thu, 25 Dec 2025 02:05:52 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Thu, 25 Dec 2025 01:35:16 GMT, Xiaohong Gong wrote: > > > So that's just the issue that I want to avoid. 
We didn't have a more reasonable golden value as we do not know the calculation order in Vector API, right? > > > > > > Agreed. > > I would suggest the testing range also covers negative floats, not only positives. > > Extending the range to include both negative and positive floats would produce discrepancies like the ones you observed, and would force us to adopt a much larger tolerance, which is undesirable. As mentioned above, it is not realistic to use a single tolerance that works well for all possible input values in the Vector API. Since reduceLanes is already thoroughly tested under jdk/jdk/incubator/vector with a wide variety of float inputs, this particular test is intended to focus on verifying the generated IR rather than exhaustively validating numerical behavior. What I want to show here is that the current sum-based tolerance seems unreasonable. This is because the rounding errors are related to all the inputs, not the final result. Also, if the test range are all positive floats, `Math.ulp(sum)` is always larger than `Math.ulp(max-abs-of-inputs)`. 
So even for positive floats, I would suggest a max-abs-based tolerance like this float baseUlp = Math.ulp(maxAbs_of_the_inputs); // which is smaller than Math.ulp(sum) float tolerance = baseUlp * ROUNDING_ERROR_FACTOR_ADD; tolerance = Math.max(tolerance, Float.MIN_NORMAL); ------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3690733113 From xgong at openjdk.org Thu Dec 25 07:06:39 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 25 Dec 2025 07:06:39 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently [v2] In-Reply-To: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: > The test fails intermittently with the following error: > > > Caused by: java.lang.RuntimeException: assertEqualsWithTolerance: expected 0.0 but was 1.1754945E-38 (tolerance: 1.4E-44, diff: 1.1754945E-38) > at compiler.vectorapi.TestVectorOperationsWithPartialSize.verifyAddReductionFloat(TestVectorOperationsWithPartialSize.java:231) > at compiler.vectorapi.TestVectorOperationsWithPartialSize.testAddReductionFloat(TestVectorOperationsWithPartialSize.java:260) > > > The root cause is that the Vector API `reduceLanes()` does not guarantee a specific calculation order for floating-point reduction operations [1]. When the array contains extreme values, this can produce results outside the tolerance range compared to sequential scalar addition. 
> > For example, given array elements: > > [0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE] > > > Sequential scalar addition produces: > > 0.0f + Float.MIN_NORMAL + Float.MAX_VALUE - Float.MAX_VALUE = 0.0f > > > However, `reduceLanes()` might compute: > > (0.0f + Float.MIN_NORMAL) + (Float.MAX_VALUE - Float.MAX_VALUE) = Float.MIN_NORMAL > > > The difference of the two times of calculation is `Float.MIN_NORMAL` (1.1754945E-38), which exceeds the tolerance of `Math.ulp(0.0f) * 10.0f = 1.4E-44`. Even with a 10x rounding error factor, the tolerance is insufficient for such edge cases. > > Since `reduceLanes()` does not require a specific calculation order, differences from scalar results can be significantly larger when special or extreme maximum/minimum values are present. Using a fixed tolerance is inappropriate for such corner cases. > > This patch fixes the issue by initializing the float array in test with random normal values within a specified range, ensuring the result gap stays within the defined tolerance. > > Tested locally on my AArch64 and X86_64 machines 500 times, and I didn't observe the failure again. 
> > [1] https://docs.oracle.com/en/java/javase/25/docs/api/jdk.incubator.vector/jdk/incubator/vector/FloatVector.html#reduceLanes(jdk.incubator.vector.VectorOperators.Associative) Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: Extend the float/double value range ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28960/files - new: https://git.openjdk.org/jdk/pull/28960/files/a44f551c..433efddc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28960&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28960&range=00-01 Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28960.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28960/head:pull/28960 PR: https://git.openjdk.org/jdk/pull/28960 From xgong at openjdk.org Thu Dec 25 07:23:02 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 25 Dec 2025 07:23:02 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Thu, 25 Dec 2025 02:03:36 GMT, Jie Fu wrote: > > > > So that's just the issue that I want to avoid. We didn't have a more reasonable golden value as we do not know the calculation order in Vector API, right? > > > > > > > > > Agreed. > > > I would suggest the testing range also covers negative floats, not only positives. > > > > > > Extending the range to include both negative and positive floats would produce discrepancies like the ones you observed, and would force us to adopt a much larger tolerance, which is undesirable. As mentioned above, it is not realistic to use a single tolerance that works well for all possible input values in the Vector API. 
Since reduceLanes is already thoroughly tested under jdk/jdk/incubator/vector with a wide variety of float inputs, this particular test is intended to focus on verifying the generated IR rather than exhaustively validating numerical behavior. > > What I want to show here is that the current sum-based tolerance seems unreasonable. This is because the rounding errors are related to all the inputs, not the final result. > > Also, if the test range are all positive floats, `Math.ulp(sum)` is always larger than `Math.ulp(max-abs-of-inputs)`. So even for positive floats, I would suggest a max-abs-based tolerance like this > > ``` > float baseUlp = Math.ulp(maxAbs_of_the_inputs); // which is smaller than Math.ulp(sum) > float tolerance = baseUlp * ROUNDING_ERROR_FACTOR_ADD; > tolerance = Math.max(tolerance, Float.MIN_NORMAL); > ``` Thanks for the suggestion. I understand your point, and it sounds reasonable to me. In this test, the tolerance calculation is based on the existing Vector API jtreg reduction tests. For example, here is the approach used in `Float128VectorTests`: https://github.com/openjdk/jdk/blob/73a8629c5b52b678febcc9d339e01ebcc5277909/test/jdk/jdk/incubator/vector/Float128VectorTests.java#L143-L156 Where `rc` and `r[i]` are the expected values calculated by scalar addition result and `relativeErrorFactor` for the add reduction is `10.0`. Since the tests under `jdk/incubator/vector` have been exercised for a long time, I think they provide a high level of confidence in the way results are verified. Given that choosing an appropriate tolerance for floating-point computations is a non-trivial problem, I would rather follow the established approach used in those tests than introduce additional complexity in this fix.
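For comparison, the two tolerance models discussed here can be put side by side. The following is a minimal sketch, assuming a rounding factor of 10 and the reassociation example posted earlier in the thread with mixed-sign inputs; `ToleranceDemo` and its method names are hypothetical and do not come from the test or from `Float128VectorTests`.

```java
// Hypothetical helper class; the names below are illustrative only.
public class ToleranceDemo {
    // Sum-based tolerance: scales with the ULP of the expected result.
    static float sumBasedTolerance(float expected, float factor) {
        return Math.ulp(expected) * factor;
    }

    // Max-abs-based tolerance as suggested in the thread: scales with the ULP
    // of the largest input magnitude, with Float.MIN_NORMAL as a floor.
    static float maxAbsBasedTolerance(float[] inputs, float factor) {
        float maxAbs = 0.0f;
        for (float v : inputs) {
            maxAbs = Math.max(maxAbs, Math.abs(v));
        }
        return Math.max(Math.ulp(maxAbs) * factor, Float.MIN_NORMAL);
    }

    public static void main(String[] args) {
        float factor = 10.0f;
        float a = (factor + 1) * Math.ulp(0.0f); // just above the sum-based limit
        float b = 1.0f;
        float[] inputs = {a, b, -b};

        float expected = a + b - b;   // sequential order: a is absorbed by b
        float actual = a + (b - b);   // reassociated order: a survives
        float diff = Math.abs(expected - actual);

        System.out.println("exceeds sum-based tolerance:    "
                + (diff > sumBasedTolerance(expected, factor)));
        System.out.println("within max-abs-based tolerance: "
                + (diff <= maxAbsBasedTolerance(inputs, factor)));
    }
}
```

The sketch only covers this one three-element case; it does not show that a max-abs-based bound is sufficient for longer reductions or for other operations.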
------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3691061476 From jiefu at openjdk.org Thu Dec 25 08:12:53 2025 From: jiefu at openjdk.org (Jie Fu) Date: Thu, 25 Dec 2025 08:12:53 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Thu, 25 Dec 2025 07:19:47 GMT, Xiaohong Gong wrote: > > Thanks for the suggestion. I understand your point, and it sounds reasonable to me. In this test, the tolerance calculation is based on the existing Vector API jtreg reduction tests. For example, here is the approach used in `Float128VectorTests`: > > https://github.com/openjdk/jdk/blob/73a8629c5b52b678febcc9d339e01ebcc5277909/test/jdk/jdk/incubator/vector/Float128VectorTests.java#L143-L156 May I ask would the example you mentioned above fail for negative floats? Note: I've provided a reproducer which would 100% fail for your current implementation with negative floats, which seems unacceptable to me. Also I'm still not sure if there are corner cases with the 1~3000 range since the logic has been proved wrong with negative floats. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3691114227 From erfang at openjdk.org Thu Dec 25 09:17:54 2025 From: erfang at openjdk.org (Eric Fang) Date: Thu, 25 Dec 2025 09:17:54 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently [v2] In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: <3YcSTCwnJ2lCPCYd38GDd7miAqrRal3iAaaO502VZT4=.737f9a7b-1332-4892-beeb-6e3d49f77c3e@github.com> On Thu, 25 Dec 2025 07:06:39 GMT, Xiaohong Gong wrote: >> The test fails intermittently with the following error: >> >> >> Caused by: java.lang.RuntimeException: assertEqualsWithTolerance: expected 0.0 but was 1.1754945E-38 (tolerance: 1.4E-44, diff: 1.1754945E-38) >> at compiler.vectorapi.TestVectorOperationsWithPartialSize.verifyAddReductionFloat(TestVectorOperationsWithPartialSize.java:231) >> at compiler.vectorapi.TestVectorOperationsWithPartialSize.testAddReductionFloat(TestVectorOperationsWithPartialSize.java:260) >> >> >> The root cause is that the Vector API `reduceLanes()` does not guarantee a specific calculation order for floating-point reduction operations [1]. When the array contains extreme values, this can produce results outside the tolerance range compared to sequential scalar addition. >> >> For example, given array elements: >> >> [0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE] >> >> >> Sequential scalar addition produces: >> >> 0.0f + Float.MIN_NORMAL + Float.MAX_VALUE - Float.MAX_VALUE = 0.0f >> >> >> However, `reduceLanes()` might compute: >> >> (0.0f + Float.MIN_NORMAL) + (Float.MAX_VALUE - Float.MAX_VALUE) = Float.MIN_NORMAL >> >> >> The difference of the two times of calculation is `Float.MIN_NORMAL` (1.1754945E-38), which exceeds the tolerance of `Math.ulp(0.0f) * 10.0f = 1.4E-44`. Even with a 10x rounding error factor, the tolerance is insufficient for such edge cases. 
>> >> Since `reduceLanes()` does not require a specific calculation order, differences from scalar results can be significantly larger when special or extreme maximum/minimum values are present. Using a fixed tolerance is inappropriate for such corner cases. >> >> This patch fixes the issue by initializing the float array in test with random normal values within a specified range, ensuring the result gap stays within the defined tolerance. >> >> Tested locally on my AArch64 and X86_64 machines 500 times, and I didn't observe the failure again. >> >> [1] https://docs.oracle.com/en/java/javase/25/docs/api/jdk.incubator.vector/jdk/incubator/vector/FloatVector.html#reduceLanes(jdk.incubator.vector.VectorOperators.Associative) > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Extend the float/double value range Considering the primary purpose of this test is to check if the generated IR meets expectations, I personally think setting a specific input value range is reasonable. `ULP` typically increases with `|x|`, and given that this is an `add` operation, using `sum` to calculate `ULP` also seems reasonable to me. The tolerance calculation could perhaps be handled separately.
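The remark that `ULP` grows with `|x|` can be checked directly with `Math.ulp`. This is a small sketch, assuming the 10x factor from the thread and a handful of sample magnitudes chosen for illustration; `UlpScaleDemo` and `toleranceAt` are hypothetical names.

```java
// Hypothetical demo class; not part of the JDK test under discussion.
public class UlpScaleDemo {
    // ULP of the given value times a rounding factor, in the style of the
    // sum-based tolerance discussed in the thread.
    static float toleranceAt(float value, float factor) {
        return Math.ulp(value) * factor;
    }

    public static void main(String[] args) {
        // Sample magnitudes: zero, the ends of the 1~3000 range, and the maximum.
        float[] magnitudes = {0.0f, 1.0f, 3000.0f, Float.MAX_VALUE};
        for (float x : magnitudes) {
            System.out.println("x = " + x
                    + ", ulp = " + Math.ulp(x)
                    + ", 10-ULP tolerance = " + toleranceAt(x, 10.0f));
        }
    }
}
```

For a result near zero the tolerance collapses to a few multiples of `Float.MIN_VALUE`, which is the situation the original failure ran into.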
------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3691201790 From jiefu at openjdk.org Thu Dec 25 09:41:52 2025 From: jiefu at openjdk.org (Jie Fu) Date: Thu, 25 Dec 2025 09:41:52 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently [v2] In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Thu, 25 Dec 2025 07:06:39 GMT, Xiaohong Gong wrote: >> The test fails intermittently with the following error: >> >> >> Caused by: java.lang.RuntimeException: assertEqualsWithTolerance: expected 0.0 but was 1.1754945E-38 (tolerance: 1.4E-44, diff: 1.1754945E-38) >> at compiler.vectorapi.TestVectorOperationsWithPartialSize.verifyAddReductionFloat(TestVectorOperationsWithPartialSize.java:231) >> at compiler.vectorapi.TestVectorOperationsWithPartialSize.testAddReductionFloat(TestVectorOperationsWithPartialSize.java:260) >> >> >> The root cause is that the Vector API `reduceLanes()` does not guarantee a specific calculation order for floating-point reduction operations [1]. When the array contains extreme values, this can produce results outside the tolerance range compared to sequential scalar addition. >> >> For example, given array elements: >> >> [0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE] >> >> >> Sequential scalar addition produces: >> >> 0.0f + Float.MIN_NORMAL + Float.MAX_VALUE - Float.MAX_VALUE = 0.0f >> >> >> However, `reduceLanes()` might compute: >> >> (0.0f + Float.MIN_NORMAL) + (Float.MAX_VALUE - Float.MAX_VALUE) = Float.MIN_NORMAL >> >> >> The difference of the two times of calculation is `Float.MIN_NORMAL` (1.1754945E-38), which exceeds the tolerance of `Math.ulp(0.0f) * 10.0f = 1.4E-44`. Even with a 10x rounding error factor, the tolerance is insufficient for such edge cases. 
>> >> Since `reduceLanes()` does not require a specific calculation order, differences from scalar results can be significantly larger when special or extreme maximum/minimum values are present. Using a fixed tolerance is inappropriate for such corner cases. >> >> This patch fixes the issue by initializing the float array in test with random normal values within a specified range, ensuring the result gap stays within the defined tolerance. >> >> Tested locally on my AArch64 and X86_64 machines 500 times, and I didn't observe the failure again. >> >> [1] https://docs.oracle.com/en/java/javase/25/docs/api/jdk.incubator.vector/jdk/incubator/vector/FloatVector.html#reduceLanes(jdk.incubator.vector.VectorOperators.Associative) > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Extend the float/double value range The testing logic has been proved wrong with negative floats. This is because the rounding errors are mainly related to the inputs, not the final result. Not sure whether the test would always pass without fixing that error logic. ------------- Changes requested by jiefu (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28960#pullrequestreview-3612075682 From erfang at openjdk.org Thu Dec 25 10:40:04 2025 From: erfang at openjdk.org (Eric Fang) Date: Thu, 25 Dec 2025 10:40:04 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently [v2] In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Thu, 25 Dec 2025 09:39:08 GMT, Jie Fu wrote: > The testing logic has been proved wrong with negative floats. In my view, the implementations of the `add reduce` operation, as well as the test logic, are sound. Because the vector API does not require a strict order of computation to this API. 
The main discussion point seems to be how we define a good tolerance computation model for all cases, which is not the primary focus of this particular test ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3691293591 From erfang at openjdk.org Thu Dec 25 10:49:52 2025 From: erfang at openjdk.org (Eric Fang) Date: Thu, 25 Dec 2025 10:49:52 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently [v2] In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Thu, 25 Dec 2025 07:06:39 GMT, Xiaohong Gong wrote: >> The test fails intermittently with the following error: >> >> >> Caused by: java.lang.RuntimeException: assertEqualsWithTolerance: expected 0.0 but was 1.1754945E-38 (tolerance: 1.4E-44, diff: 1.1754945E-38) >> at compiler.vectorapi.TestVectorOperationsWithPartialSize.verifyAddReductionFloat(TestVectorOperationsWithPartialSize.java:231) >> at compiler.vectorapi.TestVectorOperationsWithPartialSize.testAddReductionFloat(TestVectorOperationsWithPartialSize.java:260) >> >> >> The root cause is that the Vector API `reduceLanes()` does not guarantee a specific calculation order for floating-point reduction operations [1]. When the array contains extreme values, this can produce results outside the tolerance range compared to sequential scalar addition. >> >> For example, given array elements: >> >> [0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE] >> >> >> Sequential scalar addition produces: >> >> 0.0f + Float.MIN_NORMAL + Float.MAX_VALUE - Float.MAX_VALUE = 0.0f >> >> >> However, `reduceLanes()` might compute: >> >> (0.0f + Float.MIN_NORMAL) + (Float.MAX_VALUE - Float.MAX_VALUE) = Float.MIN_NORMAL >> >> >> The difference of the two times of calculation is `Float.MIN_NORMAL` (1.1754945E-38), which exceeds the tolerance of `Math.ulp(0.0f) * 10.0f = 1.4E-44`. 
Even with a 10x rounding error factor, the tolerance is insufficient for such edge cases. >> >> Since `reduceLanes()` does not require a specific calculation order, differences from scalar results can be significantly larger when special or extreme maximum/minimum values are present. Using a fixed tolerance is inappropriate for such corner cases. >> >> This patch fixes the issue by initializing the float array in test with random normal values within a specified range, ensuring the result gap stays within the defined tolerance. >> >> Tested locally on my AArch64 and X86_64 machines 500 times, and I didn't observe the failure again. >> >> [1] https://docs.oracle.com/en/java/javase/25/docs/api/jdk.incubator.vector/jdk/incubator/vector/FloatVector.html#reduceLanes(jdk.incubator.vector.VectorOperators.Associative) > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Extend the float/double value range Of course, defining a more reasonable tolerance calculation method might be a better fix, and we should also add more corner cases to`JDK/JDK/Incubator/Vector`. But this doesn't seem like an easy task; you need to prove that the new tolerance calculation method is reasonable for all cases. So I think this fix for this specific test is acceptable. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3691301859 From jiefu at openjdk.org Thu Dec 25 12:36:00 2025 From: jiefu at openjdk.org (Jie Fu) Date: Thu, 25 Dec 2025 12:36:00 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently [v2] In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Thu, 25 Dec 2025 10:46:33 GMT, Eric Fang wrote: > But this doesn't seem like an easy task; you need to prove that the new tolerance calculation method is reasonable for all cases. 
Sorry, I can't prove for all cases. But I can prove the current testing logic is wrong. Why not fix it? By the way, even with the limited 1~3000 range, can you prove it's always fine without the fix? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3691399144 From jiefu at openjdk.org Thu Dec 25 13:44:02 2025 From: jiefu at openjdk.org (Jie Fu) Date: Thu, 25 Dec 2025 13:44:02 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently [v2] In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Thu, 25 Dec 2025 07:06:39 GMT, Xiaohong Gong wrote: >> The test fails intermittently with the following error: >> >> >> Caused by: java.lang.RuntimeException: assertEqualsWithTolerance: expected 0.0 but was 1.1754945E-38 (tolerance: 1.4E-44, diff: 1.1754945E-38) >> at compiler.vectorapi.TestVectorOperationsWithPartialSize.verifyAddReductionFloat(TestVectorOperationsWithPartialSize.java:231) >> at compiler.vectorapi.TestVectorOperationsWithPartialSize.testAddReductionFloat(TestVectorOperationsWithPartialSize.java:260) >> >> >> The root cause is that the Vector API `reduceLanes()` does not guarantee a specific calculation order for floating-point reduction operations [1]. When the array contains extreme values, this can produce results outside the tolerance range compared to sequential scalar addition. 
>> >> For example, given array elements: >> >> [0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE] >> >> >> Sequential scalar addition produces: >> >> 0.0f + Float.MIN_NORMAL + Float.MAX_VALUE - Float.MAX_VALUE = 0.0f >> >> >> However, `reduceLanes()` might compute: >> >> (0.0f + Float.MIN_NORMAL) + (Float.MAX_VALUE - Float.MAX_VALUE) = Float.MIN_NORMAL >> >> >> The difference of the two times of calculation is `Float.MIN_NORMAL` (1.1754945E-38), which exceeds the tolerance of `Math.ulp(0.0f) * 10.0f = 1.4E-44`. Even with a 10x rounding error factor, the tolerance is insufficient for such edge cases. >> >> Since `reduceLanes()` does not require a specific calculation order, differences from scalar results can be significantly larger when special or extreme maximum/minimum values are present. Using a fixed tolerance is inappropriate for such corner cases. >> >> This patch fixes the issue by initializing the float array in test with random normal values within a specified range, ensuring the result gap stays within the defined tolerance. >> >> Tested locally on my AArch64 and X86_64 machines 500 times, and I didn't observe the failure again. >> >> [1] https://docs.oracle.com/en/java/javase/25/docs/api/jdk.incubator.vector/jdk/incubator/vector/FloatVector.html#reduceLanes(jdk.incubator.vector.VectorOperators.Associative) > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Extend the float/double value range Just clarify: I don't mean to fix the test for all cases, nor do I require the prove of the correctness. At least, we should fix the obvious bug we've found. The test can be proved to be wrong with the range `-3000 ~ 3000`. How do you know it wouldn't fail in the range `1 ~ 3000` without fixing the obvious wrong? 
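To make the point concrete, here is a minimal sketch of how the order dependence shows up. It is not taken from the test: the class name, the helper names, and the in-range input values are illustrative assumptions. It compares a left-to-right sum with one other legal association of the same inputs and checks the gap against a result-based tolerance of `Math.ulp(expected) * 10.0f`, as quoted above. The first array is the extreme-value example from the PR description; the second stays inside `-3000 ~ 3000`.

```java
public class ReductionOrderDemo {
    // Rounding factor as quoted in the PR description (10x).
    static final float ROUNDING_ERROR_FACTOR_ADD = 10.0f;

    // Left-to-right scalar sum, i.e. the reference value.
    public static float sequentialSum(float[] a) {
        float sum = 0.0f;
        for (float v : a) {
            sum += v;
        }
        return sum;
    }

    // One other association: sum each half, then add the two halves.
    // A reduction with unspecified order may use this or any other order.
    public static float pairwiseSum(float[] a, int from, int to) {
        if (to - from == 1) {
            return a[from];
        }
        int mid = (from + to) >>> 1;
        return pairwiseSum(a, from, mid) + pairwiseSum(a, mid, to);
    }

    // Tolerance derived from the expected result only.
    public static boolean withinResultBasedTolerance(float expected, float actual) {
        float tolerance = Math.ulp(expected) * ROUNDING_ERROR_FACTOR_ADD;
        return Math.abs(expected - actual) <= tolerance;
    }

    public static void main(String[] args) {
        float[] extreme = {0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE};
        // Hypothetical in-range inputs: the small value is absorbed by 3000.0f
        // in the left-to-right sum but survives in the pairwise sum.
        float[] inRange = {1.0e-4f, 3000.0f, -3000.0f};
        for (float[] a : new float[][] {extreme, inRange}) {
            float expected = sequentialSum(a);
            float actual = pairwiseSum(a, 0, a.length);
            System.out.println("expected=" + expected + " actual=" + actual
                    + " withinTolerance=" + withinResultBasedTolerance(expected, actual));
        }
    }
}
```

In both arrays the left-to-right sum is `0.0f` while the pairwise sum keeps the small addend, so a tolerance scaled by the ulp of the result cannot cover the gap once negative values allow cancellation.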
------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3691454053 From erfang at openjdk.org Thu Dec 25 14:37:50 2025 From: erfang at openjdk.org (Eric Fang) Date: Thu, 25 Dec 2025 14:37:50 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently [v2] In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Thu, 25 Dec 2025 07:06:39 GMT, Xiaohong Gong wrote: >> The test fails intermittently with the following error: >> >> >> Caused by: java.lang.RuntimeException: assertEqualsWithTolerance: expected 0.0 but was 1.1754945E-38 (tolerance: 1.4E-44, diff: 1.1754945E-38) >> at compiler.vectorapi.TestVectorOperationsWithPartialSize.verifyAddReductionFloat(TestVectorOperationsWithPartialSize.java:231) >> at compiler.vectorapi.TestVectorOperationsWithPartialSize.testAddReductionFloat(TestVectorOperationsWithPartialSize.java:260) >> >> >> The root cause is that the Vector API `reduceLanes()` does not guarantee a specific calculation order for floating-point reduction operations [1]. When the array contains extreme values, this can produce results outside the tolerance range compared to sequential scalar addition. >> >> For example, given array elements: >> >> [0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE] >> >> >> Sequential scalar addition produces: >> >> 0.0f + Float.MIN_NORMAL + Float.MAX_VALUE - Float.MAX_VALUE = 0.0f >> >> >> However, `reduceLanes()` might compute: >> >> (0.0f + Float.MIN_NORMAL) + (Float.MAX_VALUE - Float.MAX_VALUE) = Float.MIN_NORMAL >> >> >> The difference of the two times of calculation is `Float.MIN_NORMAL` (1.1754945E-38), which exceeds the tolerance of `Math.ulp(0.0f) * 10.0f = 1.4E-44`. Even with a 10x rounding error factor, the tolerance is insufficient for such edge cases. 
>> >> Since `reduceLanes()` does not require a specific calculation order, differences from scalar results can be significantly larger when special or extreme maximum/minimum values are present. Using a fixed tolerance is inappropriate for such corner cases. >> >> This patch fixes the issue by initializing the float array in test with random normal values within a specified range, ensuring the result gap stays within the defined tolerance. >> >> Tested locally on my AArch64 and X86_64 machines 500 times, and I didn't observe the failure again. >> >> [1] https://docs.oracle.com/en/java/javase/25/docs/api/jdk.incubator.vector/jdk/incubator/vector/FloatVector.html#reduceLanes(jdk.incubator.vector.VectorOperators.Associative) > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Extend the float/double value range > Here is an example which shows that the tolerance may be still not big enough even with `ROUNDING_ERROR_FACTOR_ADD = 10000000.0f`. > > Note: the test range of `a` and `b` is only `[0.0f, 1.0f]`. > > ```java > class T { > public static void main(String[] args) { > > float ROUNDING_ERROR_FACTOR_ADD = 10000000.0f; > > Float a = 0.0f + (ROUNDING_ERROR_FACTOR_ADD + 1) * Math.ulp(0.0f); > Float b = 1.0f; > > Float expected = a + b - b; > Float actual = a + (b - b); > > float tolerance = Math.ulp(expected) * ROUNDING_ERROR_FACTOR_ADD; > if (Math.abs(expected - actual) > tolerance) { > System.out.println("Error: Out of tolerance!"); > } > } > } > ``` > > So I'm afraid the sum-based tolerance should be improved. Vector reduction doesn't provide a `sub` operation, so your example may not be the most appropriate one here. For subtraction, even a very narrow range of input values can lead to large differences. Do you perhaps have another example that might better illustrate the potential issues with the current test logic?
------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3691496516 From mbaesken at openjdk.org Thu Dec 25 15:28:00 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 25 Dec 2025 15:28:00 GMT Subject: RFR: 8374195: TestReplaceNarrowPhiWithBottomPhi fails on ppc64 platforms in (fast)debug In-Reply-To: References: Message-ID: <0qx5bTdrwb-0yu0Qk2mg3eBtDsd2d0_WW950qjsY9Vg=.ae430320-8e8b-4d70-9f3a-80836e3919df@github.com> On Tue, 23 Dec 2025 16:53:30 GMT, Martin Doerr wrote: > This test makes assumptions about the C2 ideal graph which are not true for PPC64. We need to get tests green also in jdk26 where the new test has landed in the meantime, so simply disabling it for the platform. > Test improvements can be done later if needed. Marked as reviewed by mbaesken (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28964#pullrequestreview-3612433743 From qamai at openjdk.org Thu Dec 25 16:13:52 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 25 Dec 2025 16:13:52 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently [v2] In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Thu, 25 Dec 2025 07:06:39 GMT, Xiaohong Gong wrote: >> The test fails intermittently with the following error: >> >> >> Caused by: java.lang.RuntimeException: assertEqualsWithTolerance: expected 0.0 but was 1.1754945E-38 (tolerance: 1.4E-44, diff: 1.1754945E-38) >> at compiler.vectorapi.TestVectorOperationsWithPartialSize.verifyAddReductionFloat(TestVectorOperationsWithPartialSize.java:231) >> at compiler.vectorapi.TestVectorOperationsWithPartialSize.testAddReductionFloat(TestVectorOperationsWithPartialSize.java:260) >> >> >> The root cause is that the Vector API `reduceLanes()` does not guarantee a specific calculation order for floating-point reduction operations [1]. 
When the array contains extreme values, this can produce results outside the tolerance range compared to sequential scalar addition. >> >> For example, given array elements: >> >> [0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE] >> >> >> Sequential scalar addition produces: >> >> 0.0f + Float.MIN_NORMAL + Float.MAX_VALUE - Float.MAX_VALUE = 0.0f >> >> >> However, `reduceLanes()` might compute: >> >> (0.0f + Float.MIN_NORMAL) + (Float.MAX_VALUE - Float.MAX_VALUE) = Float.MIN_NORMAL >> >> >> The difference of the two times of calculation is `Float.MIN_NORMAL` (1.1754945E-38), which exceeds the tolerance of `Math.ulp(0.0f) * 10.0f = 1.4E-44`. Even with a 10x rounding error factor, the tolerance is insufficient for such edge cases. >> >> Since `reduceLanes()` does not require a specific calculation order, differences from scalar results can be significantly larger when special or extreme maximum/minimum values are present. Using a fixed tolerance is inappropriate for such corner cases. >> >> This patch fixes the issue by initializing the float array in test with random normal values within a specified range, ensuring the result gap stays within the defined tolerance. >> >> Tested locally on my AArch64 and X86_64 machines 500 times, and I didn't observe the failure again. >> >> [1] https://docs.oracle.com/en/java/javase/25/docs/api/jdk.incubator.vector/jdk/incubator/vector/FloatVector.html#reduceLanes(jdk.incubator.vector.VectorOperators.Associative) > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Extend the float/double value range Personally, I would prefer either a provably correct sum verification, or no verification at all since it was mentioned that this test's main purpose is to check the existence of IR nodes. 
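One possible shape for a provable check, offered only as a sketch, is to bound the difference by the inputs rather than by the result. A standard worst-case bound for summing n floats in any order (round-to-nearest, no intermediate overflow, finite inputs) is |computed - exact| <= gamma(n-1) * sum(|x_i|), with gamma(k) = k*u / (1 - k*u) and u = 2^-24, so two different orders can differ by at most twice that. The class and method names below are hypothetical and are not part of the test or of the JDK test library; the sample values follow the `a + b + (-b)` example discussed in this thread.

```java
public class InputBasedTolerance {
    // Unit roundoff for float under round-to-nearest: 2^-24.
    static final double U = 0x1.0p-24;

    // Upper bound on |sumInOrderA - sumInOrderB| for any two summation
    // orders of the same n finite floats, assuming no intermediate overflow
    // and n well below 2^24.
    public static double orderTolerance(float[] a) {
        int n = a.length;
        if (n < 2) {
            return 0.0; // zero or one element: every order gives the same value
        }
        double absSum = 0.0;
        for (float v : a) {
            absSum += Math.abs((double) v);
        }
        double k = n - 1;
        double gamma = k * U / (1.0 - k * U);
        // Small slack covers rounding in this double computation itself.
        return 2.0 * gamma * absSum * (1.0 + 1.0e-9);
    }

    public static boolean closeEnough(float expected, float actual, float[] inputs) {
        return Math.abs((double) expected - (double) actual) <= orderTolerance(inputs);
    }

    public static void main(String[] args) {
        float tiny = 10000001.0f * Math.ulp(0.0f);
        float[] a = {tiny, 1.0f, -1.0f};
        float expected = (a[0] + a[1]) + a[2];   // left-to-right
        float actual = a[0] + (a[1] + a[2]);     // another legal order
        float resultBased = Math.ulp(expected) * 10000000.0f;
        System.out.println("result-based check passes: "
                + (Math.abs(expected - actual) <= resultBased));
        System.out.println("input-based check passes:  "
                + closeEnough(expected, actual, a));
    }
}
```

The trade-off is that an input-based bound is much looser than a few ulps of the result, so it would only catch gross errors; whether that is worth keeping in a test whose main purpose is IR verification is a separate question.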
------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3691571675 From jiefu at openjdk.org Thu Dec 25 23:25:51 2025 From: jiefu at openjdk.org (Jie Fu) Date: Thu, 25 Dec 2025 23:25:51 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently [v2] In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Thu, 25 Dec 2025 14:34:27 GMT, Eric Fang wrote: > Vector reduction doesn't provide a `sub` operation, so your example may not be the most appropriate one here. For subtraction, even a very narrow range of input values can lead to large differences. Do you perhaps have another example that might better illustrate the potential issues with the current test logic? `a + b - b` can be regarded as `a + b + (-b)`. So that's why I say it would fail with negative floats. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3691814322 From jiefu at openjdk.org Thu Dec 25 23:29:57 2025 From: jiefu at openjdk.org (Jie Fu) Date: Thu, 25 Dec 2025 23:29:57 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently [v2] In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Thu, 25 Dec 2025 16:10:39 GMT, Quan Anh Mai wrote: > no verification at all since it was mentioned that this test's main purpose is to check the existence of IR nodes. I'm fine with this suggestion.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3691815769 From serb at openjdk.org Fri Dec 26 00:52:31 2025 From: serb at openjdk.org (Sergey Bylokhov) Date: Fri, 26 Dec 2025 00:52:31 GMT Subject: RFR: 8374363: Update copyright year to 2025 for test/micro in files where it was missed Message-ID: The copyright year in "test/micro" files updated in 2025 has been bumped to 2025. **Note:** I have skipped all files updated by the https://github.com/openjdk/jdk/commit/beb43e2633900bb9ab3c975376fe5860b6d054e0 The next command can be run (on top of this PR) to verify that each file had prior commits in 2025: `git diff HEAD~1 --name-only | while read f; do git log HEAD~1 --since="2025-01-01" --oneline -- "$f" | head -1 | grep -q . || echo "NOT IN 2025: $f"; done` ------------- Commit messages: - 8374363: Update copyright year to 2025 for test/micro in files where it was missed Changes: https://git.openjdk.org/jdk/pull/28995/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28995&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374363 Stats: 15 lines in 15 files changed: 0 ins; 0 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/28995.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28995/head:pull/28995 PR: https://git.openjdk.org/jdk/pull/28995 From wenanjian at openjdk.org Fri Dec 26 02:42:30 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Fri, 26 Dec 2025 02:42:30 GMT Subject: RFR: 8374351: RISC-V: Small refactoring for crypto macro-assembler routines Message-ID: This patch is mainly for readability and subsequent GCM call requirements. 1. Extract the ghash function to facilitate subsequent calls during the implementation of aes-gcm 2. Unify the prefixes of function names for aes intrinsic-related functions. 
Only use generate prefix for the main intrinsic function, delete the other functions `generate_` prefix ------------- Commit messages: - RISC-V: extraction and reconstruction of the ghash loop function Changes: https://git.openjdk.org/jdk/pull/28988/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28988&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374351 Stats: 73 lines in 1 file changed: 28 ins; 18 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/28988.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28988/head:pull/28988 PR: https://git.openjdk.org/jdk/pull/28988 From fyang at openjdk.org Fri Dec 26 02:46:57 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 26 Dec 2025 02:46:57 GMT Subject: RFR: 8374351: RISC-V: Small refactoring for crypto macro-assembler routines In-Reply-To: References: Message-ID: On Thu, 25 Dec 2025 06:11:51 GMT, Anjian Wen wrote: > This patch is mainly for readability and subsequent GCM call requirements. > > 1. Extract the ghash function to facilitate subsequent calls during the implementation of aes-gcm > 2. Unify the prefixes of function names for aes intrinsic-related functions. Only use generate prefix for the main intrinsic function, delete the other functions `generate_` prefix Seems fine. One minor comment. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 3059: > 3057: VectorRegister vtmp3 = v3; > 3058: > 3059: ghash_loop(subkeyH, state, data, blocks, vtmp1, vtmp2, vtmp3); `state` is the first param for `generate_ghash_processBlocks`. Can we simply swap the first two params to keep that order? 
------------- PR Review: https://git.openjdk.org/jdk/pull/28988#pullrequestreview-3612652571 PR Review Comment: https://git.openjdk.org/jdk/pull/28988#discussion_r2647421422 From wenanjian at openjdk.org Fri Dec 26 03:06:37 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Fri, 26 Dec 2025 03:06:37 GMT Subject: RFR: 8374351: RISC-V: Small refactoring for crypto macro-assembler routines [v2] In-Reply-To: References: Message-ID: > This patch is mainly for readability and subsequent GCM call requirements. > > 1. Extract the ghash function to facilitate subsequent calls during the implementation of aes-gcm > 2. Unify the prefixes of function names for aes intrinsic-related functions. Only use generate prefix for the main intrinsic function, delete the other functions `generate_` prefix Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: Swap the order of the function name to make it clear ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28988/files - new: https://git.openjdk.org/jdk/pull/28988/files/84b3fde7..f98ee4a1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28988&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28988&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28988.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28988/head:pull/28988 PR: https://git.openjdk.org/jdk/pull/28988 From fyang at openjdk.org Fri Dec 26 03:06:37 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 26 Dec 2025 03:06:37 GMT Subject: RFR: 8374351: RISC-V: Small refactoring for crypto macro-assembler routines [v2] In-Reply-To: References: Message-ID: On Fri, 26 Dec 2025 03:02:20 GMT, Anjian Wen wrote: >> This patch is mainly for readability and subsequent GCM call requirements. >> >> 1. Extract the ghash function to facilitate subsequent calls during the implementation of aes-gcm >> 2. 
Unify the prefixes of function names for aes intrinsic-related functions. Only use generate prefix for the main intrinsic function, delete the other functions `generate_` prefix > > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > Swap the order of the function name to make it clear Thanks! ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28988#pullrequestreview-3612666341 From wenanjian at openjdk.org Fri Dec 26 03:06:37 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Fri, 26 Dec 2025 03:06:37 GMT Subject: RFR: 8374351: RISC-V: Small refactoring for crypto macro-assembler routines [v2] In-Reply-To: References: Message-ID: <5PsqT4pHcBxk6g5iXCjKht9mKKueFx6xi4hkhIqikKY=.2e641d2f-3321-48dd-b611-963d1a404c57@github.com> On Fri, 26 Dec 2025 02:43:35 GMT, Fei Yang wrote: >> Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: >> >> Swap the order of the function name to make it clear > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 3059: > >> 3057: VectorRegister vtmp3 = v3; >> 3058: >> 3059: ghash_loop(subkeyH, state, data, blocks, vtmp1, vtmp2, vtmp3); > > `state` is the first param for `generate_ghash_processBlocks`. Can we simply swap the first two params to keep that order? 
sure, Thanks for the advice ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28988#discussion_r2647433037 From jbhateja at openjdk.org Fri Dec 26 06:40:03 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 26 Dec 2025 06:40:03 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently [v2] In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Thu, 25 Dec 2025 07:06:39 GMT, Xiaohong Gong wrote: >> The test fails intermittently with the following error: >> >> >> Caused by: java.lang.RuntimeException: assertEqualsWithTolerance: expected 0.0 but was 1.1754945E-38 (tolerance: 1.4E-44, diff: 1.1754945E-38) >> at compiler.vectorapi.TestVectorOperationsWithPartialSize.verifyAddReductionFloat(TestVectorOperationsWithPartialSize.java:231) >> at compiler.vectorapi.TestVectorOperationsWithPartialSize.testAddReductionFloat(TestVectorOperationsWithPartialSize.java:260) >> >> >> The root cause is that the Vector API `reduceLanes()` does not guarantee a specific calculation order for floating-point reduction operations [1]. When the array contains extreme values, this can produce results outside the tolerance range compared to sequential scalar addition. >> >> For example, given array elements: >> >> [0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE] >> >> >> Sequential scalar addition produces: >> >> 0.0f + Float.MIN_NORMAL + Float.MAX_VALUE - Float.MAX_VALUE = 0.0f >> >> >> However, `reduceLanes()` might compute: >> >> (0.0f + Float.MIN_NORMAL) + (Float.MAX_VALUE - Float.MAX_VALUE) = Float.MIN_NORMAL >> >> >> The difference of the two times of calculation is `Float.MIN_NORMAL` (1.1754945E-38), which exceeds the tolerance of `Math.ulp(0.0f) * 10.0f = 1.4E-44`. Even with a 10x rounding error factor, the tolerance is insufficient for such edge cases. 
>> >> Since `reduceLanes()` does not require a specific calculation order, differences from scalar results can be significantly larger when special or extreme maximum/minimum values are present. Using a fixed tolerance is inappropriate for such corner cases. >> >> This patch fixes the issue by initializing the float array in test with random normal values within a specified range, ensuring the result gap stays within the defined tolerance. >> >> Tested locally on my AArch64 and X86_64 machines 500 times, and I didn't observe the failure again. >> >> [1] https://docs.oracle.com/en/java/javase/25/docs/api/jdk.incubator.vector/jdk/incubator/vector/FloatVector.html#reduceLanes(jdk.incubator.vector.VectorOperators.Associative) > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Extend the float/double value range I agree with @merykitty, @erifan and @DamonFool: to make this test more robust, it's better to remove the functional validation part and keep only the IR framework verification. ------------- PR Review: https://git.openjdk.org/jdk/pull/28960#pullrequestreview-3612871573 From fjiang at openjdk.org Fri Dec 26 10:01:52 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Fri, 26 Dec 2025 10:01:52 GMT Subject: RFR: 8374351: RISC-V: Small refactoring for crypto macro-assembler routines [v2] In-Reply-To: References: Message-ID: On Fri, 26 Dec 2025 03:06:37 GMT, Anjian Wen wrote: >> This patch is mainly for readability and subsequent GCM call requirements. >> >> 1. Extract the ghash function to facilitate subsequent calls during the implementation of aes-gcm >> 2. Unify the prefixes of function names for aes intrinsic-related functions.
Only use the `generate_` prefix for the main intrinsic function and drop it from the other functions > > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > Swap the order of the function name to make it clear Looks good! ------------- Marked as reviewed by fjiang (Committer). PR Review: https://git.openjdk.org/jdk/pull/28988#pullrequestreview-3613192168 From epeter at openjdk.org Fri Dec 26 11:13:11 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 26 Dec 2025 11:13:11 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v6] In-Reply-To: References: Message-ID: On Mon, 15 Dec 2025 03:01:39 GMT, Eric Fang wrote: >> `VectorMaskCastNode` is used to cast a vector mask from one type to another type. The cast may be generated by calling the vector API `cast` or generated by the compiler. For example, some vector mask operations like `trueCount` require the input mask to be integer types, so for floating point type masks, the compiler will cast the mask to the corresponding integer type mask automatically before doing the mask operation. This kind of cast is very common. >> >> If the vector element size is not changed, the `VectorMaskCastNode` doesn't generate code; otherwise code will be generated to extend or narrow the mask. This IR node is not free whether or not it generates code, because it may block some optimizations. For example: >> 1. `(VectorStoremask (VectorMaskCast (VectorLoadMask x)))` The middle `VectorMaskCast` prevents the following optimization: `(VectorStoremask (VectorLoadMask x)) => (x)` >> 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`. >> >> In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. But if the input value is changed, we can't eliminate the cast.
>> >> The general idea of this PR is to introduce an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`, like the existing `Node::uncast(bool)` function. The function returns the first node that is not a `VectorMaskCastNode`. >> >> The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode` and this does not affect the correctness of the optimization. Then this function can be called to eliminate the `VectorMaskCastNode` chain. >> >> Current optimizations related to `VectorMaskCastNode` include: >> 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760. >> 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242. >> >> This PR does the following optimizations: >> 1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` as `(VectorMaskCast (VectorMaskCast? ... (VectorMaskCast x))) => (x)`. This is correct as long as the types of the head and tail `VectorMaskCastNode` are consistent. >> 2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`. Since the value before and after the pattern is a boolean vect...
> > Eric Fang has updated the pull request incrementally with one additional commit since the last revision: > > Refine code comments I'll review this again in early January, once I'm back from Christmas/New Year break ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28313#issuecomment-3692708993 From epeter at openjdk.org Fri Dec 26 11:13:13 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 26 Dec 2025 11:13:13 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v5] In-Reply-To: <113AdKQ15cNGhCreLhNcnBjkOMh8riqUR8TnUCDKBPM=.7ec93b7a-6433-461c-8b61-93d360f7d712@github.com> References: <113AdKQ15cNGhCreLhNcnBjkOMh8riqUR8TnUCDKBPM=.7ec93b7a-6433-461c-8b61-93d360f7d712@github.com> Message-ID: On Wed, 24 Dec 2025 07:12:59 GMT, Eric Fang wrote: >> src/hotspot/share/opto/vectornode.cpp line 1492: >> >>> 1490: // vector[n]{bool} => vector[n]{t} => vector[n]{bool} >>> 1491: Node* in1 = VectorNode::uncast_mask(in(1)); >>> 1492: if (in1->Opcode() == Op_VectorLoadMask && length() == in1->as_Vector()->length()) { >> >> Can there be a mismatch with the length? Can you give me an example? > > Hi @eme64 , I'd really appreciate hearing your thoughts on this when you have a moment. Honestly, I'd just make it an assert, if your code currently does not expect it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2648043826 From epeter at openjdk.org Fri Dec 26 11:18:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 26 Dec 2025 11:18:58 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v5] In-Reply-To: References: <113AdKQ15cNGhCreLhNcnBjkOMh8riqUR8TnUCDKBPM=.7ec93b7a-6433-461c-8b61-93d360f7d712@github.com> Message-ID: On Fri, 26 Dec 2025 11:09:30 GMT, Emanuel Peter wrote: >> Hi @eme64 , I'd really appreciate hearing your thoughts on this when you have a moment.
> > Honestly, I'd just make it an assert, if your code currently does not expect it. That will force us to look at the example that would eventually violate the assert, and evaluate the situation. Maybe it would reveal a bug (and we could catch and fix it). Maybe it is a new pattern that means we have to rethink things more widely. Or maybe we'd just convert it back to a condition, but at least at that point we'd have an example that takes the other branch. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2648050276 From jbhateja at openjdk.org Fri Dec 26 12:40:37 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 26 Dec 2025 12:40:37 GMT Subject: RFR: 8373724: Assertion failure in TestSignumVector.java with UseAPX Message-ID: Existing demotable instruction patterns for negI/L_rReg_ndd have 'src' as their second operand, which leads to a failure during register biasing. Changing the NDD demotion flag names to encode the explicit operand position, i.e. **Flag_ndd_demotable_opr1 and Flag_ndd_demotable_opr2**, splitting the commutative flag into separate new flags, and fine-tuning the assertion checks based on the new naming convention fixes the issue. The failing test test/hotspot/jtreg/compiler/vectorization/TestSignumVector.java passes with the patch. Kindly review and share your feedback. Best Regards, Jatin PS: Validation performed using Intel SDE 9.58.
------------- Commit messages: - 8373724: Assertion failure in TestSignumVector.java with UseAPX Changes: https://git.openjdk.org/jdk/pull/28999/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28999&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8373724 Stats: 74 lines in 1 file changed: 0 ins; 0 del; 74 mod Patch: https://git.openjdk.org/jdk/pull/28999.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28999/head:pull/28999 PR: https://git.openjdk.org/jdk/pull/28999 From jbhateja at openjdk.org Fri Dec 26 12:56:30 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 26 Dec 2025 12:56:30 GMT Subject: RFR: 8373724: Assertion failure in TestSignumVector.java with UseAPX [v2] In-Reply-To: References: Message-ID: > Existing demotable instruction patterns for negI/L_rReg_ndd have 'src' as their second operand, which leads to a failure during register biasing. Changing the NDD demotion flag names to encode the explicit operand position, i.e. **Flag_ndd_demotable_opr1 and Flag_ndd_demotable_opr2**, splitting the commutative flag into separate new flags, and fine-tuning the assertion checks based on the new naming convention fixes the issue. > > The failing test test/hotspot/jtreg/compiler/vectorization/TestSignumVector.java passes with the patch. > > Kindly review and share your feedback. > > Best Regards, > Jatin > PS: Validation performed using Intel SDE 9.58. Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
The pull request contains one new commit since the last revision: 8373724: Assertion failure in TestSignumVector.java with UseAPX ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28999/files - new: https://git.openjdk.org/jdk/pull/28999/files/b75d280f..bc86d54d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28999&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28999&range=00-01 Stats: 3 lines in 1 file changed: 2 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28999.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28999/head:pull/28999 PR: https://git.openjdk.org/jdk/pull/28999 From phh at openjdk.org Fri Dec 26 22:21:00 2025 From: phh at openjdk.org (Paul Hohensee) Date: Fri, 26 Dec 2025 22:21:00 GMT Subject: RFR: 8374363: Update copyright year to 2025 for test/micro in files where it was missed In-Reply-To: References: Message-ID: On Fri, 26 Dec 2025 00:23:37 GMT, Sergey Bylokhov wrote: > The copyright year in "test/micro" files updated in 2025 has been bumped to 2025. > > **Note:** I have skipped all files updated by the https://github.com/openjdk/jdk/commit/beb43e2633900bb9ab3c975376fe5860b6d054e0 > > The next command can be run (on top of this PR) to verify that each file had prior commits in 2025: > > `git diff HEAD~1 --name-only | while read f; do git log HEAD~1 --since="2025-01-01" --oneline -- "$f" | head -1 | grep -q . || echo "NOT IN 2025: $f"; done` Marked as reviewed by phh (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28995#pullrequestreview-3613921214 From serb at openjdk.org Sat Dec 27 04:49:10 2025 From: serb at openjdk.org (Sergey Bylokhov) Date: Sat, 27 Dec 2025 04:49:10 GMT Subject: Integrated: 8374363: Update copyright year to 2025 for test/micro in files where it was missed In-Reply-To: References: Message-ID: On Fri, 26 Dec 2025 00:23:37 GMT, Sergey Bylokhov wrote: > The copyright year in "test/micro" files updated in 2025 has been bumped to 2025. 
> > **Note:** I have skipped all files updated by the https://github.com/openjdk/jdk/commit/beb43e2633900bb9ab3c975376fe5860b6d054e0 > > The next command can be run (on top of this PR) to verify that each file had prior commits in 2025: > > `git diff HEAD~1 --name-only | while read f; do git log HEAD~1 --since="2025-01-01" --oneline -- "$f" | head -1 | grep -q . || echo "NOT IN 2025: $f"; done` This pull request has now been integrated. Changeset: 5c694eab Author: Sergey Bylokhov URL: https://git.openjdk.org/jdk/commit/5c694eab0f48045d2f71d0cd5ab53c1daddaa963 Stats: 15 lines in 15 files changed: 0 ins; 0 del; 15 mod 8374363: Update copyright year to 2025 for test/micro in files where it was missed Reviewed-by: phh ------------- PR: https://git.openjdk.org/jdk/pull/28995 From duke at openjdk.org Sat Dec 27 05:15:56 2025 From: duke at openjdk.org (Kirill Shirokov) Date: Sat, 27 Dec 2025 05:15:56 GMT Subject: RFR: 8344345: test/hotspot/gtest/x86/x86-asmtest.py has trailing whitespaces [v3] In-Reply-To: References: Message-ID: > This PR addresses the trailing whitespaces for a .py test. > > They were introduced in commit 916694f2c1e7fc8d6a88e7026bc2d29ba2923849 and not detected by jcheck, since checking *.py for whitespaces is not enabled in .jcheck/conf. > > So, a separate question is: do you think that a pattern for Python files should be added to [checks "whitespace"] section of .jcheck/conf? Kirill Shirokov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge branch 'master' into JDK-8344345-clean-up-trailing-spaces - Merge branch 'master' into JDK-8344345-clean-up-trailing-spaces - 8344345: File test/hotspot/gtest/x86/x86-asmtest.py has trailing whitespaces ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27058/files - new: https://git.openjdk.org/jdk/pull/27058/files/345b18f0..c1cc10ae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27058&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27058&range=01-02 Stats: 81767 lines in 2479 files changed: 51092 ins; 20172 del; 10503 mod Patch: https://git.openjdk.org/jdk/pull/27058.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27058/head:pull/27058 PR: https://git.openjdk.org/jdk/pull/27058 From duke at openjdk.org Sat Dec 27 05:16:01 2025 From: duke at openjdk.org (duke) Date: Sat, 27 Dec 2025 05:16:01 GMT Subject: RFR: 8344345: test/hotspot/gtest/x86/x86-asmtest.py has trailing whitespaces [v2] In-Reply-To: References: Message-ID: On Tue, 25 Nov 2025 01:01:06 GMT, Kirill Shirokov wrote: >> This PR addresses the trailing whitespaces for a .py test. >> >> They were introduced in commit 916694f2c1e7fc8d6a88e7026bc2d29ba2923849 and not detected by jcheck, since checking *.py for whitespaces is not enabled in .jcheck/conf. >> >> So, a separate question is: do you think that a pattern for Python files should be added to [checks "whitespace"] section of .jcheck/conf? > > Kirill Shirokov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into JDK-8344345-clean-up-trailing-spaces > - 8344345: File test/hotspot/gtest/x86/x86-asmtest.py has trailing whitespaces @kirill-shirokov Your change (at version c1cc10aeee7e687ef5654378bd8713fec9f1915a) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27058#issuecomment-3693686128 From lmesnik at openjdk.org Sat Dec 27 19:51:58 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sat, 27 Dec 2025 19:51:58 GMT Subject: RFR: 8344345: test/hotspot/gtest/x86/x86-asmtest.py has trailing whitespaces [v3] In-Reply-To: References: Message-ID: On Sat, 27 Dec 2025 05:15:56 GMT, Kirill Shirokov wrote: >> This PR addresses the trailing whitespaces for a .py test. >> >> They were introduced in commit 916694f2c1e7fc8d6a88e7026bc2d29ba2923849 and not detected by jcheck, since checking *.py for whitespaces is not enabled in .jcheck/conf. >> >> So, a separate question is: do you think that a pattern for Python files should be added to [checks "whitespace"] section of .jcheck/conf? > > Kirill Shirokov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into JDK-8344345-clean-up-trailing-spaces > - Merge branch 'master' into JDK-8344345-clean-up-trailing-spaces > - 8344345: File test/hotspot/gtest/x86/x86-asmtest.py has trailing whitespaces Marked as reviewed by lmesnik (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/27058#pullrequestreview-3614417023 From lmesnik at openjdk.org Sat Dec 27 21:00:00 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sat, 27 Dec 2025 21:00:00 GMT Subject: RFR: 8344345: test/hotspot/gtest/x86/x86-asmtest.py has trailing whitespaces [v3] In-Reply-To: References: Message-ID: On Sat, 27 Dec 2025 05:15:56 GMT, Kirill Shirokov wrote: >> This PR addresses the trailing whitespaces for a .py test. >> >> They were introduced in commit 916694f2c1e7fc8d6a88e7026bc2d29ba2923849 and not detected by jcheck, since checking *.py for whitespaces is not enabled in .jcheck/conf. >> >> So, a separate question is: do you think that a pattern for Python files should be added to [checks "whitespace"] section of .jcheck/conf? > > Kirill Shirokov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into JDK-8344345-clean-up-trailing-spaces > - Merge branch 'master' into JDK-8344345-clean-up-trailing-spaces > - 8344345: File test/hotspot/gtest/x86/x86-asmtest.py has trailing whitespaces >> So, a separate question is: do you think that a pattern for Python files should be added to [checks "whitespace"] section of .jcheck/conf? Yes, please file an issue for this, or add it as a part of this fix if all other .py files are fine. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27058#issuecomment-3694220050 From qamai at openjdk.org Sun Dec 28 07:31:46 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 28 Dec 2025 07:31:46 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v14] In-Reply-To: References: Message-ID: > Hi, > > This patch is an alternative to #28764 but it does the analysis during IGVN instead.
> > ## The current PR: > > The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. > > This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. Implementation-wise, when walking at `find_previous_store`, if we encounter a call or memory barrier, we start looking at all nodes that make the allocation escape. If all such nodes have a control input that is not a transitive control input of the call/barrier we are at, then we can decidedly say that the allocation has not escaped at that call/barrier, and walk past that call/barrier to find a corresponding store. > > I do not see a noticeable difference in C2 runtime with and without this patch. > > ## Future work: > > 1. Nested object: > > Consider this case: > > Holder h = new Holder(); > Object o = new Object(); > h.o = o; > > Currently, `o` will be considered escaped at `h.o = o`. However, it can be seen that `o` has not actually escaped because `h` has not escaped. Luckily, with the current approach, this can be easily achieved, notice how this loop is just "if anything escapes, consider `base` escapes", currently, the "anything" here includes `base` and its aliases. if we include the base of the object at which `o` is stored, then we can correctly determine if `o` has escaped. > > // Find all nodes that may escape alloc, and decide that it is provable that they must be > // executed after ctl > EscapeStatus res = NOT_ESCAPED; > aliases.push(base); > for (uint idx = 0; idx < aliases.size(); idx++) { > Node* n = aliases.at(idx); > > 2. Fold a memory `Phi`. > > This is pretty straightforward. 
We need to create a value `Phi` for each memory `Phi` so that we can handle loop `Phi`s. > > 3. Fold a pointer `Phi`. > > This can be easy, just give up if we don't encounter a store into that `Phi`. However, we can do better. Consider this case: > > Point p1 = new Point; > Point p2 = new Point; > p1.x = v1; > p2.x = v2; > Point p = Phi(p1, p2); > int a = p.x; > > Then, `a` should be able to be folded to `Phi(v1, v2)` if `p1` and `p2` are known not to alias. > > Another i... Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: - copyright year, return, comments, whitespace - Merge branch 'master' into loadfoldingigvn - ea of phis and nested objects - Add test scenarios - Add a flag to turn off the feature - Much more comments, refactor the data into a separate class - Cheaper and stronger assert, add test for devirtualization - consistently use phase->value during IGVN - safepoints do not have a memory output - be even more rigorous - ... 
and 5 more: https://git.openjdk.org/jdk/compare/beebaaef...b2b2e5c2 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28812/files - new: https://git.openjdk.org/jdk/pull/28812/files/c546d216..b2b2e5c2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=12-13 Stats: 20616 lines in 1604 files changed: 12143 ins; 2944 del; 5529 mod Patch: https://git.openjdk.org/jdk/pull/28812.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28812/head:pull/28812 PR: https://git.openjdk.org/jdk/pull/28812 From qamai at openjdk.org Sun Dec 28 07:33:45 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 28 Dec 2025 07:33:45 GMT Subject: RFR: 8373999: C2: apply KnownBits and unsigned bounds to Add / Sub operations [v4] In-Reply-To: References: Message-ID: > Hi, > > This PR improves the implementation of `AddNode/SubNode::Value` by taking advantage of the additional information in `TypeInt`. The implementation has some pretty non-trivial logic. Fortunately, the test infrastructure is already there. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - copyright year - Merge branch 'master' into addsub - Merge branch 'master' into addsub - include order - Improve Add/SubNode::Value with unsigned bounds and known bits ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28897/files - new: https://git.openjdk.org/jdk/pull/28897/files/a0ff1f67..fe534505 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28897&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28897&range=02-03 Stats: 2288 lines in 999 files changed: 597 ins; 281 del; 1410 mod Patch: https://git.openjdk.org/jdk/pull/28897.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28897/head:pull/28897 PR: https://git.openjdk.org/jdk/pull/28897 From qamai at openjdk.org Sun Dec 28 07:35:32 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 28 Dec 2025 07:35:32 GMT Subject: RFR: 8374180: C2 crash in PhaseCCP::verify_type - fatal error: Not monotonic [v2] In-Reply-To: <8Ocrk1zJxWfzFaFo_ohWCL76KAhe44SKoRuqdBjxQ6Q=.89969b47-d407-4d82-bc44-b326d78ba880@github.com> References: <8Ocrk1zJxWfzFaFo_ohWCL76KAhe44SKoRuqdBjxQ6Q=.89969b47-d407-4d82-bc44-b326d78ba880@github.com> Message-ID: > Hi, > > The issue here is the inconsistency in computing the `_widen` field of the `TypeInt`. At the first step, the types of the operands are: > > t1 = int:0 > t2 = int:-2..3, widen = 3 > > Since the type of the first operand is a constant zero, `AddNode::Value` returns the type of the second operand directly, as `x ^ 0 == x for all x`. In the second step, `t1` is widened to `0..2`. This triggers the real computation of the result. The algorithm then splits `t2` into `t21 = int:-2..-1` and `t22 = int:0..3`. The `Xor` of these with `t1` are `r1 = int:-4..-1` and `r2 = int:0..3`. As both have `_hi - _lo <= SMALL_TYPEINT_THRESHOLD == 3`, their `_widen`s are normalized to `0`. As a result, their `meet` also has `_widen == 0`. 
This value is smaller than that from the previous step, which was `3`, which leads to the failure. > > The root cause here is that, the `_widen` value of a node should be computed and normalized on the whole range of the node, not on its subranges, which may normalize it to `0` in more cases than what is expected. As a result, my proposed solution is to ignore the `_widen` value of the subranges, and pass the expected `_widen` value when composing the final result. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - copyright year - Merge branch 'master' into widen - RangeInference::infer should ensure correct value of _widen ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28952/files - new: https://git.openjdk.org/jdk/pull/28952/files/f9b8615c..2fb0af13 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28952&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28952&range=00-01 Stats: 2288 lines in 1000 files changed: 597 ins; 281 del; 1410 mod Patch: https://git.openjdk.org/jdk/pull/28952.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28952/head:pull/28952 PR: https://git.openjdk.org/jdk/pull/28952 From wenanjian at openjdk.org Sun Dec 28 09:10:51 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Sun, 28 Dec 2025 09:10:51 GMT Subject: RFR: 8374351: RISC-V: Small refactoring for crypto macro-assembler routines [v2] In-Reply-To: References: Message-ID: <3DKqDICdZdw8Ry_M_uV_cM7T47-W_YRXsgSKzpsd39o=.e9bb390d-de63-479b-9df2-b1ae84024763@github.com> On Fri, 26 Dec 2025 03:00:11 GMT, Fei Yang wrote: >> Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: >> >> Swap the order of the function name to make it clear 
> > Thanks! @RealFYang @feilongjiang Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28988#issuecomment-3694588235 From wenanjian at openjdk.org Sun Dec 28 09:16:03 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Sun, 28 Dec 2025 09:16:03 GMT Subject: Integrated: 8374351: RISC-V: Small refactoring for crypto macro-assembler routines In-Reply-To: References: Message-ID: On Thu, 25 Dec 2025 06:11:51 GMT, Anjian Wen wrote: > This patch is mainly for readability and subsequent GCM call requirements. > > 1. Extract the ghash function to facilitate subsequent calls during the implementation of aes-gcm > 2. Unify the prefixes of function names for aes intrinsic-related functions. Only use generate prefix for the main intrinsic function, delete the other functions `generate_` prefix This pull request has now been integrated. Changeset: 5e685f6f Author: Anjian Wen URL: https://git.openjdk.org/jdk/commit/5e685f6f2c7872a4239ef0c0a0afa60f4526529e Stats: 73 lines in 1 file changed: 28 ins; 18 del; 27 mod 8374351: RISC-V: Small refactoring for crypto macro-assembler routines Reviewed-by: fyang, fjiang ------------- PR: https://git.openjdk.org/jdk/pull/28988 From serb at openjdk.org Sun Dec 28 09:44:34 2025 From: serb at openjdk.org (Sergey Bylokhov) Date: Sun, 28 Dec 2025 09:44:34 GMT Subject: RFR: 8374378: Update copyright year to 2025 for jdk.internal.vm.ci in files where it was missed Message-ID: The copyright year in jdk.internal.vm.ci files updated in 2025 has been bumped to 2025. The next command can be run (on top of this PR) to verify that each file had prior commits in 2025: `git diff HEAD~1 --name-only | while read f; do git log HEAD~1 --since="2025-01-01" --oneline -- "$f" | head -1 | grep -q . 
|| echo "NOT IN 2025: $f"; done` ------------- Commit messages: - 8374378: Update copyright year to 2025 for jdk.internal.vm.ci in files where it was missed Changes: https://git.openjdk.org/jdk/pull/29005/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=29005&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8374378 Stats: 23 lines in 23 files changed: 0 ins; 0 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/29005.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29005/head:pull/29005 PR: https://git.openjdk.org/jdk/pull/29005 From qamai at openjdk.org Sun Dec 28 11:53:48 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 28 Dec 2025 11:53:48 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v15] In-Reply-To: References: Message-ID: > Hi, > > This patch is an alternative to #28764 but it does the analysis during IGVN instead. > > ## The current PR: > > The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. > > This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. Implementation-wise, when walking at `find_previous_store`, if we encounter a call or memory barrier, we start looking at all nodes that make the allocation escape. If all such nodes have a control input that is not a transitive control input of the call/barrier we are at, then we can decidedly say that the allocation has not escaped at that call/barrier, and walk past that call/barrier to find a corresponding store. > > I do not see a noticeable difference in C2 runtime with and without this patch. > > ## Future work: > > 1. 
Nested object: > > Consider this case: > > Holder h = new Holder(); > Object o = new Object(); > h.o = o; > > Currently, `o` will be considered escaped at `h.o = o`. However, it can be seen that `o` has not actually escaped because `h` has not escaped. Luckily, with the current approach, this can be easily achieved, notice how this loop is just "if anything escapes, consider `base` escapes", currently, the "anything" here includes `base` and its aliases. if we include the base of the object at which `o` is stored, then we can correctly determine if `o` has escaped. > > // Find all nodes that may escape alloc, and decide that it is provable that they must be > // executed after ctl > EscapeStatus res = NOT_ESCAPED; > aliases.push(base); > for (uint idx = 0; idx < aliases.size(); idx++) { > Node* n = aliases.at(idx); > > 2. Fold a memory `Phi`. > > This is pretty straightforward. We need to create a value `Phi` for each memory `Phi` so that we can handle loop `Phi`s. > > 3. Fold a pointer `Phi`. > > This can be easy, just give up if we don't encounter a store into that `Phi`. However, we can do better. Consider this case: > > Point p1 = new Point; > Point p2 = new Point; > p1.x = v1; > p2.x = v2; > Point p = Phi(p1, p2); > int a = p.x; > > Then, `a` should be able to be folded to `Phi(v1, v2)` if `p1` and `p2` are known not to alias. > > Another i... 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Fix outdated and unclear comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28812/files - new: https://git.openjdk.org/jdk/pull/28812/files/b2b2e5c2..5a34377d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=13-14 Stats: 26 lines in 1 file changed: 9 ins; 6 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/28812.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28812/head:pull/28812 PR: https://git.openjdk.org/jdk/pull/28812 From xgong at openjdk.org Mon Dec 29 01:31:59 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 29 Dec 2025 01:31:59 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: <1jqibU2eTHxP46wnhGIgORbcJiM0aAxxLAqqHUAvo_0=.d6a999d8-250f-4b07-93c2-1089e63f49ae@github.com> On Thu, 25 Dec 2025 08:08:01 GMT, Jie Fu wrote: > > Thanks for the suggestion. I understand your point, and it sounds reasonable to me. In this test, the tolerance calculation is based on the existing Vector API jtreg reduction tests. For example, here is the approach used in `Float128VectorTests`: > > https://github.com/openjdk/jdk/blob/73a8629c5b52b678febcc9d339e01ebcc5277909/test/jdk/jdk/incubator/vector/Float128VectorTests.java#L143-L156 > > May I ask would the example you mentioned above fail for negative floats? Yes, it will fail as well after I manually changed the values in the float array. > Note: I've provided a reproducer which would 100% fail for your current implementation with negative floats, which seems unacceptable to me. 
> > Also I'm still not sure if there are corner cases with the 1~3000 range since the logic has been proved wrong with negative floats. I think the issue happens when there are both positive and negative values, which can produce an extremely small value. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3695229101 From xgong at openjdk.org Mon Dec 29 01:32:01 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 29 Dec 2025 01:32:01 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently [v2] In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: <8q5-hiBHKLTI1gwHUt_2TIvBWqYso07dM25uILbhLPE=.f2edb7bc-10d8-41cf-a3b0-34a5a04f247e@github.com> On Thu, 25 Dec 2025 16:10:39 GMT, Quan Anh Mai wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: >> >> Extend the float/double value range > > Personally, I would prefer either a provably correct sum verification, or no verification at all since it was mentioned that this test's main purpose is to check the existence of IR nodes. > I agree with @merykitty, @erifan and @DamonFool: to make this test more robust, it's better to remove the functional validation part and rely only on the IR framework verification. OK, I will remove the verification part for floating-point tests. Thanks!
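The mixed-sign concern raised in this thread can be illustrated with a small standalone sketch. The class name, the input values and the tolerance factor below are assumptions chosen for illustration; they are not taken from TestVectorOperationsWithPartialSize.java or from the Vector API tests. The sketch only shows that two summation orders of the same floats can disagree by an amount that a tolerance scaled by the (small) result does not cover.

```java
public class CancellationDemo {
    // Strictly ordered, left-to-right sum (what a scalar loop computes).
    static float sequentialSum(float[] a) {
        float sum = 0.0f;
        for (float v : a) {
            sum += v;
        }
        return sum;
    }

    // Tree-shaped sum over a[lo, hi): one possible order for a vector reduction.
    static float pairwiseSum(float[] a, int lo, int hi) {
        if (hi - lo == 1) {
            return a[lo];
        }
        int mid = (lo + hi) >>> 1;
        return pairwiseSum(a, lo, mid) + pairwiseSum(a, mid, hi);
    }

    public static void main(String[] args) {
        // Mixed signs: the large terms cancel, so the exact sum (2.0) is tiny
        // compared with the magnitude of the inputs.
        float[] a = {1e8f, 1.0f, -1e8f, 1.0f};
        float seq = sequentialSum(a);
        float tree = pairwiseSum(a, 0, a.length);
        // Hypothetical tolerance scaled by the magnitude of the result.
        float relTolerance = Math.abs(seq) * 1e-5f;
        System.out.println("sequential = " + seq);
        System.out.println("pairwise   = " + tree);
        System.out.println("difference = " + Math.abs(seq - tree)
                + ", tolerance = " + relTolerance);
    }
}
```

With positive-only inputs of similar magnitude the two orders usually stay close relative to the result, which may be why the problem only shows up once negative values are allowed.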
------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3695229633 From qamai at openjdk.org Mon Dec 29 14:53:46 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 29 Dec 2025 14:53:46 GMT Subject: RFR: 8373495: C2: Aggressively fold loads from objects that have not escaped [v16] In-Reply-To: References: Message-ID: <2EiCLUhS6yKEXp8RQlAOmaZrwvxPghYYw6u_PW5m2iM=.3ad470ab-c926-4fe4-bf12-8872ce22c7ef@github.com> > Hi, > > This patch is an alternative to #28764 but it does the analysis during IGVN instead. > > ## The current PR: > > The current escape analysis mechanism is all-or-nothing: either the object does not escape, or it does. If the object escapes, we lose the ability to analyse the values of its fields completely, even if the object only escapes at return. > > This PR tries to find the escape status of an object at a load, and if it is decided that the object has not escaped there, we can try folding the load aggressively, ignoring calls and memory barriers to find a corresponding store that the load observes. Implementation-wise, when walking at `find_previous_store`, if we encounter a call or memory barrier, we start looking at all nodes that make the allocation escape. If all such nodes have a control input that is not a transitive control input of the call/barrier we are at, then we can decidedly say that the allocation has not escaped at that call/barrier, and walk past that call/barrier to find a corresponding store. > > I do not see a noticeable difference in C2 runtime with and without this patch. > > ## Future work: > > 1. Fold a memory `Phi`. > > This is pretty straightforward. We need to create a value `Phi` for each memory `Phi` so that we can handle loop `Phi`s. > > 2. Fold a pointer `Phi`. > > Currently, this PR is doing the trivial approach, just give up if we don't encounter a store into that `Phi`. However, we can do better. 
Consider this case: > > Point p1 = new Point; > Point p2 = new Point; > p1.x = v1; > p2.x = v2; > Point p = Phi(p1, p2); > int a = p.x; > > Then, `a` should be able to be folded to `Phi(v1, v2)` if `p1` and `p2` are known not to alias. > > Another interesting case: > > Point p = Phi(p1, p2); > p.x = v; > p1.x = v1; > int a = p.x; > > Then, theoretically, we can fold `a` to `Phi(v1, v)` if `p1` and `p2` are known not to alias. > > 3. Nested objects > > It can be observed that if an object is stored into a memory that has not escaped, then it can be considered that the object has not escaped. For example: > > Point p = new Point; > PointHolder h = new PointHolder; > h.p = p; > int x = p.x; > escape(h); > > Then, `p` can be considered that it has not escaped until `escape(h)`. To do this, the computation of `_aliases` in the constructor of `LocalEA` needs to be more comprehensive. See the comments in `LocalEA::check_escape_status`. > > Please take a look and leave your thoughts, thanks a lot. 
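As a rough Java-level illustration of the situation the description above targets, consider the sketch below. The names `LoadFoldingDemo`, `Point`, `opaqueCall` and `makePoint` are hypothetical, and the snippet cannot show whether a given JVM build actually performs the fold; it only marks the program point where the object has provably not escaped yet, so that a load after an intervening call could in principle be replaced by the stored value.

```java
public class LoadFoldingDemo {
    static final class Point {
        int x;
        int y;
    }

    static int sink;

    // Stands in for a call that the compiler does not inline.
    static void opaqueCall() {
        sink++;
    }

    static Point makePoint(int v) {
        Point p = new Point();
        p.x = v;
        opaqueCall();   // p has not been passed anywhere, so it has not escaped here
        int a = p.x;    // candidate for folding to v despite the call in between
        p.y = a + 1;
        return p;       // p escapes only at the return
    }

    public static void main(String[] args) {
        Point p = makePoint(41);
        System.out.println(p.x + " " + p.y);
    }
}
```

Under an all-or-nothing escape analysis, the `return p` alone would mark `p` as escaping and the load of `p.x` would have to be kept.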
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Fix escape at store ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28812/files - new: https://git.openjdk.org/jdk/pull/28812/files/5a34377d..06fb10fe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28812&range=14-15 Stats: 32 lines in 2 files changed: 0 ins; 13 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/28812.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28812/head:pull/28812 PR: https://git.openjdk.org/jdk/pull/28812 From bkilambi at openjdk.org Mon Dec 29 17:39:42 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 29 Dec 2025 17:39:42 GMT Subject: RFR: 8366444: Add support for add/mul reduction operations for Float16 [v5] In-Reply-To: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> References: <-ovnBfgX38snqeb0xcSNFTeHOfi6uucPLID1asGwI3E=.7f1c09e9-d14a-4b60-ba9a-2811011881c3@github.com> Message-ID: > This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. > > Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. The following is how each of these reductions is implemented for different aarch64 targets - > > **For AddReduction :** > On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar `fadd` instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. 
> > On SVE targets (UseSVE > 0): Generates the `fadda` instruction which computes add reduction for floating point in strict order. > > **For MulReduction :** > Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of scalar `fmul` instructions is generated and multiply reduction for vector lengths > 16B is not supported. > > Below is the performance of the two newly added microbenchmarks in `Float16OperationsBenchmark.java` tested on three different aarch64 machines and with varying `MaxVectorSize` - > > Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads (`ldrsh`) to load the FP16 value into an FPR and a scalar `fadd/fmul` to add/multiply the loaded value to the running sum/product. The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. > Ratio > 1 indicates the performance with this patch is better than the master branch. > > **N1 (UseSVE = 0, max vector length = 16B):** > > Benchmark vectorDim Mode Cnt 8B 16B > ReductionAddFP16 256 thrpt 9 1.41 1.40 > ReductionAddFP16 512 thrpt 9 1.41 1.41 > ReductionAddFP16 1024 thrpt 9 1.43 1.40 > ReductionAddFP16 2048 thrpt 9 1.43 1.40 > ReductionMulFP16 256 thrpt 9 1.22 1.22 > ReductionMulFP16 512 thrpt 9 1.21 1.23 > ReductionMulFP16 1024 thrpt 9 1.21 1.22 > ReductionMulFP16 2048 thrpt 9 1.20 1.22 > > > On N1, the scalarized sequence of `fadd/fmul` are generated for both `MaxVectorSize` of 8B and 16B for add reduction ... Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: - Address review comments for the JTREG test and microbenchmark - Merge branch 'master' - Address review comments - Fix build failures on Mac - Address review comments - Merge 'master' - 8366444: Add support for add/mul reduction operations for Float16 This patch adds mid-end support for vectorized add/mul reduction operations for half floats. It also includes backend aarch64 support for these operations. Only vectorization support through autovectorization is added as VectorAPI currently does not support Float16 vector species. Both add and mul reduction vectorized through autovectorization mandate the implementation to be strictly ordered. The following is how each of these reductions is implemented for different aarch64 targets - For AddReduction : On Neon only targets (UseSVE = 0): Generates scalarized additions using the scalar "fadd" instruction for both 8B and 16B vector lengths. This is because Neon does not provide a direct instruction for computing strictly ordered floating point add reduction. On SVE targets (UseSVE > 0): Generates the "fadda" instruction which computes add reduction for floating point in strict order. For MulReduction : Both Neon and SVE do not provide a direct instruction for computing strictly ordered floating point multiply reduction. For vector lengths of 8B and 16B, a scalarized sequence of scalar "fmul" instructions is generated and multiply reduction for vector lengths > 16B is not supported. Below is the performance of the two newly added microbenchmarks in Float16OperationsBenchmark.java tested on three different aarch64 machines and with varying MaxVectorSize - Note: On all machines, the score (ops/ms) is compared with the master branch without this patch which generates a sequence of loads ("ldrsh") to load the FP16 value into an FPR and a scalar "fadd/fmul" to add/multiply the loaded value to the running sum/product. 
The ratios given below are the ratios between the throughput with this patch and the throughput without this patch. Ratio > 1 indicates the performance with this patch is better than the master branch. N1 (UseSVE = 0, max vector length = 16B): Benchmark vectorDim Mode Cnt 8B 16B ReductionAddFP16 256 thrpt 9 1.41 1.40 ReductionAddFP16 512 thrpt 9 1.41 1.41 ReductionAddFP16 1024 thrpt 9 1.43 1.40 ReductionAddFP16 2048 thrpt 9 1.43 1.40 ReductionMulFP16 256 thrpt 9 1.22 1.22 ReductionMulFP16 512 thrpt 9 1.21 1.23 ReductionMulFP16 1024 thrpt 9 1.21 1.22 ReductionMulFP16 2048 thrpt 9 1.20 1.22 On N1, the scalarized sequence of fadd/fmul is generated for both MaxVectorSize of 8B and 16B for add reduction and mul reduction respectively. V1 (UseSVE = 1, max vector length = 32B): Benchmark vectorDim Mode Cnt 8B 16B 32B ReductionAddFP16 256 thrpt 9 1.11 1.75 2.02 ReductionAddFP16 512 thrpt 9 1.02 1.64 1.93 ReductionAddFP16 1024 thrpt 9 1.02 1.59 1.85 ReductionAddFP16 2048 thrpt 9 1.02 1.56 1.80 ReductionMulFP16 256 thrpt 9 1.12 0.99 1.09 ReductionMulFP16 512 thrpt 9 1.04 1.01 1.04 ReductionMulFP16 1024 thrpt 9 1.02 1.02 1.00 ReductionMulFP16 2048 thrpt 9 1.01 1.01 1.00 On V1, for MaxVectorSize = 8: scalarized fadd/fmul sequence will be generated for AddReductionVHF/MulReductionVHF as UseSVE defaults to 0 [2]. For MaxVectorSize = 16: scalarized "fmul" sequence is generated for MulReductionVHF and "fadda" is generated for AddReductionVHF which yields significant gains. For MaxVectorSize = 32: Autovectorization of MulReductionVHF is disabled for MaxVectorSize > 16B so the autovectorizer checks for the maximal implemented size [1] which is 16B and generates scalarized "fmul" sequence for 16B in this case. For AddReductionVHF, it generates the "fadda" instruction.
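To make the "strictly ordered" requirement above concrete, here is a minimal scalar sketch of why an FP16 add reduction may not be reassociated. It is not the benchmark code from the patch: the class and method names are hypothetical, and FP16 addition is modeled by adding in `float` and rounding back with `Float.floatToFloat16` / `Float.float16ToFloat`, which assumes JDK 20 or newer.

```java
// Hypothetical illustration only; models FP16 addition via float plus rounding.
final class Fp16OrderedReductionSketch {
    // One FP16 addition, modeled as a float add rounded back to binary16.
    static short addFp16(short a, short b) {
        float sum = Float.float16ToFloat(a) + Float.float16ToFloat(b);
        return Float.floatToFloat16(sum);
    }

    // Strictly ordered reduction: ((acc + a[0]) + a[1]) + ... in index order.
    static float orderedSum(short[] a) {
        short acc = Float.floatToFloat16(0.0f);
        for (short v : a) {
            acc = addFp16(acc, v);
        }
        return Float.float16ToFloat(acc);
    }

    // A reassociated variant for three elements: a[0] + (a[1] + a[2]).
    static float reassociatedSum3(short[] a) {
        return Float.float16ToFloat(addFp16(a[0], addFp16(a[1], a[2])));
    }

    public static void main(String[] args) {
        short[] a = {
            Float.floatToFloat16(2048.0f),
            Float.floatToFloat16(1.0f),
            Float.floatToFloat16(1.0f)
        };
        // Near 2048 the spacing between FP16 values is 2, so 2048 + 1 rounds back to 2048.
        System.out.println(orderedSum(a));        // 2048.0
        System.out.println(reassociatedSum3(a));  // 2050.0
    }
}
```

Because the two orders give different FP16 results, a reduction that must match the scalar loop has to add the lanes in order, which is the behaviour the description attributes to "fadda" and to the scalarized "fadd" sequence.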
V2 (UseSVE = 2, max vector length = 16B) Benchmark vectorDim Mode Cnt 8B 16B ReductionAddFP16 256 thrpt 9 1.16 1.70 ReductionAddFP16 512 thrpt 9 1.02 1.61 ReductionAddFP16 1024 thrpt 9 1.01 1.53 ReductionAddFP16 2048 thrpt 9 1.00 1.49 ReductionMulFP16 256 thrpt 9 1.18 0.99 ReductionMulFP16 512 thrpt 9 1.04 1.01 ReductionMulFP16 1024 thrpt 9 1.02 1.02 ReductionMulFP16 2048 thrpt 9 1.01 1.01 On V2, for MaxVectorSize = 8: scalarized fadd/fmul sequence will be generated as UseSVE defaults to 0 [2]. For MaxVectorSize = 16: "fadda" instruction is generated for AddReductionVHF which results in significant gains in performance. For MulReductionVHF, the scalarized "fmul" sequence will be generated. Testing: hotspot_all, jdk(tiers1-3) and langtools(tier1) all pass on N1/V1/V2. [1] https://github.com/openjdk/jdk/blob/a272696813f2e5e896ac9de9985246aaeb9d476c/src/hotspot/share/opto/superword.cpp#L1677 [2] https://github.com/openjdk/jdk/blob/a272696813f2e5e896ac9de9985246aaeb9d476c/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp#L479 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27526/files - new: https://git.openjdk.org/jdk/pull/27526/files/21ad1c93..f2685b21 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27526&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27526&range=03-04 Stats: 7865 lines in 1198 files changed: 4087 ins; 1420 del; 2358 mod Patch: https://git.openjdk.org/jdk/pull/27526.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27526/head:pull/27526 PR: https://git.openjdk.org/jdk/pull/27526 From xpeng at openjdk.org Mon Dec 29 21:12:13 2025 From: xpeng at openjdk.org (Xiaolong Peng) Date: Mon, 29 Dec 2025 21:12:13 GMT Subject: RFR: 8344345: test/hotspot/gtest/x86/x86-asmtest.py has trailing whitespaces [v3] In-Reply-To: References: Message-ID: <7eE4bX-yxQB4j0lAp8aTZSIgBxxG8gyi0-kWA0Tt0u4=.6aaba15a-87ae-42e9-8f4e-4c288cd981b4@github.com> On Sat, 27 Dec 2025 05:15:56 GMT, Kirill Shirokov wrote: >> This PR addresses the 
trailing whitespaces for a .py test. >> >> They were introduced in commit 916694f2c1e7fc8d6a88e7026bc2d29ba2923849 and not detected by jcheck, since checking *.py for whitespaces is not enabled in .jcheck/conf. >> >> So, a separate question is: do you think that a pattern for Python files should be added to [checks "whitespace"] section of .jcheck/conf? > > Kirill Shirokov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into JDK-8344345-clean-up-trailing-spaces > - Merge branch 'master' into JDK-8344345-clean-up-trailing-spaces > - 8344345: File test/hotspot/gtest/x86/x86-asmtest.py has trailing whitespaces The failure in "Pre-submit tests - linux-x64" is not related to the change, looks like the worker instance is out of disk space. ------------- PR Comment: https://git.openjdk.org/jdk/pull/27058#issuecomment-3697548704 From duke at openjdk.org Mon Dec 29 21:12:15 2025 From: duke at openjdk.org (Kirill Shirokov) Date: Mon, 29 Dec 2025 21:12:15 GMT Subject: Integrated: 8344345: test/hotspot/gtest/x86/x86-asmtest.py has trailing whitespaces In-Reply-To: References: Message-ID: On Tue, 2 Sep 2025 18:22:45 GMT, Kirill Shirokov wrote: > This PR addresses the trailing whitespaces for a .py test. > > They were introduced in commit 916694f2c1e7fc8d6a88e7026bc2d29ba2923849 and not detected by jcheck, since checking *.py for whitespaces is not enabled in .jcheck/conf. > > So, a separate question is: do you think that a pattern for Python files should be added to [checks "whitespace"] section of .jcheck/conf? This pull request has now been integrated. 
Changeset: 078e71f4 Author: Kirill Shirokov Committer: Xiaolong Peng URL: https://git.openjdk.org/jdk/commit/078e71f4a3d68d298ab3c383e46d18912e1de7db Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod 8344345: test/hotspot/gtest/x86/x86-asmtest.py has trailing whitespaces Reviewed-by: phh, lmesnik ------------- PR: https://git.openjdk.org/jdk/pull/27058 From xgong at openjdk.org Tue Dec 30 01:26:50 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 30 Dec 2025 01:26:50 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently [v3] In-Reply-To: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: > The test fails intermittently with the following error: > > > Caused by: java.lang.RuntimeException: assertEqualsWithTolerance: expected 0.0 but was 1.1754945E-38 (tolerance: 1.4E-44, diff: 1.1754945E-38) > at compiler.vectorapi.TestVectorOperationsWithPartialSize.verifyAddReductionFloat(TestVectorOperationsWithPartialSize.java:231) > at compiler.vectorapi.TestVectorOperationsWithPartialSize.testAddReductionFloat(TestVectorOperationsWithPartialSize.java:260) > > > The root cause is that the Vector API `reduceLanes()` does not guarantee a specific calculation order for floating-point reduction operations [1]. When the array contains extreme values, this can produce results outside the tolerance range compared to sequential scalar addition. 
> > For example, given array elements: > > [0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE] > > > Sequential scalar addition produces: > > 0.0f + Float.MIN_NORMAL + Float.MAX_VALUE - Float.MAX_VALUE = 0.0f > > > However, `reduceLanes()` might compute: > > (0.0f + Float.MIN_NORMAL) + (Float.MAX_VALUE - Float.MAX_VALUE) = Float.MIN_NORMAL > > > The difference of the two times of calculation is `Float.MIN_NORMAL` (1.1754945E-38), which exceeds the tolerance of `Math.ulp(0.0f) * 10.0f = 1.4E-44`. Even with a 10x rounding error factor, the tolerance is insufficient for such edge cases. > > Since `reduceLanes()` does not require a specific calculation order, differences from scalar results can be significantly larger when special or extreme maximum/minimum values are present. Using a fixed tolerance is inappropriate for such corner cases. > > This patch fixes the issue by initializing the float array in test with random normal values within a specified range, ensuring the result gap stays within the defined tolerance. > > Tested locally on my AArch64 and X86_64 machines 500 times, and I didn't observe the failure again. 
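The ordering effect described above can be reproduced without the Vector API. The following is a minimal sketch in plain Java: the class and method names are hypothetical, and the pairwise tree is only one of the orders that `reduceLanes()` is allowed to use, not a statement about what any particular platform actually does.

```java
// Hypothetical illustration only; does not use jdk.incubator.vector.
final class ReductionOrderSketch {
    // Left-to-right scalar sum, as in the test's reference computation.
    static float sequentialSum(float[] a) {
        float s = 0.0f;
        for (float v : a) {
            s += v;
        }
        return s;
    }

    // Pairwise (tree) sum over the half-open range [lo, hi).
    static float pairwiseSum(float[] a, int lo, int hi) {
        if (hi - lo == 1) {
            return a[lo];
        }
        int mid = (lo + hi) >>> 1;
        return pairwiseSum(a, lo, mid) + pairwiseSum(a, mid, hi);
    }

    public static void main(String[] args) {
        float[] a = {0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE};
        float seq = sequentialSum(a);              // MIN_NORMAL is absorbed by MAX_VALUE
        float tree = pairwiseSum(a, 0, a.length);  // MAX_VALUE cancels first, MIN_NORMAL survives
        float tolerance = Math.ulp(0.0f) * 10.0f;
        System.out.println("sequential = " + seq);
        System.out.println("pairwise   = " + tree);
        System.out.println("within tolerance: " + (Math.abs(seq - tree) <= tolerance));
    }
}
```

With this input the sequential sum is 0.0f and the pairwise sum is `Float.MIN_NORMAL`, so the check against the fixed tolerance fails even though both results are legitimate under an unspecified reduction order.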
> > [1] https://docs.oracle.com/en/java/javase/25/docs/api/jdk.incubator.vector/jdk/incubator/vector/FloatVector.html#reduceLanes(jdk.incubator.vector.VectorOperators.Associative) Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: Remove verification for floating-point add reduction tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28960/files - new: https://git.openjdk.org/jdk/pull/28960/files/433efddc..eb75c5f5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28960&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28960&range=01-02 Stats: 46 lines in 1 file changed: 1 ins; 34 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/28960.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28960/head:pull/28960 PR: https://git.openjdk.org/jdk/pull/28960 From xgong at openjdk.org Tue Dec 30 01:29:54 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 30 Dec 2025 01:29:54 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently [v2] In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Thu, 25 Dec 2025 23:27:19 GMT, Jie Fu wrote: >> Personally, I would prefer either a provably correct sum verification, or no verification at all since it was mentioned that this test's main purpose is to check the existence of IR nodes. > >> no verification at all since it was mentioned that this test's main purpose is to check the existence of IR nodes. > > I'm fine with this suggestion. Hi @DamonFool , @merykitty , @jatin-bhateja and @erifan , I've removed the verification for the two FP add reduction tests, could you please help take another look? Thanks a lot!
------------- PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3697996119 From erfang at openjdk.org Tue Dec 30 05:46:59 2025 From: erfang at openjdk.org (Eric Fang) Date: Tue, 30 Dec 2025 05:46:59 GMT Subject: RFR: 8373722: [TESTBUG] compiler/vectorapi/TestVectorOperationsWithPartialSize.java fails intermittently [v3] In-Reply-To: References: <2U-bSjLSxtOOvOELe9fP-oaVZx88Xh9YaeifbYaEmUQ=.ee4dd046-8e3c-4aeb-9c42-db4fe37f6c6b@github.com> Message-ID: On Tue, 30 Dec 2025 01:26:50 GMT, Xiaohong Gong wrote: >> The test fails intermittently with the following error: >> >> >> Caused by: java.lang.RuntimeException: assertEqualsWithTolerance: expected 0.0 but was 1.1754945E-38 (tolerance: 1.4E-44, diff: 1.1754945E-38) >> at compiler.vectorapi.TestVectorOperationsWithPartialSize.verifyAddReductionFloat(TestVectorOperationsWithPartialSize.java:231) >> at compiler.vectorapi.TestVectorOperationsWithPartialSize.testAddReductionFloat(TestVectorOperationsWithPartialSize.java:260) >> >> >> The root cause is that the Vector API `reduceLanes()` does not guarantee a specific calculation order for floating-point reduction operations [1]. When the array contains extreme values, this can produce results outside the tolerance range compared to sequential scalar addition. >> >> For example, given array elements: >> >> [0.0f, Float.MIN_NORMAL, Float.MAX_VALUE, -Float.MAX_VALUE] >> >> >> Sequential scalar addition produces: >> >> 0.0f + Float.MIN_NORMAL + Float.MAX_VALUE - Float.MAX_VALUE = 0.0f >> >> >> However, `reduceLanes()` might compute: >> >> (0.0f + Float.MIN_NORMAL) + (Float.MAX_VALUE - Float.MAX_VALUE) = Float.MIN_NORMAL >> >> >> The difference of the two times of calculation is `Float.MIN_NORMAL` (1.1754945E-38), which exceeds the tolerance of `Math.ulp(0.0f) * 10.0f = 1.4E-44`. Even with a 10x rounding error factor, the tolerance is insufficient for such edge cases. 
>> >> Since `reduceLanes()` does not require a specific calculation order, differences from scalar results can be significantly larger when special or extreme maximum/minimum values are present. Using a fixed tolerance is inappropriate for such corner cases. >> >> This patch fixes the issue by initializing the float array in test with random normal values within a specified range, ensuring the result gap stays within the defined tolerance. >> >> Tested locally on my AArch64 and X86_64 machines 500 times, and I didn't observe the failure again. >> >> [1] https://docs.oracle.com/en/java/javase/25/docs/api/jdk.incubator.vector/jdk/incubator/vector/FloatVector.html#reduceLanes(jdk.incubator.vector.VectorOperators.Associative) > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Remove verification for floating-point add reduction tests Marked as reviewed by erfang (Author). Thanks for fixing the bug! ------------- PR Review: https://git.openjdk.org/jdk/pull/28960#pullrequestreview-3617304143 PR Comment: https://git.openjdk.org/jdk/pull/28960#issuecomment-3698367860 From jbechberger at openjdk.org Tue Dec 30 09:52:00 2025 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Tue, 30 Dec 2025 09:52:00 GMT Subject: RFR: 8374195: TestReplaceNarrowPhiWithBottomPhi fails on ppc64 platforms in (fast)debug In-Reply-To: References: Message-ID: On Tue, 23 Dec 2025 16:53:30 GMT, Martin Doerr wrote: > This test makes assumptions about the C2 ideal graph which are not true for PPC64. We need to get tests green also in jdk26 where the new test has landed in the meantime, so simply disabling it for the platform. > Test improvements can be done later if needed. Marked as reviewed by jbechberger (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/28964#pullrequestreview-3617746699 From mdoerr at openjdk.org Tue Dec 30 09:52:01 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 30 Dec 2025 09:52:01 GMT Subject: RFR: 8374195: TestReplaceNarrowPhiWithBottomPhi fails on ppc64 platforms in (fast)debug In-Reply-To: References: Message-ID: On Tue, 23 Dec 2025 16:53:30 GMT, Martin Doerr wrote: > This test makes assumptions about the C2 ideal graph which are not true for PPC64. We need to get tests green also in jdk26 where the new test has landed in the meantime, so simply disabling it for the platform. > Test improvements can be done later if needed. Thanks for the reviews! I'll also create a backport. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28964#issuecomment-3698849313 From mdoerr at openjdk.org Tue Dec 30 09:52:04 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 30 Dec 2025 09:52:04 GMT Subject: Integrated: 8374195: TestReplaceNarrowPhiWithBottomPhi fails on ppc64 platforms in (fast)debug In-Reply-To: References: Message-ID: On Tue, 23 Dec 2025 16:53:30 GMT, Martin Doerr wrote: > This test makes assumptions about the C2 ideal graph which are not true for PPC64. We need to get tests green also in jdk26 where the new test has landed in the meantime, so simply disabling it for the platform. > Test improvements can be done later if needed. This pull request has now been integrated. 
Changeset: e4e923a1 Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/e4e923a1ffc8ff059c983c7e9201d0ee3273482d Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8374195: TestReplaceNarrowPhiWithBottomPhi fails on ppc64 platforms in (fast)debug Reviewed-by: mbaesken, jbechberger ------------- PR: https://git.openjdk.org/jdk/pull/28964 From duke at openjdk.org Tue Dec 30 14:46:19 2025 From: duke at openjdk.org (duke) Date: Tue, 30 Dec 2025 14:46:19 GMT Subject: Withdrawn: 8350864: C2: verify structural invariants of the Ideal graph In-Reply-To: References: Message-ID: On Thu, 17 Jul 2025 07:25:10 GMT, Marc Chevalier wrote: > Some crashes are consequences of earlier misshapen ideal graphs, which could be detected earlier, closer to the source, before the possibly many transformations that lead to the crash. > > Let's verify that the ideal graph is well-shaped earlier then! I propose here such a feature. This runs after IGVN, because at this point, the graph should be cleaned up of any weirdness happening earlier or during IGVN. > > This feature is enabled with the develop flag `VerifyIdealStructuralInvariants`. Open to renaming. No problem with me! This feature is only available in debug builds, and most of the code is not even compiled in product, since it uses some debug-only functions, such as `Node::dump` or `Node::Name`. > > For now, only local checks are implemented: they are checks that only look at a node and its neighborhood, wherever it happens in the graph. Typically: under an `If` node, we have an `IfTrue` and an `IfFalse`. To ease development, each check is implemented in its own class, independently of the others. Nevertheless, one always needs to do the same kinds of things: checking that there is an output of a given type, that there are N inputs, that the k-th input has a given type...
To ease writing such checks, in a readable way, and in a less error-prone way than a pile of copy-pasted code that manually traverses the graph, I propose a set of compositional helpers to write patterns that can be matched against the ideal graph. Since these patterns are... patterns, so not related to a specific graph, they can be allocated once and forever. When used, one provides the node (called center) around which one wants to check if the pattern holds. > > On top of making the description of patterns easier, these helpers allow nice printing in case of error, by showing the path from the center to the violating node. For instance (made up for the purpose of showing the formatting), a violation with a path climbing only inputs: > > 1 failure for node > 211 OuterStripMinedLoopEnd === 215 39 [[ 212 198 ]] P=0,948966, C=23799,000000 > At node > 209 CountedLoopEnd === 182 208 [[ 210 197 ]] [lt] P=0,948966, C=23799,000000 !orig=[196] !jvms: StringLatin1::equals @ bci:12 (line 100) > From path: > [center] 211 OuterStripMinedLoopEnd === 215 39 [[ 212 198 ]] P=0,948966, C=23799,000000 > <-(0)- 215 SafePoint === 210 1 7 1 1 216 37 54 185 [[ 211 ]] SafePoint !orig=186 !jvms: StringLatin1::equals @ bci:29 (line 100) > <-(0)- 210 IfFalse === 209 [[ 215 216 ]] #0 !orig=198 !jvms: StringL... This pull request has been closed without being integrated.
------------- PR: https://git.openjdk.org/jdk/pull/26362 From fferrari at openjdk.org Tue Dec 30 17:36:03 2025 From: fferrari at openjdk.org (Francisco Ferrari Bihurriet) Date: Tue, 30 Dec 2025 17:36:03 GMT Subject: RFR: 8364970: Redo JDK-8327381 by updating the CmpU type instead of the Bool type [v5] In-Reply-To: <-r3eHhuzv6NeHNJ0YRpTey8a8YN8q52kXR1fym1RGd0=.1a490d02-260b-493c-b377-d65ad68bca41@github.com> References: <-r3eHhuzv6NeHNJ0YRpTey8a8YN8q52kXR1fym1RGd0=.1a490d02-260b-493c-b377-d65ad68bca41@github.com> Message-ID: <9cCuUyzxmXXb_YXfeTD5qtHla5siqCXwnulKgi9rFb0=.0f81c087-db30-418f-ad6f-818073653f56@github.com> On Thu, 14 Aug 2025 18:35:53 GMT, Francisco Ferrari Bihurriet wrote: >> Hi, this pull request is a second take of 1383fec41756322bf2832c55633e46395b937b40, by updating the `CmpUNode` type as either `TypeInt::CC_LE` (case 1a) or `TypeInt::CC_LT` (case 1b) instead of updating the `BoolNode` type as `TypeInt::ONE`. >> >> With this approach a56cd371a2c497e4323756f8b8a08a0bba059bf2 becomes unnecessary. Additionally, having the right type in `CmpUNode` could potentially enable further optimizations. 
>> >> #### Testing >> >> In order to evaluate the changes, the following testing has been performed: >> >> * `jdk:tier1` (see [GitHub Actions run](https://github.com/franferrax/jdk/actions/runs/16789994433)) >> * [`TestBoolNodeGVN.java`](https://github.com/openjdk/jdk/blob/jdk-26+9/test/hotspot/jtreg/compiler/c2/gvn/TestBoolNodeGVN.java), created for [JDK-8327381: Refactor type-improving transformations in BoolNode::Ideal to BoolNode::Value](https://bugs.openjdk.org/browse/JDK-8327381) (1383fec41756322bf2832c55633e46395b937b40) >> * I also checked it breaks if I remove the `CmpUNode::Value_cmpu_and_mask` call >> * Private reproducer for [JDK-8349584: Improve compiler processing](https://bugs.openjdk.org/browse/JDK-8349584) (a56cd371a2c497e4323756f8b8a08a0bba059bf2) >> * A local slowdebug run of the `test/hotspot/jtreg/compiler/c2` category on _Fedora Linux x86_64_ >> * Same results as with `master` (f95af744b07a9ec87e2507b3d584cbcddc827bbd) > > Francisco Ferrari Bihurriet has updated the pull request incrementally with one additional commit since the last revision: > > Accept @merykitty's suggestion I hope I'll be able to resume the work on this one after the January CPU. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26666#issuecomment-3700029767 From serb at openjdk.org Tue Dec 30 23:09:02 2025 From: serb at openjdk.org (Sergey Bylokhov) Date: Tue, 30 Dec 2025 23:09:02 GMT Subject: RFR: 8374378: Update copyright year to 2025 for jdk.internal.vm.ci in files where it was missed In-Reply-To: References: Message-ID: <4YufPcQBmFgO8Zkr4jw3f6NtsuhG2kviO-5hyA6wjl8=.a62919bc-2a08-4d08-87a1-eef0491fb473@github.com> On Sun, 28 Dec 2025 07:48:22 GMT, Sergey Bylokhov wrote: > The copyright year in jdk.internal.vm.ci files updated in 2025 has been bumped to 2025. 
> > The next command can be run (on top of this PR) to verify that each file had prior commits in 2025: > > `git diff HEAD~1 --name-only | while read f; do git log HEAD~1 --since="2025-01-01" --oneline -- "$f" | head -1 | grep -q . || echo "NOT IN 2025: $f"; done` Looking for volunteers to review this patch. It cannot be integrated in 2026. ------------- PR Comment: https://git.openjdk.org/jdk/pull/29005#issuecomment-3700774586 From phh at openjdk.org Wed Dec 31 01:45:57 2025 From: phh at openjdk.org (Paul Hohensee) Date: Wed, 31 Dec 2025 01:45:57 GMT Subject: RFR: 8374378: Update copyright year to 2025 for jdk.internal.vm.ci in files where it was missed In-Reply-To: References: Message-ID: On Sun, 28 Dec 2025 07:48:22 GMT, Sergey Bylokhov wrote: > The copyright year in jdk.internal.vm.ci files updated in 2025 has been bumped to 2025. > > The next command can be run (on top of this PR) to verify that each file had prior commits in 2025: > > `git diff HEAD~1 --name-only | while read f; do git log HEAD~1 --since="2025-01-01" --oneline -- "$f" | head -1 | grep -q . || echo "NOT IN 2025: $f"; done` Marked as reviewed by phh (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/29005#pullrequestreview-3619994947 From erfang at openjdk.org Wed Dec 31 05:02:51 2025 From: erfang at openjdk.org (Eric Fang) Date: Wed, 31 Dec 2025 05:02:51 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v7] In-Reply-To: References: Message-ID: > `VectorMaskCastNode` is used to cast a vector mask from one type to another type. The cast may be generated by calling the vector API `cast` or generated by the compiler. For example, some vector mask operations like `trueCount` require the input mask to be integer types, so for floating point type masks, the compiler will cast the mask to the corresponding integer type mask automatically before doing the mask operation. This kind of cast is very common. 
> > If the vector element size is not changed, the `VectorMaskCastNode` does not generate code; otherwise code will be generated to extend or narrow the mask. This IR node is not free whether or not it generates code, because it may block some optimizations. For example: > 1. `(VectorStoreMask (VectorMaskCast (VectorLoadMask x)))` The middle `VectorMaskCast` prevents the following optimization: `(VectorStoreMask (VectorLoadMask x)) => (x)` > 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`. > > In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. But if the input value is changed, we can't eliminate the cast. > > The general idea of this PR is introducing an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`, like the existing `Node::uncast(bool)` function. The function returns the first node that is not a `VectorMaskCastNode`. > > The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode` and this does not affect the correctness of the optimization. Then this function can be called to eliminate the `VectorMaskCastNode` chain. > > Current optimizations related to `VectorMaskCastNode` include: > 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760. > 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242. > > This PR does the following optimizations: > 1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` as `(VectorMaskCast (VectorMaskCast? ... (VectorMaskCast x))) => (x)`. Because as long as types of the head and tail `VectorMaskCastNode` are consistent, the optimization is correct. > 2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`.
Since the value before and after the pattern is a boolean vector, it remains unchanged as long as th... Eric Fang has updated the pull request incrementally with one additional commit since the last revision: Convert the check condition for vector length into an assertion Also refined the tests. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28313/files - new: https://git.openjdk.org/jdk/pull/28313/files/2ce36c8d..8c29c902 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28313&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28313&range=05-06 Stats: 92 lines in 2 files changed: 13 ins; 5 del; 74 mod Patch: https://git.openjdk.org/jdk/pull/28313.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28313/head:pull/28313 PR: https://git.openjdk.org/jdk/pull/28313 From erfang at openjdk.org Wed Dec 31 05:05:52 2025 From: erfang at openjdk.org (Eric Fang) Date: Wed, 31 Dec 2025 05:05:52 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v6] In-Reply-To: References: Message-ID: On Fri, 26 Dec 2025 11:10:07 GMT, Emanuel Peter wrote: > I'll review this again in early January, once I'm back from Christmas/New Year break ;) Cool, no hurry, thanks for your review and happy Christmas/New Year! @eme64 @jatin-bhateja @XiaohongGong ------------- PR Comment: https://git.openjdk.org/jdk/pull/28313#issuecomment-3701452219 From serb at openjdk.org Wed Dec 31 07:26:03 2025 From: serb at openjdk.org (Sergey Bylokhov) Date: Wed, 31 Dec 2025 07:26:03 GMT Subject: Integrated: 8374378: Update copyright year to 2025 for jdk.internal.vm.ci in files where it was missed In-Reply-To: References: Message-ID: On Sun, 28 Dec 2025 07:48:22 GMT, Sergey Bylokhov wrote: > The copyright year in jdk.internal.vm.ci files updated in 2025 has been bumped to 2025.
> > The next command can be run (on top of this PR) to verify that each file had prior commits in 2025: > > `git diff HEAD~1 --name-only | while read f; do git log HEAD~1 --since="2025-01-01" --oneline -- "$f" | head -1 | grep -q . || echo "NOT IN 2025: $f"; done` This pull request has now been integrated. Changeset: 3fd7bde3 Author: Sergey Bylokhov URL: https://git.openjdk.org/jdk/commit/3fd7bde31b965e027df423b3c2b5e1f360397195 Stats: 23 lines in 23 files changed: 0 ins; 0 del; 23 mod 8374378: Update copyright year to 2025 for jdk.internal.vm.ci in files where it was missed Reviewed-by: phh ------------- PR: https://git.openjdk.org/jdk/pull/29005