From duke at openjdk.org Mon Dec 1 00:38:50 2025 From: duke at openjdk.org (Shawn M Emery) Date: Mon, 1 Dec 2025 00:38:50 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v8] In-Reply-To: References: <2HwG7uFrqW7pXzu32WvTuOZmzolIhPS8TxoZazYsvG8=.a75ab9bf-8587-4e35-82a2-88b7e8aa44da@github.com> Message-ID: On Mon, 24 Nov 2025 17:24:50 GMT, Sandhya Viswanathan wrote: >> Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed the ENCRYPT_16_BLKS fall through case that sviswa7 pointed out in PR review. > > Marked as reviewed by sviswanathan (Reviewer). @sviswa7 or @shipilev, if the updated changes look good to you then could you please reapprove/approve the PR as I don't have Reviewer privileges at this point. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28363#issuecomment-3594057961 From vlivanov at openjdk.org Mon Dec 1 03:00:58 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 1 Dec 2025 03:00:58 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v23] In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 15:47:54 GMT, Jatin Bhateja wrote: >> Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. >> >> With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. 
>> >> All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. >> >> Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size. >> >> The patch shows around 5-20% improvement in code size by facilitating NDD demotion. >> >> For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. >> >> **Micro:-** >> image >> >> >> **Baseline :-** >> image >> >> **With opt:-** >> image >> >> Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016 > - Incorporating polished comments suggestions from Daniel > - Review comments resolution > - Review comments resolutions > - Review comments resolution > - Extending biasing heuristics to account for bias range with minimum degree of freedom. Review feedback incorporated. > - Generic operand traversal and sharpening candidate selection based on RegisterMask and non-interference. 
Review feedback incorporated > - Review comments resolution > - Review comments resolutions > - Moving demotion candidate marking to AD file, review comments resolutions > - ... and 11 more: https://git.openjdk.org/jdk/compare/1ce2a44e...93577b83 Marked as reviewed by vlivanov (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/26283#pullrequestreview-3523000265 From jbhateja at openjdk.org Mon Dec 1 06:07:13 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 1 Dec 2025 06:07:13 GMT Subject: RFR: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions [v23] In-Reply-To: References: Message-ID: <8ekvtuOrgyG4hIZBv08MnyISgdxUyLoNY7VOppgnkHA=.bc64a9a6-4781-48ae-81e0-5e04402be73b@github.com> On Mon, 1 Dec 2025 02:57:51 GMT, Vladimir Ivanov wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: >> >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8351016 >> - Incorporating polished comments suggestions from Daniel >> - Review comments resolution >> - Review comments resolutions >> - Review comments resolution >> - Extending biasing heuristics to account for bias range with minimum degree of freedom. Review feedback incorporated. >> - Generic operand traversal and sharpening candidate selection based on RegisterMask and non-interference. Review feedback incorporated >> - Review comments resolution >> - Review comments resolutions >> - Moving demotion candidate marking to AD file, review comments resolutions >> - ... and 11 more: https://git.openjdk.org/jdk/compare/1ce2a44e...93577b83 > > Marked as reviewed by vlivanov (Reviewer). 
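Editorial sketch (not part of the PR): the difference between "first available colour" selection and biased selection, as described in the quoted PR text, can be illustrated in plain Java. The class `BiasedSelectSketch`, the method `selectColor`, and the use of `BitSet` as a register mask are hypothetical names chosen for illustration; they are not C2's allocator code, and the real heuristics (commutative operands, degree of freedom) are not modelled here.

```java
import java.util.BitSet;

// Hypothetical illustration only; not HotSpot code.
final class BiasedSelectSketch {
    /**
     * Picks a register for a definition.
     *
     * @param allowed          registers permitted by the definition's register mask
     * @param takenByNeighbors registers already assigned to interfering live ranges
     * @param biasReg          preferred register (e.g. the one holding the first
     *                         source operand), or -1 if there is no bias
     * @return the chosen register number, or -1 if none is free (would spill)
     */
    static int selectColor(BitSet allowed, BitSet takenByNeighbors, int biasReg) {
        BitSet free = (BitSet) allowed.clone();
        free.andNot(takenByNeighbors);          // remove colours used by neighbours
        if (biasReg >= 0 && free.get(biasReg)) {
            return biasReg;                     // honour the hint when it is legal
        }
        return free.nextSetBit(0);              // fall back to first available colour
    }
}
```

The bias is only a hint: when the preferred register is unavailable, the sketch falls back to the first free colour, which matches the PR's statement that biasing differs from coalescing.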
Thanks @iwanowww, @dean-long, @dlunde, @merykitty and @sviswa7 for your reviews and approval. ------------- PR Comment: https://git.openjdk.org/jdk/pull/26283#issuecomment-3594672542 From jbhateja at openjdk.org Mon Dec 1 06:07:14 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 1 Dec 2025 06:07:14 GMT Subject: Integrated: 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions In-Reply-To: References: Message-ID: On Mon, 14 Jul 2025 02:36:24 GMT, Jatin Bhateja wrote: > Currently, while choosing the colour (register) for a definition live range during the select phase of register allocation, we pick the first available colour that does not match with already allocated neighboring live ranges. > > With Intel APX NDD ISA extension, several existing two-address arithmetic instructions can now have an explicit non-destructive destination operand; this, in general, saves additional spills for two-address instructions where the destination is also the first source operand, and where the source live range surpasses the current instruction. > > All NDD instructions mandate extended EVEX encoding with a bulky 4-byte prefix, [JDK-8351994](https://github.com/openjdk/jdk/pull/24431) added logic for NDD to REX/REX2 demotion in the assembler layer, but due to the existing first color selection register allocation policy, the demotions are rare. This patch biases the allocation of NDD definition to the first source operand or the second source operand for the commutative class of operations. > > Biasing is a compile-time hint to the allocator and is different from live range coalescing (aggressive/conservative), which merges the two live ranges using the union find algorithm. Given that REX encoding needs a 1-byte prefix and REX2 encoding needs a 2-byte prefix, demotion saves considerable JIT code size. > > The patch shows around 5-20% improvement in code size by facilitating NDD demotion. 
> > For the following micro, the method JIT code size reduced from 136 to 120 bytes, which is around a 13% reduction in code size footprint. > > **Micro:-** > image > > > **Baseline :-** > image > > **With opt:-** > image > > Thorough validations are underway using the latest [Intel Software Development Emulator version 9.58](https://www.intel.com/content/www/us/en/download/684897/intel-software-development-emulator.html). > > Kindly review and share your feedback. > > Best Regards, > Jatin This pull request has now been integrated. Changeset: e0311ecb Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/e0311ecb85b78b6d97387c17102a8b6759eefc36 Stats: 283 lines in 13 files changed: 205 ins; 15 del; 63 mod 8351016: RA support for EVEX to REX/REX2 demotion to optimize NDD instructions Reviewed-by: sviswanathan, dlunden, vlivanov, qamai ------------- PR: https://git.openjdk.org/jdk/pull/26283 From epeter at openjdk.org Mon Dec 1 06:45:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 1 Dec 2025 06:45:06 GMT Subject: RFR: 8372685: C2 SuperWord: wrong requires in test after JDK-8371146 In-Reply-To: <3861jYG-DmA_XdGaE8Zj5GlutdxiXc4t-jjRnixANF4=.7119ba1e-8af8-4aa6-9bb2-52c92f1b3d91@github.com> References: <3861jYG-DmA_XdGaE8Zj5GlutdxiXc4t-jjRnixANF4=.7119ba1e-8af8-4aa6-9bb2-52c92f1b3d91@github.com> Message-ID: On Fri, 28 Nov 2025 10:12:19 GMT, Matthias Baesken wrote: >>> I leave it up to you if you want to file an RFE for the error message. I don't have the expertise on Windows nor on GC. >> >> @xmas92 , @jsikstro what do you think ? >> Is this about the 'ZGC requires Windows version 1803 or later' message that surprised us a little bit because we see it on Windows server 2016 , but the 1803 looks like it refers to some update of good old Win 10 . > >> @MBaesken 1803 seems to refer to both a Windows 10 and Windows Server 2016 (internal) release number/version. 
Here's a version list of the old semi-annual releases of Windows Server 2016: https://en.wikipedia.org/wiki/Windows_Server#Semi-Annual_releases_(discontinued) > > Thanks! > Wikipedia says 'semi-annual releases do not include any desktop environments. Instead, they are restricted to the Nano Server configuration installed in a [Docker](https://en.wikipedia.org/wiki/Docker_(software)) [container](https://en.wikipedia.org/wiki/Containerization_(computing)),[[17]](https://en.wikipedia.org/wiki/Windows_Server#cite_note-thomasmaurer-17)[[29]](https://en.wikipedia.org/wiki/Windows_Server#cite_note-:0-29) and the Server Core configuration, licensed only to serve as a container host' so this sounds like it is a rather special 'flavor' of Win Server 2016. > So maybe it is no wonder that we get the warning and have no VirtualAlloc2 on our Win Server 2016 test machine. @MBaesken @chhagedorn Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28537#issuecomment-3594810088 From epeter at openjdk.org Mon Dec 1 06:45:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 1 Dec 2025 06:45:07 GMT Subject: Integrated: 8372685: C2 SuperWord: wrong requires in test after JDK-8371146 In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 13:53:12 GMT, Emanuel Peter wrote: > @MBaesken Reported this issue on Windows: > > TestAliasingCheckPreLimitNotAvailable_all-flags-fixed-stress-seed.jtr and TestAliasingCheckPreLimitNotAvailable_all-flags-no-stress-seed.jtr show failures on Windows: > > [0.095s][error][gc] Failed to lookup symbol: VirtualAlloc2 > Error occurred during initialization of VM > ZGC requires Windows version 1803 or later > > AIX fails too: > Error occurred during initialization of VM > Option -XX:+UseZGC not supported > > > I learned a small lesson here: `@requires vm.gc.Z` is much smarter than checking that no other GC is set, or ZGC is set. It also checks if ZGC is available, which is not always the case, e.g. 
on the reported Windows machine. > > @MBaesken Can you please confirm that this fixes the test for you? This pull request has now been integrated. Changeset: 81b26ba8 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/81b26ba8131b74a7bb4309bd3608dda2ba99a6ca Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8372685: C2 SuperWord: wrong requires in test after JDK-8371146 Reviewed-by: chagedorn, mbaesken ------------- PR: https://git.openjdk.org/jdk/pull/28537 From epeter at openjdk.org Mon Dec 1 06:58:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 1 Dec 2025 06:58:03 GMT Subject: RFR: 8370922: Template Framework Library: Float16 type and operations Message-ID: We should test `Float16` with Template Framework Tests. For this, I'm now implementing: - Template Framework Library: add `Float16Type` that represents `Float16`. Extend `Operations.java` with `Float16` operations. - `Verify.java`: add verification for `Float16`, and corresponding tests in `TestVerifyIncubatorVector.java`. We could have done this separately, but it is not much code and completes the pipeline from code generation through execution and finally result verification in the following two tests. - Adding `Float16` to `ExpressionFuzzer.java` and `TestExpressions.java`. ------------- Commit messages: - add more flags again - add module to compilation - Merge branch 'master' into JDK-8370922-TemplateFramework-Library-Float16 - remove old TODOs - add Float16 to ExpressionFuzzer.java - fix jtreg commands - remove some unnecessary incubator flags - comparisons - rest of Float16 operators - verify for Float16 - ... 
and 4 more: https://git.openjdk.org/jdk/compare/08c16c38...c87acd90 Changes: https://git.openjdk.org/jdk/pull/28095/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28095&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370922 Stats: 376 lines in 9 files changed: 348 ins; 4 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/28095.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28095/head:pull/28095 PR: https://git.openjdk.org/jdk/pull/28095 From chagedorn at openjdk.org Mon Dec 1 07:09:59 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 1 Dec 2025 07:09:59 GMT Subject: Integrated: 8372461: [IR Framework] Multiple test failures after JDK-8371789 In-Reply-To: References: Message-ID: <2jMJC3vg0OAMgcoErG97jNFQb6Egsi352SstiE7KVKM=.42c4dc94-db04-4a06-b952-dda807762703@github.com> On Tue, 25 Nov 2025 16:51:39 GMT, Christian Hagedorn wrote: > [JDK-8371789](https://bugs.openjdk.org/browse/JDK-8371789) improved the C2 type dumps but unfortunately also broke some IR Framework internal tests and some regexes: > > - `TestIRMatching.java`: Forgot to update old reference to "precise". Replaced with "Constant". > - `IRNode.CHECKCAST_ARRAY*`: Forgot to update old reference to "precise". Replaced with `Constant` and added `aryklassptr`. > - Some clean-up to `LOAD_STORE_PREFIX` was incorrect since we no longer match various combinations tested with `TestIRMatching.java` and `TestPhaseIRMatching.java`. For example: > https://github.com/openjdk/jdk/blob/67ef81eb78b28e5dcdf91785b476dfd0858cbd16/test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestPhaseIRMatching.java#L766-L783 > I reverted the no-longer matching part of the regex back to what we had before JDK-8371789. > > #### Testing > - [X] Tier1 > - [X] Tier5 with IR framework internal tests only > - [ ] Failing IR framework internal tests on all platforms > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 293fec7e Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/293fec7e28ed06f0942e94b1c21affdf6aabe9ca Stats: 8 lines in 2 files changed: 0 ins; 1 del; 7 mod 8372461: [IR Framework] Multiple test failures after JDK-8371789 Reviewed-by: epeter, syan, dfenacci ------------- PR: https://git.openjdk.org/jdk/pull/28495 From mhaessig at openjdk.org Mon Dec 1 07:13:53 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Mon, 1 Dec 2025 07:13:53 GMT Subject: RFR: 8364766: C2: Improve Value() of DivI and DivL for non-constant inputs [v11] In-Reply-To: References: Message-ID: On Fri, 14 Nov 2025 21:50:27 GMT, Tobias Hotz wrote: >> This PR improves the value of integer division nodes. >> Currently, we only emit a good type if either input is constant. But we can also cover the generic case. It does that by finding the four corners of the division. This is guaranteed to find the extrema that we can use for min/max. Some special logic is required for MIN_INT / -1, though, as this is a special case. >> We also need some special logic to handle ranges that cross zero, but in this case, we just need to check for the negative and positive range once. >> This also cleans up and unifies the code paths for DivINode and DivLNode. >> I've added some tests to validate the optimization. Without the changes, some of these tests fail. > > Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: > > Simplify test, add temporary @IR rule for testLongRange and improve comments Testing passed up to tier7. ------------- Marked as reviewed by mhaessig (Committer). 
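Editorial sketch (not part of the PR): the "four corners" idea from the quoted description can be illustrated in plain Java. The class `DivRangeSketch` and its methods are hypothetical names, not C2 code. Two assumptions of this sketch: a divisor range that crosses zero is split into a negative and a positive part (zero itself is skipped, since dividing by zero throws), and the `MIN_INT / -1` overflow is handled by conservatively widening to the full `int` range, which may be less precise than what the patch does.

```java
// Hypothetical illustration only; not HotSpot code.
final class DivRangeSketch {
    /**
     * Returns {min, max} of a / b (Java truncating division) over all
     * a in [aLo, aHi] and all non-zero b in [bLo, bHi].
     */
    static int[] divRange(int aLo, int aHi, int bLo, int bHi) {
        if (bLo == 0 && bHi == 0) {
            throw new IllegalArgumentException("divisor is always zero");
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        if (bLo < 0) {                       // negative part of the divisor range
            long[] c = corners(aLo, aHi, bLo, Math.min(bHi, -1));
            min = Math.min(min, c[0]);
            max = Math.max(max, c[1]);
        }
        if (bHi > 0) {                       // positive part of the divisor range
            long[] c = corners(aLo, aHi, Math.max(bLo, 1), bHi);
            min = Math.min(min, c[0]);
            max = Math.max(max, c[1]);
        }
        if (max > Integer.MAX_VALUE) {
            // Only MIN_VALUE / -1 exceeds the int range; in Java it wraps to
            // MIN_VALUE, so this sketch gives up and returns the full range.
            return new int[] {Integer.MIN_VALUE, Integer.MAX_VALUE};
        }
        return new int[] {(int) min, (int) max};
    }

    // For a divisor range that does not contain zero, the quotient is monotone
    // in each argument, so the extrema lie at the four corner combinations.
    private static long[] corners(int aLo, int aHi, int bLo, int bHi) {
        long[] q = {
            (long) aLo / bLo, (long) aLo / bHi,
            (long) aHi / bLo, (long) aHi / bHi
        };
        long min = q[0];
        long max = q[0];
        for (long v : q) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new long[] {min, max};
    }
}
```

Computing the corners in `long` keeps the overflow case visible as an out-of-range quotient instead of a silently wrapped value.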
PR Review: https://git.openjdk.org/jdk/pull/26143#pullrequestreview-3523457833 From chagedorn at openjdk.org Mon Dec 1 07:13:59 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 1 Dec 2025 07:13:59 GMT Subject: RFR: 8370766: JVM crashes when running compiler/exceptions/TestAccessErrorInCatch.java fails with -XX:+VerifyStack [v2] In-Reply-To: References: <5JAu6StX5-r2itXPGiDBgGHjGo0S2mOfGxOpPoMSkIQ=.000500da-a003-403b-9d3b-6df3a53c2b22@github.com> Message-ID: On Wed, 26 Nov 2025 06:31:22 GMT, Dean Long wrote: >> The problem is C2 is throwing an exception and then deoptimizing, and the -XX:+VerifyStack logic expects the stack to be empty, match the "before" state if the reexecute flag is set, or match the "after" state. C2 is using the "before" state, so for correctness it also needs to set the reexecute flag. >> >> I played around with other approaches, like: >> 1. setting the stack to empty >> 2. adding all the bytecodes that can throw to the list in AbstractInterpreter::bytecode_should_reexecute() >> 3. always setting the reexecute flag in add_safepoint_edges() if must_throw is set >> but in the end I decided to go with the minimal localized low-risk change. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > remove extra spaces Looks good to me, too. > always setting the reexecute flag in add_safepoint_edges() if must_throw is set but in the end I decided to go with the minimal localized low-risk change. Is this something we should follow up with? ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28486#pullrequestreview-3523457969 From jbhateja at openjdk.org Mon Dec 1 07:15:41 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 1 Dec 2025 07:15:41 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v4] In-Reply-To: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: > This bug patch fixes a crash seen while querying the bottom type of MachTempNode corresponding to [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of blend pattern during late scheduling. Here, MaxVectorSize is constrained to 8 bytes; thus, during C2 type system initialization, [TypeVect::VECTX](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719), guarded by the target-supported vector size, remains uninitialized. > > It's better to reject matching of VectorBlend in such a scenario. > > All existing VectorAPI jtreg tests are passing with -XX:UseAVX=0 and -XX:MaxVectorSize=8. > > Kindly review and share your feedback. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Adding a testpoint ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28533/files - new: https://git.openjdk.org/jdk/pull/28533/files/2c08c7db..be24b1af Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28533&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28533&range=02-03 Stats: 54 lines in 1 file changed: 54 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28533.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28533/head:pull/28533 PR: https://git.openjdk.org/jdk/pull/28533 From chagedorn at openjdk.org Mon Dec 1 07:18:53 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 1 Dec 2025 07:18:53 GMT Subject: RFR: 8371464: C2: assert(no_dead_loop) failed: dead loop detected In-Reply-To: References: Message-ID: <5-3zAbx-Kx46DdjWSHSR5h34DzqC79DdXJKAb8haPKk=.4bbe4c2c-097b-4e8d-9d3c-b85d2048416d@github.com> On Fri, 28 Nov 2025 10:50:44 GMT, Roland Westrelin wrote: > Crash occurs because a `MergeMem` node references itself: > > > 608 MergeMem === _ 1 608 1 1 1 1 1 1 1 1 1 1 878 [[ 877 878 608 420 597 ]] { - - - - - - - - - - N878:java/lang/Throwable (java/io/Serializable)+20 * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !orig=[524] !jvms: TestDeadLoopAtMergeMem::test @ bci:94 (line 62) > ``` > > Before IGVN, that part of the stream is: > > > 522 Region === 522 604 521 [[ 522 538 523 524 525 526 527 528 529 530 531 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:75 (line 59) > 524 Phi === 522 608 464 [[ 588 581 564 546 564 559 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:75 (line 59) > > 538 If === 522 535 [[ 539 540 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 539 IfTrue === 538 [[ 553 547 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 540 IfFalse === 538 [[ 548 546 ]] #0 
!jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 553 If === 539 535 [[ 554 555 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) > 554 IfTrue === 553 [[ 562 560 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) > 555 IfFalse === 553 [[ 548 559 ]] #0 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) > > 548 Region === 548 _ 540 555 [[ 548 562 561 563 564 565 566 567 568 569 570 571 572 573 574 575 576 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:88 (line 60) > 564 Phi === 548 _ 524 524 [[ 581 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:85 (line 61) > > 562 Region === 562 548 554 [[ 562 600 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 596 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:90 (line 62) > 581 Phi === 562 564 524 [[ 420 597 610 608 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:90 (line 62) > > 608 MergeMem === _ 1 581 1 1 1 1 1 1 1 1 1 1 588 [[ 524 ]] { - - - - - - - - - - N588:java/lang/Throwable (java/io/Serializable)+20 * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:94 (line 62) > > > 522 is a loop head, 604 is the backedge. The loop becomes unreachable > during IGVN. The loop body above is transformed to: > > > 538 If === 604 535 [[ 539 540 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 539 IfTrue === 538 [[ 562 547 560 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (l... That looks good to me. I'll submit some testing. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28554#pullrequestreview-3523481909 From chagedorn at openjdk.org Mon Dec 1 07:19:58 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 1 Dec 2025 07:19:58 GMT Subject: RFR: 8371792: Refactor barrier loop tests out of TestIfMinMax [v4] In-Reply-To: <5P58y7mFExd-rdT_nGu_Ky0UG-vDGPRG2IycLX6xwIY=.403c2f90-1ab3-4096-80a7-b80d819d3ca9@github.com> References: <5P58y7mFExd-rdT_nGu_Ky0UG-vDGPRG2IycLX6xwIY=.403c2f90-1ab3-4096-80a7-b80d819d3ca9@github.com> Message-ID: On Fri, 28 Nov 2025 09:40:25 GMT, Galder Zamarreño wrote: >> Trivial cleanup to move tests out of a test class whose description does not match these tests > > Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/gcbarriers/TestMinMaxLongLoopBarrier.java > > Co-authored-by: Emanuel Peter Still good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28385#pullrequestreview-3523489723 From shade at openjdk.org Mon Dec 1 07:54:03 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 1 Dec 2025 07:54:03 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v10] In-Reply-To: References: Message-ID: On Fri, 28 Nov 2025 06:01:26 GMT, Jiangli Zhou wrote: >> Please review the fix in StubGenerator::aesgcm_avx512 and StubGenerator::aesgcm_avx2 to handle some edge cases with input sizes that are not a multiple of the block size. >> >> Thanks to Thomas Holenstein and Lukas Zobernig for analyzing the issue and providing the test case! > > Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: > > Change to break before operators. Marked as reviewed by shade (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/28363#pullrequestreview-3523605119 From shade at openjdk.org Mon Dec 1 07:59:47 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 1 Dec 2025 07:59:47 GMT Subject: RFR: 8371464: C2: assert(no_dead_loop) failed: dead loop detected In-Reply-To: References: Message-ID: <5g0wstbmWC5gv_OG3sTv5Lb0eYCR4Cq3zQb1PJiWA6w=.efbee9c2-a9db-428b-8aa7-1c3d198d05e9@github.com> On Fri, 28 Nov 2025 10:50:44 GMT, Roland Westrelin wrote: > Crash occurs because a `MergeMem` node references itself: > > > 608 MergeMem === _ 1 608 1 1 1 1 1 1 1 1 1 1 878 [[ 877 878 608 420 597 ]] { - - - - - - - - - - N878:java/lang/Throwable (java/io/Serializable)+20 * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !orig=[524] !jvms: TestDeadLoopAtMergeMem::test @ bci:94 (line 62) > ``` > > Before IGVN, that part of the stream is: > > > 522 Region === 522 604 521 [[ 522 538 523 524 525 526 527 528 529 530 531 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:75 (line 59) > 524 Phi === 522 608 464 [[ 588 581 564 546 564 559 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:75 (line 59) > > 538 If === 522 535 [[ 539 540 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 539 IfTrue === 538 [[ 553 547 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 540 IfFalse === 538 [[ 548 546 ]] #0 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 553 If === 539 535 [[ 554 555 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) > 554 IfTrue === 553 [[ 562 560 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) > 555 IfFalse === 553 [[ 548 559 ]] #0 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) > > 548 Region === 548 _ 540 555 [[ 548 562 561 563 564 565 566 567 568 569 570 571 572 573 574 575 576 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:88 (line 60) > 564 Phi === 548 _ 524 524 [[ 581 ]] #memory Memory: 
@BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:85 (line 61) > > 562 Region === 562 548 554 [[ 562 600 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 596 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:90 (line 62) > 581 Phi === 562 564 524 [[ 420 597 610 608 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:90 (line 62) > > 608 MergeMem === _ 1 581 1 1 1 1 1 1 1 1 1 1 588 [[ 524 ]] { - - - - - - - - - - N588:java/lang/Throwable (java/io/Serializable)+20 * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:94 (line 62) > > > 522 is a loop head, 604 is the backedge. The loop becomes unreachable > during IGVN. The loop body above is transformed to: > > > 538 If === 604 535 [[ 539 540 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 539 IfTrue === 538 [[ 562 547 560 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (l... GHA failures in [com/sun/crypto/provider/Cipher/HPKE/KAT9180](https://github.com/rwestrel/jdk/actions/runs/19761317022#user-content-com_sun_crypto_provider_cipher_hpke_kat9180) would disappear if you merge from master. Actually, this might mean the PR base is quite old, and there might be other bugs on the intersection with this one. Merge from master and pass the GHA, maybe? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28554#issuecomment-3595112399 From dfenacci at openjdk.org Mon Dec 1 08:23:56 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 1 Dec 2025 08:23:56 GMT Subject: RFR: 8371464: C2: assert(no_dead_loop) failed: dead loop detected In-Reply-To: References: Message-ID: On Fri, 28 Nov 2025 10:50:44 GMT, Roland Westrelin wrote: > Crash occurs because a `MergeMem` node references itself: > > > 608 MergeMem === _ 1 608 1 1 1 1 1 1 1 1 1 1 878 [[ 877 878 608 420 597 ]] { - - - - - - - - - - N878:java/lang/Throwable (java/io/Serializable)+20 * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !orig=[524] !jvms: TestDeadLoopAtMergeMem::test @ bci:94 (line 62) > ``` > > Before IGVN, that part of the stream is: > > > 522 Region === 522 604 521 [[ 522 538 523 524 525 526 527 528 529 530 531 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:75 (line 59) > 524 Phi === 522 608 464 [[ 588 581 564 546 564 559 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:75 (line 59) > > 538 If === 522 535 [[ 539 540 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 539 IfTrue === 538 [[ 553 547 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 540 IfFalse === 538 [[ 548 546 ]] #0 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 553 If === 539 535 [[ 554 555 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) > 554 IfTrue === 553 [[ 562 560 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) > 555 IfFalse === 553 [[ 548 559 ]] #0 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) > > 548 Region === 548 _ 540 555 [[ 548 562 561 563 564 565 566 567 568 569 570 571 572 573 574 575 576 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:88 (line 60) > 564 Phi === 548 _ 524 524 [[ 581 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:85 (line 61) > > 562 Region === 
562 548 554 [[ 562 600 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 596 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:90 (line 62) > 581 Phi === 562 564 524 [[ 420 597 610 608 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:90 (line 62) > > 608 MergeMem === _ 1 581 1 1 1 1 1 1 1 1 1 1 588 [[ 524 ]] { - - - - - - - - - - N588:java/lang/Throwable (java/io/Serializable)+20 * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:94 (line 62) > > > 522 is a loop head, 604 is the backedge. The loop becomes unreachable > during IGVN. The loop body above is transformed to: > > > 538 If === 604 535 [[ 539 540 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 539 IfTrue === 538 [[ 562 547 560 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (l... Thanks for fixing this @rwestrel! Barring Christian's testing, the change looks good to me. src/hotspot/share/opto/cfgnode.cpp line 1404: > 1402: Node* other_phi_input = in(j); > 1403: if (other_phi_input != nullptr && other_phi_input == merge_mem->base_memory() && !is_data_loop(region, phi_input, igvn)) { > 1404: // merge_mem is a successor memory to other_phi_input, and is not pinned inside the diamond, so push it out. Do you think it might be worth adding an additional reason for `!is_data_loop` in the comment? ------------- Marked as reviewed by dfenacci (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/28554#pullrequestreview-3523662101 PR Review Comment: https://git.openjdk.org/jdk/pull/28554#discussion_r2576027929 From lucy at openjdk.org Mon Dec 1 08:31:27 2025 From: lucy at openjdk.org (Lutz Schmidt) Date: Mon, 1 Dec 2025 08:31:27 GMT Subject: RFR: 8372730: Problem list compiler/arguments/TestCodeEntryAlignment.java on x64 In-Reply-To: References: Message-ID: On Fri, 28 Nov 2025 10:33:24 GMT, Matthias Baesken wrote: > [JDK-8372720](https://bugs.openjdk.org/browse/JDK-8372720) problem listed the test compiler/arguments/TestCodeEntryAlignment.java on macOS x64 but the issue appears on other OS running on x64 CPUs (e.g. Linux) too . LGTM ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28553#pullrequestreview-3523762955 From shade at openjdk.org Mon Dec 1 08:44:06 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 1 Dec 2025 08:44:06 GMT Subject: Integrated: 8372188: AArch64: Generate atomic match rules from M4 stencils In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 14:54:06 GMT, Aleksey Shipilev wrote: > Current atomic match rules are all over the place in AArch64: > - CAE and weak CAS rules are generated with the help of `cas.m4`, and then are supposed to be copy-pasted (?) into `aarch64.ad`. I did it about 20 times when fixing [JDK-8372154](https://bugs.openjdk.org/browse/JDK-8372154), gets tedious very quickly. > - Strong CAS and get-and-set rules are still in the same section of `aarch64.ad`, and are written by hand. Yet, those can be automatically generated from M4 stencils as well. > > This PR cleans that up by moving all these rules into a separate `.ad` file, which one can cleanly re-generate by invoking `m4 aarch64_atomic_ad.m4 > aarch64_atomic.ad`. The meat of the change is `aarch64_atomic.m4`, everything else is either generated from it, or removed in favor of auto-generated code. 
There should be no semantic change, as I attempted to move the rules mostly verbatim, only changing non-semantic stuff like match rule names and some formats. > > Testing: > - [x] Eyeballing match rules before/after > - [x] Linux AArch64 server fastdebug, `hotspot_compiler` > - [x] Linux AArch64 server fastdebug, `tier1` > - [x] Linux AArch64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, jcstress run This pull request has now been integrated. Changeset: 3481252c Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/3481252ced7c06c44154ceccc56b12cfd9a490c3 Stats: 2349 lines in 5 files changed: 1156 ins; 1193 del; 0 mod 8372188: AArch64: Generate atomic match rules from M4 stencils Reviewed-by: aph, haosun ------------- PR: https://git.openjdk.org/jdk/pull/28538 From shade at openjdk.org Mon Dec 1 08:44:04 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 1 Dec 2025 08:44:04 GMT Subject: RFR: 8372188: AArch64: Generate atomic match rules from M4 stencils In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 14:54:06 GMT, Aleksey Shipilev wrote: > Current atomic match rules are all over the place in AArch64: > - CAE and weak CAS rules are generated with the help of `cas.m4`, and then are supposed to be copy-pasted (?) into `aarch64.ad`. I did it about 20 times when fixing [JDK-8372154](https://bugs.openjdk.org/browse/JDK-8372154), gets tedious very quickly. > - Strong CAS and get-and-set rules are still in the same section of `aarch64.ad`, and are written by hand. Yet, those can be automatically generated from M4 stencils as well. > > This PR cleans that up by moving all these rules into a separate `.ad` file, which one can cleanly re-generate by invoking `m4 aarch64_atomic_ad.m4 > aarch64_atomic.ad`. The meat of the change is `aarch64_atomic.m4`, everything else is either generated from it, or removed in favor of auto-generated code. 
There should be no semantic change, as I attempted to move the rules mostly verbatim, only changing non-semantic stuff like match rule names and some formats. > > Testing: > - [x] Eyeballing match rules before/after > - [x] Linux AArch64 server fastdebug, `hotspot_compiler` > - [x] Linux AArch64 server fastdebug, `tier1` > - [x] Linux AArch64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, jcstress run Over the weekend (60+ hours) jcstress run comes clean, apart from errant `MaxVectorSize` asserts unrelated to this patch. So I am integrating. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28538#issuecomment-3595289184 From jbhateja at openjdk.org Mon Dec 1 08:46:48 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 1 Dec 2025 08:46:48 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v3] In-Reply-To: References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> <5bV8t0Bo16-WVON8_AJLfcPDDqWVHDxIjmdGPPNazE8=.51d5a17d-1b87-44d4-ad41-e9d346e6b9f7@github.com> Message-ID: On Thu, 27 Nov 2025 16:12:59 GMT, Emanuel Peter wrote: > Ok, that's fine with me too. > > It would be nice if you could also attach a regression test, or maybe add an additional run to the existing test, with the required flags for reproducing this issue. @eme64 addressed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28533#issuecomment-3595303558 From goetz at openjdk.org Mon Dec 1 08:56:06 2025 From: goetz at openjdk.org (Goetz Lindenmaier) Date: Mon, 1 Dec 2025 08:56:06 GMT Subject: RFR: 8372730: Problem list compiler/arguments/TestCodeEntryAlignment.java on x64 In-Reply-To: References: Message-ID: On Fri, 28 Nov 2025 10:33:24 GMT, Matthias Baesken wrote: > [JDK-8372720](https://bugs.openjdk.org/browse/JDK-8372720) problem listed the test compiler/arguments/TestCodeEntryAlignment.java on macOS x64 but the issue appears on other OS running on x64 CPUs (e.g. Linux) too . Marked as reviewed by goetz (Reviewer). LGTM, too. ------------- PR Review: https://git.openjdk.org/jdk/pull/28553#pullrequestreview-3523862225 PR Comment: https://git.openjdk.org/jdk/pull/28553#issuecomment-3595336998 From mbaesken at openjdk.org Mon Dec 1 09:06:23 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 1 Dec 2025 09:06:23 GMT Subject: RFR: 8372730: Problem list compiler/arguments/TestCodeEntryAlignment.java on x64 In-Reply-To: References: Message-ID: On Fri, 28 Nov 2025 10:33:24 GMT, Matthias Baesken wrote: > [JDK-8372720](https://bugs.openjdk.org/browse/JDK-8372720) problem listed the test compiler/arguments/TestCodeEntryAlignment.java on macOS x64 but the issue appears on other OS running on x64 CPUs (e.g. Linux) too . Thanks for the reviews ! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28553#issuecomment-3595376162 From mbaesken at openjdk.org Mon Dec 1 09:06:24 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 1 Dec 2025 09:06:24 GMT Subject: Integrated: 8372730: Problem list compiler/arguments/TestCodeEntryAlignment.java on x64 In-Reply-To: References: Message-ID: On Fri, 28 Nov 2025 10:33:24 GMT, Matthias Baesken wrote: > [JDK-8372720](https://bugs.openjdk.org/browse/JDK-8372720) problem listed the test compiler/arguments/TestCodeEntryAlignment.java on macOS x64 but the issue appears on other OS running on x64 CPUs (e.g. Linux) too. This pull request has now been integrated. Changeset: 5bd7db03 Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/5bd7db034aaf8aa6780945e02a7f9a35e16b036e Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8372730: Problem list compiler/arguments/TestCodeEntryAlignment.java on x64 Reviewed-by: lucy, goetz ------------- PR: https://git.openjdk.org/jdk/pull/28553 From shade at openjdk.org Mon Dec 1 09:09:23 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 1 Dec 2025 09:09:23 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v5] In-Reply-To: References: Message-ID: <_o4EmLJi9oI38vVDE69u9u8dC-Ad8501si7GW_bIi9M=.ec2c10fb-40fd-4c8e-95af-ae8f59806da6@github.com> On Fri, 28 Nov 2025 15:21:47 GMT, Andrew Haley wrote: > I'm seeing minor performance regressions in `InterfaceCalls.test2ndInt5Types`, before and after this PR: Reproduced locally too:

Benchmark                                              (randomized)  Mode  Cnt     Score     Error  Units

# Baseline
InterfaceCalls.test2ndInt5Types                               false  avgt   12    16.945 ±   0.079  ns/op
InterfaceCalls.test2ndInt5Types:L1-dcache-load-misses         false  avgt    3     0.076 ±   2.187   #/op
InterfaceCalls.test2ndInt5Types:L1-dcache-loads               false  avgt    3    88.738 ±   0.416   #/op
InterfaceCalls.test2ndInt5Types:branch-misses                 false  avgt    3     0.007 ±   0.003   #/op
InterfaceCalls.test2ndInt5Types:branches                      false  avgt    3    49.122 ±   0.353   #/op
InterfaceCalls.test2ndInt5Types:cycles                        false  avgt    3    57.147 ±   1.698   #/op
InterfaceCalls.test2ndInt5Types:instructions                  false  avgt    3   247.443 ±   1.531   #/op

# Current PR
InterfaceCalls.test2ndInt5Types                               false  avgt   12    22.513 ±   0.208  ns/op
InterfaceCalls.test2ndInt5Types:L1-dcache-load-misses         false  avgt    3     0.012 ±   0.072   #/op
InterfaceCalls.test2ndInt5Types:L1-dcache-loads               false  avgt    3   108.446 ±  13.975   #/op ; +20 loads
InterfaceCalls.test2ndInt5Types:branch-misses                 false  avgt    3     0.407 ±   0.010   #/op
InterfaceCalls.test2ndInt5Types:branches                      false  avgt    3    54.102 ±   0.403   #/op ; +5 branches
InterfaceCalls.test2ndInt5Types:cycles                        false  avgt    3    75.938 ±   5.043   #/op
InterfaceCalls.test2ndInt5Types:instructions                  false  avgt    3   280.194 ±   5.758   #/op ; +32 instructions

Looked at perfasm, and there are no gross problems there. I also think reliability trumps this minor performance bump. But I also suspect this is caused by second loop re-walking the table looking for (empty) slots, this is where extra loads are coming from. I believe it can reasonably track the first non-null slot and start the walk from there. Let me see if it is simple to do without complicating the code all too much. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3595393093 From epeter at openjdk.org Mon Dec 1 09:23:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 1 Dec 2025 09:23:46 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v4] In-Reply-To: References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: On Mon, 1 Dec 2025 07:15:41 GMT, Jatin Bhateja wrote: >> This bug patch fixes a crash seen while querying the bottom type of MachTempNode corresponding to [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of blend pattern during late scheduling.
Here, MaxVectorSize is constrained to 8 bytes, thus during C2 type system initialization, [TypeVect::VECTX](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719) guarded by target supported vector size remains uninitialized. >> >> It's better to reject matching of VectorBlend in such a scenario. >> >> All existing VectorAPI jtreg tests are passing with -XX:UseAVX=0 and -XX:MaxVectorSize=8 >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Adding a testpoint test/hotspot/jtreg/compiler/vectorapi/TestABSMaskedMaxByteVector.java line 48: > 46: @Test > 47: @IR(failOn = {IRNode.ABS_VB}, applyIfAnd={"MaxVectorSize", " <= 8 ", "UseAVX", "0"}) > 48: @IR(counts = {IRNode.ABS_VB, "1"}, applyIf={"MaxVectorSize", " > 8 "}) Are you sure this is going to pass on all platforms? Does this test run ok on `aarch64` where there is no `UseAVX` flag? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28533#discussion_r2576273485 From mdoerr at openjdk.org Mon Dec 1 10:27:36 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 1 Dec 2025 10:27:36 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v7] In-Reply-To: References: Message-ID: > This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. We can save the `genInvRoundKeys` computation when we only do encryption. > > The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64. Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: - Minor simplification.
- Merge remote-tracking branch 'origin' into 8371820_AES_Crypt - Fix missing whitespace. - Address review comments. - Merge remote-tracking branch 'origin' into 8371820_AES_Crypt - Remove K from AES_Crypt - More minor cleanup. - Improve comment and minor cleanup. - 8371820: Further AES performance improvements for key schedule generation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28299/files - new: https://git.openjdk.org/jdk/pull/28299/files/ae84912d..c7107a70 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28299&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28299&range=05-06 Stats: 17426 lines in 467 files changed: 10903 ins; 3977 del; 2546 mod Patch: https://git.openjdk.org/jdk/pull/28299.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28299/head:pull/28299 PR: https://git.openjdk.org/jdk/pull/28299 From jbhateja at openjdk.org Mon Dec 1 11:48:23 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 1 Dec 2025 11:48:23 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v5] In-Reply-To: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: > This bug patch fixes a crash seen while querying the bottom type of MachTempNode corresponding to [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of blend pattern during late scheduling. Here, MaxVectorSize is constrained to 8 bytes, thus during C2 type system initialization, [TypeVect::VECTX](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719) guarded by target supported vector size remains uninitialized. > > It's better to reject matching of VectorBlend in such a scenario.
> > All existing VectorAPI jtreg tests are passing with -XX:UseAVX=0 and -XX:MaxVectorSize=8 > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Limiting for x86 targets ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28533/files - new: https://git.openjdk.org/jdk/pull/28533/files/be24b1af..a0e008de Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28533&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28533&range=03-04 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28533.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28533/head:pull/28533 PR: https://git.openjdk.org/jdk/pull/28533 From jbhateja at openjdk.org Mon Dec 1 11:48:28 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 1 Dec 2025 11:48:28 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v4] In-Reply-To: References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: On Mon, 1 Dec 2025 09:20:58 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding a testpoint > > test/hotspot/jtreg/compiler/vectorapi/TestABSMaskedMaxByteVector.java line 48: > >> 46: @Test >> 47: @IR(failOn = {IRNode.ABS_VB}, applyIfAnd={"MaxVectorSize", " <= 8 ", "UseAVX", "0"}) >> 48: @IR(counts = {IRNode.ABS_VB, "1"}, applyIf={"MaxVectorSize", " > 8 "}) > > Are you sure this is going to pass on all platforms? Does this test run ok on `aarch64` where there is no `UseAVX` flag? It will crib on AARCH64; do you think we should make the IR framework sensitive to IgnoreUnrecognizedVMOptions?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28533#discussion_r2576742922 From epeter at openjdk.org Mon Dec 1 12:12:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 1 Dec 2025 12:12:47 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v4] In-Reply-To: References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: On Mon, 1 Dec 2025 11:45:16 GMT, Jatin Bhateja wrote: >> test/hotspot/jtreg/compiler/vectorapi/TestABSMaskedMaxByteVector.java line 48: >> >>> 46: @Test >>> 47: @IR(failOn = {IRNode.ABS_VB}, applyIfAnd={"MaxVectorSize", " <= 8 ", "UseAVX", "0"}) >>> 48: @IR(counts = {IRNode.ABS_VB, "1"}, applyIf={"MaxVectorSize", " > 8 "}) >> >> Are you sure this is going to pass on all platforms? Does this test run ok on `aarch64` where there is no `UseAVX` flag? > > Thanks, fixed > > I intend to pass UseAVX=0 as a run flag to reproduce the exact bug scenario; our framework is not sensitive to IgnoreUnrecognizedVMOptions. You could also just limit the rules to `sse4.1` platforms.
Then you can run the tests everywhere, but limit IR rules to what is easy to test for you ;) Platform features get tested before flags, so that helps with platform specific flags ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28533#discussion_r2576821786 From goetz at openjdk.org Mon Dec 1 12:23:52 2025 From: goetz at openjdk.org (Goetz Lindenmaier) Date: Mon, 1 Dec 2025 12:23:52 GMT Subject: RFR: 8370473: C2: Better Aligment of Vector Spill Slots [v4] In-Reply-To: <2dAfr3bnYwrmrMwlhDNniaYVQYOrR2ARztDEB4qqBzY=.aaa1b90d-0aa7-4d42-a3eb-c52a6b04cbaf@github.com> References: <2dAfr3bnYwrmrMwlhDNniaYVQYOrR2ARztDEB4qqBzY=.aaa1b90d-0aa7-4d42-a3eb-c52a6b04cbaf@github.com> Message-ID: <7h2_vOOkP-YCjBQ0dIRbNWg3o4gCjy4zwaAE62K0TkE=.c5a07e88-b55b-4be3-9e9d-7d484a663e98@github.com> On Thu, 20 Nov 2025 10:21:34 GMT, Richard Reingruber wrote: >> With this change c2 will allocate spill slots for vectors with sp offsets aligned to the size of the vectors. Maximum alignment is StackAlignmentInBytes. >> >> It also updates comments that have never been changed to describe how register allocation works for sizes larger than 64 bit. >> >> The change helps to produce better spill code on AARCH64 and PPC64 where an additional add instruction is emitted if the offset of a vector un-/spill is not aligned. >> >> The change is rather a cleanup than an optimization. In most cases the sp offsets will already be properly aligned. >> Only with incoming stack arguments unaligned offsets can be generated. But also then alignment padding is only added if vector registers larger than 64 bit are used. >> >> So the costs are effectively zero. 
Especially because extra padding won't enlarge the frame since only virtual registers are allocated which are mapped to the caller frame (see `pad0` in the [diagram](https://github.com/openjdk/jdk/blob/92e380c59c2498b1bc94e26658b07b383deae59a/src/hotspot/cpu/aarch64/aarch64.ad#L3829)) >> >> There's a risk though that with the extra virtual registers allocated for `pad0` the limit of registers a `RegMask` can represent is reached (occurs with excessive spilling). If this happens the compilation would fail. It could be retried with smaller alignment for vector spilling though. I haven't implemented it as I thought the risk is negligible. >> >> Note that the sp offset of the accesses should be aligned rather than the effective address. So it could even be argued that the maximum alignment could be higher than StackAlignmentInBytes. >> >> ##### Testing with fastdebug builds on AARCH64 and PPC64: >> >> hotspot_vector_1 >> hotspot_vector_2 >> jdk_vector >> jdk_vector_sanity >> >> ##### The change passed our CI testing: >> Tier 1-4 of hotspot and jdk. All of langtools and jaxp. Renaissance Suite and SAP specific tests. >> Testing was done on the main platforms and also on Linux/PPC64le and AIX. >> >> C2 compilation of `jdk.internal.vm.vector.VectorSupport::rearrangeOp` has unaligned spill offsets. It is covered by the following tests: >> >> compiler/vectorapi/VectorRearrangeTest.java >> jdk/incubator/vector/Byte128VectorLoadStoreTests.java >> jdk/incubator/vector/Double256VectorLoadStoreTests.java >> jdk/incubator/vector/Float128VectorTests.java >> jdk/incubator/vector/Long256VectorLoadStoreTests.java >> jdk/incubator/vector/Short128VectorLoadStoreTests.java >> jdk/incubator/vector/Vector64ConversionTests.java > > Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 10 additional commits since the last revision: > > - Merge branch 'master' > - Exclude IR check on riscv with rvv > - Enhance comment > - Fix OptoAssembly for Power 8 > - PPC: OptoAssembly for vector spilling > - Assert aligned sp offsets in vector spilling > - Delete TMP and !UseNewCode > - Align Matcher::_new_SP for better vector spilling > - TMP: trace unaligned vector spilling > - Add test LGTM. OK, so it's not the frame layout aspect of mapping slots to addresses that is adapted by your change, but only the new_sp. Before, the "unused" part was in the new frame, now it is in the old one or rather completely omitted. The growth of the stack is not altered. So the change has no memory space side effect and thus is not critical to apply to all platforms. Thanks for the clarification! src/hotspot/share/opto/chaitin.hpp line 146: > 144: private: > 145: // Number of registers this live range uses when it colors > 146: uint16_t _num_regs; // byte size of the value divided by slot size which is 4 Is this true for oops, too? Hadn't they been mapped to one slot on both 32- and 64-bit platforms? ------------- Marked as reviewed by goetz (Reviewer).
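The alignment rule in the quoted PR description (sp offsets aligned to the vector size, capped at StackAlignmentInBytes) and the slot arithmetic from the `_num_regs` comment can be sketched roughly as follows. This is only an illustrative model, not the C2 code: the class, the method names, and the value 16 for the stack alignment are assumptions made for the example.

```java
/** Illustrative model only; not the HotSpot/C2 implementation. */
final class SpillOffsetSketch {
    /** Assumed stand-in for StackAlignmentInBytes; the real value is platform-specific. */
    static final int ASSUMED_STACK_ALIGNMENT_IN_BYTES = 16;

    /** Slot size of 4 bytes, as stated in the quoted chaitin.hpp comment. */
    static final int SLOT_SIZE_IN_BYTES = 4;

    /** Alignment for a vector spill: its own size, capped at the stack alignment. */
    static int alignmentFor(int vectorSizeInBytes) {
        return Math.min(vectorSizeInBytes, ASSUMED_STACK_ALIGNMENT_IN_BYTES);
    }

    /** Rounds offset up to the next multiple of alignment (alignment must be a power of two). */
    static int alignUp(int offset, int alignment) {
        return (offset + alignment - 1) & -alignment;
    }

    /** First sp offset at or above firstFreeOffset that is suitably aligned for the vector. */
    static int spillOffset(int firstFreeOffset, int vectorSizeInBytes) {
        return alignUp(firstFreeOffset, alignmentFor(vectorSizeInBytes));
    }

    /** Number of 4-byte slots a value of the given byte size occupies. */
    static int slotsFor(int valueSizeInBytes) {
        return valueSizeInBytes / SLOT_SIZE_IN_BYTES;
    }
}
```

In this model a 32-byte vector whose first free offset is 20 would be placed at offset 32, because the alignment is capped at the assumed 16-byte stack alignment; the padding between 20 and 32 corresponds to what the PR text calls `pad0`.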
PR Review: https://git.openjdk.org/jdk/pull/27969#pullrequestreview-3487238248 PR Comment: https://git.openjdk.org/jdk/pull/27969#issuecomment-3596247610 PR Review Comment: https://git.openjdk.org/jdk/pull/27969#discussion_r2545571373 From goetz at openjdk.org Mon Dec 1 12:26:55 2025 From: goetz at openjdk.org (Goetz Lindenmaier) Date: Mon, 1 Dec 2025 12:26:55 GMT Subject: RFR: 8370473: C2: Better Aligment of Vector Spill Slots [v4] In-Reply-To: <2dAfr3bnYwrmrMwlhDNniaYVQYOrR2ARztDEB4qqBzY=.aaa1b90d-0aa7-4d42-a3eb-c52a6b04cbaf@github.com> References: <2dAfr3bnYwrmrMwlhDNniaYVQYOrR2ARztDEB4qqBzY=.aaa1b90d-0aa7-4d42-a3eb-c52a6b04cbaf@github.com> Message-ID: On Thu, 20 Nov 2025 10:21:34 GMT, Richard Reingruber wrote: >> With this change c2 will allocate spill slots for vectors with sp offsets aligned to the size of the vectors. Maximum alignment is StackAlignmentInBytes. >> >> It also updates comments that have never been changed to describe how register allocation works for sizes larger than 64 bit. >> >> The change helps to produce better spill code on AARCH64 and PPC64 where an additional add instruction is emitted if the offset of a vector un-/spill is not aligned. >> >> The change is rather a cleanup than an optimization. In most cases the sp offsets will already be properly aligned. >> Only with incoming stack arguments unaligned offsets can be generated. But also then alignment padding is only added if vector registers larger than 64 bit are used. >> >> So the costs are effectively zero. Especially because extra padding won't enlarge the frame since only virtual registers are allocated which are mapped to the caller frame (see `pad0` in the [diagram](https://github.com/openjdk/jdk/blob/92e380c59c2498b1bc94e26658b07b383deae59a/src/hotspot/cpu/aarch64/aarch64.ad#L3829)) >> >> There's a risk though that with the extra virtual registers allocated for `pad0` the limit of registers a `RegMask` can represent is reached (occurs with excessive spilling). 
If this happens the compilation would fail. It could be retried with smaller alignment for vector spilling though. I haven't implemented it as I thought the risk is negligible. >> >> Note that the sp offset of the accesses should be aligned rather than the effective address. So it could even be argued that the maximum alignment could be higher than StackAlignmentInBytes. >> >> ##### Testing with fastdebug builds on AARCH64 and PPC64: >> >> hotspot_vector_1 >> hotspot_vector_2 >> jdk_vector >> jdk_vector_sanity >> >> ##### The change passed our CI testing: >> Tier 1-4 of hotspot and jdk. All of langtools and jaxp. Renaissance Suite and SAP specific tests. >> Testing was done on the main platforms and also on Linux/PPC64le and AIX. >> >> C2 compilation of `jdk.internal.vm.vector.VectorSupport::rearrangeOp` has unaligned spill offsets. It is covered by the following tests: >> >> compiler/vectorapi/VectorRearrangeTest.java >> jdk/incubator/vector/Byte128VectorLoadStoreTests.java >> jdk/incubator/vector/Double256VectorLoadStoreTests.java >> jdk/incubator/vector/Float128VectorTests.java >> jdk/incubator/vector/Long256VectorLoadStoreTests.java >> jdk/incubator/vector/Short128VectorLoadStoreTests.java >> jdk/incubator/vector/Vector64ConversionTests.java > > Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Merge branch 'master' > - Exclude IR check on riscv with rvv > - Enhance comment > - Fix OptoAssembly for Power 8 > - PPC: OptoAssembly for vector spilling > - Assert aligned sp offsets in vector spilling > - Delete TMP and !UseNewCode > - Align Matcher::_new_SP for better vector spilling > - TMP: trace unaligned vector spilling > - Add test So it's not the spill slots that are better aligned, as the title suggests.
It's just the offsets to the new_sp that have better alignment and thus can be encoded more cheaply. Maybe change the title to "C2: Better Aligment of Vector Spill Slot offsets"? ------------- PR Comment: https://git.openjdk.org/jdk/pull/27969#issuecomment-3596274240 From shade at openjdk.org Mon Dec 1 12:42:52 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 1 Dec 2025 12:42:52 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v3] In-Reply-To: References: Message-ID: On Mon, 24 Nov 2025 18:10:50 GMT, Quan Anh Mai wrote: >> Hi, >> >> This fixes the crash in `Load/StoreVectorMaskedNode::Ideal`. The issue here is that the graph is not canonical during idealization, which leads to us processing a dead node. The fix I propose is to bail out when that happens. >> >> To be more specific, for this issue, we have the graph that looks like: >> >> ConI -> ConvI2L -> CastLL(0..32) -> VectorMaskGen >> >> with `ConI` being 45 and `MaxVectorSize` being 32. In this instance, `CastLL` is processed before `ConvI2L`, and when it is processed, it sees the type of `ConvI2L` as its bottom type. As a result, it does not know that it is top, and since we are after macro expansion, which is after loop opts, the `CastLL` goes away, leaving us with: >> >> ConI -> ConvI2L -> VectorMaskGen >> >> After `ConvI2L` is processed, we know that the input of `VectorMaskGen` is a constant 45, which is larger than `MaxVectorSize`, leading to the assert failure. >> >> Please take a look and leave your thoughts, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > reviews Are we moving forward with this?
Still too many failures in local testing without this fix :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28410#issuecomment-3596356835 From shade at openjdk.org Mon Dec 1 13:04:08 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 1 Dec 2025 13:04:08 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v6] In-Reply-To: References: Message-ID: <2ifEaoGuZU4duyckWchgOnnqfH6AgAcrqsiqBZH1Nx4=.1df7af8d-41ac-43a1-90ab-964eb80f155b@github.com> > See the bug for discussion what issues current machinery has. > > This PR executes the plan outlined in the bug: > 1. Common the receiver type profiling code in interpreter and C1 > 2. Rewrite receiver type profiling code to only do atomic receiver slot installations > 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed > > This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/` > - [x] Linux x86_64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision: - Simplify third case: no need to loop, just restart the search - Actually have a second "fast" case: receiver is not found in the table, and the table is full - Pushing/popping for rare CAS path is counter-productive ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25305/files - new: https://git.openjdk.org/jdk/pull/25305/files/c441209a..f3e0fa4d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25305&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25305&range=04-05 Stats: 157 lines in 1 file changed: 85 ins; 52 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/25305.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25305/head:pull/25305 PR: https://git.openjdk.org/jdk/pull/25305 From shade at openjdk.org Mon Dec 1 13:04:10 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 1 Dec 2025 13:04:10 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v5] In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 15:55:38 GMT, Aleksey Shipilev wrote: >> See the bug for discussion what issues current machinery has. >> >> This PR executes the plan outlined in the bug: >> 1. Common the receiver type profiling code in interpreter and C1 >> 2. Rewrite receiver type profiling code to only do atomic receiver slot installations >> 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed >> >> This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: > > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - Tighten up some more > - Offset is always rscratch1, no need to save it > - Grossly simplify register shuffling > - More asserts > - More comment touchups > - Inline code comments > - Mention the updater in ReceiverTypeData > - type_profile -> profile_receiver_type > - Stylistic: remove redundant assert > - ... and 5 more: https://git.openjdk.org/jdk/compare/c028369d...c441209a Oh, all right! This made me realize we actually have a secondary "fast" case: receiver is not found, but profile is full. This is pretty frequent with `TypeProfileWidth=2`. In that case, we are doing way too much stuff, anticipating receiver slot installation that would never actually come. Specializing for that case costs significantly fewer loads, and gets the code much more pipelined; I suspect that is because tight loops that _do not_ have CAS-es in them are uop-cached more readily. We now lose "only" 0.5ns in this test:

Benchmark                                              (randomized)  Mode  Cnt     Score     Error  Units

# Baseline
InterfaceCalls.test2ndInt5Types                               false  avgt   12    16.945 ±   0.079  ns/op
InterfaceCalls.test2ndInt5Types:L1-dcache-load-misses         false  avgt    3     0.076 ±   2.187   #/op
InterfaceCalls.test2ndInt5Types:L1-dcache-loads               false  avgt    3    88.738 ±   0.416   #/op
InterfaceCalls.test2ndInt5Types:branch-misses                 false  avgt    3     0.007 ±   0.003   #/op
InterfaceCalls.test2ndInt5Types:branches                      false  avgt    3    49.122 ±   0.353   #/op
InterfaceCalls.test2ndInt5Types:cycles                        false  avgt    3    57.147 ±   1.698   #/op
InterfaceCalls.test2ndInt5Types:instructions                  false  avgt    3   247.443 ±   1.531   #/op

# Old PR version
InterfaceCalls.test2ndInt5Types                               false  avgt   12    22.513 ±   0.208  ns/op
InterfaceCalls.test2ndInt5Types:L1-dcache-load-misses         false  avgt    3     0.012 ±   0.072   #/op
InterfaceCalls.test2ndInt5Types:L1-dcache-loads               false  avgt    3   108.446 ±  13.975   #/op ; +20 loads
InterfaceCalls.test2ndInt5Types:branch-misses                 false  avgt    3     0.407 ±   0.010   #/op
InterfaceCalls.test2ndInt5Types:branches                      false  avgt    3    54.102 ±   0.403   #/op ; +5 branches
InterfaceCalls.test2ndInt5Types:cycles                        false  avgt    3    75.938 ±   5.043   #/op ; +19 cycles
InterfaceCalls.test2ndInt5Types:instructions                  false  avgt    3   280.194 ±   5.758   #/op ; +32 instructions

# New PR version
InterfaceCalls.test2ndInt5Types                               false  avgt   12    17.441 ±   0.287  ns/op
InterfaceCalls.test2ndInt5Types:L1-dcache-load-misses         false  avgt    3     0.009 ±   0.072   #/op
InterfaceCalls.test2ndInt5Types:L1-dcache-loads               false  avgt    3    88.803 ±   1.401   #/op
InterfaceCalls.test2ndInt5Types:branch-misses                 false  avgt    3     0.009 ±   0.062   #/op
InterfaceCalls.test2ndInt5Types:branches                      false  avgt    3    52.945 ±   0.752   #/op ; +4 branches
InterfaceCalls.test2ndInt5Types:cycles                        false  avgt    3    58.866 ±  15.379   #/op ; +2 cycles
InterfaceCalls.test2ndInt5Types:instructions                  false  avgt    3   272.838 ±   1.665   #/op ; +28 instructions

The code is in new commits, passes `hotspot:tier1`, running more tests now.
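The three cases discussed in this thread (receiver already in the table; receiver absent and table full; receiver absent with a free slot claimed atomically, restarting the search if the claim fails) can be modelled roughly as below. This is a hedged Java sketch for illustration, not the x86 assembly from the PR: the class and method names are hypothetical, and the counters are deliberately left non-atomic, since the PR text says counter updates are not made atomic.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

/** Illustrative model of a receiver type profile; not HotSpot code. */
final class ReceiverProfileSketch {
    private final AtomicReferenceArray<Class<?>> receivers; // null means the slot is empty
    private final long[] counts;                            // per-slot counters, not atomic
    private long overflowCount;                             // calls whose receiver did not fit

    ReceiverProfileSketch(int width) {
        receivers = new AtomicReferenceArray<>(width);
        counts = new long[width];
    }

    void record(Class<?> receiver) {
        while (true) {
            int firstEmpty = -1;
            for (int i = 0; i < receivers.length(); i++) {
                Class<?> installed = receivers.get(i);
                if (installed == receiver) {
                    counts[i]++;          // fast case 1: receiver already owns a slot
                    return;
                }
                if (installed == null && firstEmpty < 0) {
                    firstEmpty = i;       // remember the first free slot seen
                }
            }
            if (firstEmpty < 0) {
                overflowCount++;          // fast case 2: not found and the table is full
                return;
            }
            // Slow case: claim the free slot atomically.
            if (receivers.compareAndSet(firstEmpty, null, receiver)) {
                counts[firstEmpty]++;
                return;
            }
            // Another thread claimed the slot first: restart the search from the top.
        }
    }

    long countFor(Class<?> receiver) {
        for (int i = 0; i < receivers.length(); i++) {
            if (receivers.get(i) == receiver) {
                return counts[i];
            }
        }
        return 0;
    }

    long overflowCount() {
        return overflowCount;
    }
}
```

With a width of 2 (mirroring `TypeProfileWidth=2`), a third receiver type never gets a slot and every such call takes the second fast path, which involves no CAS at all.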
------------- PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3596428656 From jbhateja at openjdk.org Mon Dec 1 13:06:12 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 1 Dec 2025 13:06:12 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v6] In-Reply-To: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: > This bug patch fixes a crash seen while querying the bottom type of MachTempNode corresponding to [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of blend pattern during late scheduling. Here, MaxVectorSize is constrained to 8 bytes, thus during C2 type system initialization, [TypeVect::VECTX](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719) guarded by target supported vector size remains uninitialized. > > It's better to reject matching of VectorBlend in such a scenario. > > All existing VectorAPI jtreg tests are passing with -XX:UseAVX=0 and -XX:MaxVectorSize=8 > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
The pull request contains one new commit since the last revision: Review suggestion incorporated ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28533/files - new: https://git.openjdk.org/jdk/pull/28533/files/a0e008de..c84f473e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28533&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28533&range=04-05 Stats: 4 lines in 1 file changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28533.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28533/head:pull/28533 PR: https://git.openjdk.org/jdk/pull/28533 From jbhateja at openjdk.org Mon Dec 1 13:12:07 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 1 Dec 2025 13:12:07 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v4] In-Reply-To: References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: On Mon, 1 Dec 2025 12:10:15 GMT, Emanuel Peter wrote: >> Thanks, fixed >> >> I intend to pass UseAVX=0 as a run flag to reproduce the exact bug scenario; our framework is not sensitive to IgnoreUnrecognizedVMOptions. > > You could also just limit the rules to `sse4.1` platforms. Then you can run the tests everywhere, but limit IR rules to what is easy to test for you ;) > > Platform features get tested before flags, so that helps with platform specific flags ;) Thanks! That is much better!
https://github.com/openjdk/jdk/tree/master/test/hotspot/jtreg/compiler/lib/ir_framework#disableenable-ir-rules-based-on-platform ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28533#discussion_r2577010701 From jbhateja at openjdk.org Mon Dec 1 13:32:25 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 1 Dec 2025 13:32:25 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v7] In-Reply-To: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: > This patch fixes a crash seen while querying the bottom type of MachTempNode corresponding to [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of blend pattern during late scheduling. Here, MaxVectorSize is constrained to 8 bytes, so during C2 type system initialization, [TypeVect::VECTX ](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719) guarded by target supported vector size remains uninitialized. > > It's better to reject matching of VectorBlend in such a scenario. > > All existing VectorAPI jtreg tests are passing with -XX:UseAVX=0 and -XX:MaxVectorSize=8 > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
The pull request contains one new commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28533/files - new: https://git.openjdk.org/jdk/pull/28533/files/c84f473e..2f773133 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28533&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28533&range=05-06 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28533.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28533/head:pull/28533 PR: https://git.openjdk.org/jdk/pull/28533 From jbhateja at openjdk.org Mon Dec 1 13:39:09 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 1 Dec 2025 13:39:09 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v8] In-Reply-To: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: > This patch fixes a crash seen while querying the bottom type of MachTempNode corresponding to [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of blend pattern during late scheduling. Here, MaxVectorSize is constrained to 8 bytes, so during C2 type system initialization, [TypeVect::VECTX ](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719) guarded by target supported vector size remains uninitialized. > > It's better to reject matching of VectorBlend in such a scenario. > > All existing VectorAPI jtreg tests are passing with -XX:UseAVX=0 and -XX:MaxVectorSize=8 > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
The pull request contains one new commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28533/files - new: https://git.openjdk.org/jdk/pull/28533/files/2f773133..ef84ffa7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28533&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28533&range=06-07 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28533.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28533/head:pull/28533 PR: https://git.openjdk.org/jdk/pull/28533 From mli at openjdk.org Mon Dec 1 15:13:13 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 1 Dec 2025 15:13:13 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v7] In-Reply-To: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> Message-ID: <4-PqNRUxM-80k4mQdYNzc0HrirtkTCjfVAzgRewW08M=.d2fe4512-16cd-4abf-8a7f-e91341c37110@github.com> > Hi, > > This PR adds CMoveF/D support on RISC-V, which enables vectorization of statements like `op_1 bop op_2 ? res_f_d_1 : res_f_d_2` in a loop. > > This PR is also preparation for further vectorization in https://github.com/openjdk/jdk/pull/28231. > > Previously it was https://github.com/openjdk/jdk/pull/25341, but at that time C2 SLP had an issue with unsigned comparison, which is now fixed, so it's good to continue the work. > > # Test > ## Jtreg > > in progress... > > ## Performance > > Column name meanings: > * p: with patch > * p+v: with patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on > * m: without patch > * m+v: without patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on > > #### Average improvement > > NOTE: On its own, this PR brings a performance benefit in the cases of `CMoveF+CmpF`, `CMoveD+CmpD`, `CMoveF+CmpI`, `CMoveD+CmpL`.
The data below is based on fully implementing the vectorization of `CMoveI/L/F/D+CmpI/L/F/D`, which will be achieved by https://github.com/openjdk/jdk/pull/28231. > > For details, check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv. > > Opt (m/p) | Opt (m+v/p+v) | Opt (p/p+v) | Opt (m/p+v) > -- | -- | -- | -- > 1.022782609 | 2.198717391 | 2.162673913 | 2.199 > > Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: - remove log_warning - add test cases: BoolTest::ge/gt in enc_cmove_fp_cmp_fp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28309/files - new: https://git.openjdk.org/jdk/pull/28309/files/46b32186..077dc35c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28309&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28309&range=05-06 Stats: 226 lines in 2 files changed: 214 ins; 2 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/28309.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28309/head:pull/28309 PR: https://git.openjdk.org/jdk/pull/28309 From mli at openjdk.org Mon Dec 1 15:13:15 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 1 Dec 2025 15:13:15 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v6] In-Reply-To: References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> <7kh5C9nj7bf6432cG35kDDvV6zhnKEspe8AcYetJ1do=.e1d9ebd3-d80d-4621-8c1e-c77dc721d0df@github.com> Message-ID: On Tue, 25 Nov 2025 09:39:26 GMT, Hamlin Li wrote: >> src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 2141: >> >>> 2139: case BoolTest::gt: >>> 2140: cmov_fp_cmp_fp_gt(op1, op2, dst, src, cmp_single, cmov_single); >>> 2141: log_warning(jit)("Float/Double BoolTest::gt path is not tested well, please report the test case!"); >> >> My local tests show this does happen.
Try this: >> `$ make test TEST="./test/jdk/javax/sound/midi/Gervill/SoftFilter/TestProcessAudio.java" TEST_VM_OPTS="-XX:-TieredCompilation"` >> >> I think this could be a good reference if you want to add some extra tests for the two cases here. > > Thanks, I'll check it later. Sorry for the delayed response. I've added the test case to cover all the code paths. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28309#discussion_r2577477492 From liach at openjdk.org Mon Dec 1 15:40:40 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 1 Dec 2025 15:40:40 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v3] In-Reply-To: References: Message-ID: > Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. > > They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. > > We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. > > Paging @minborg who requested Optional folding for review. > > I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. 
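[Editor's sketch] To make the shape under discussion concrete, here is a small plain-Java stand-in for a class like `java.util.Optional`: a single final field whose shared empty instance stores a default value (`null`). The names `TrustedFinalDemo` and `Box` are hypothetical and exist only for this illustration. The sketch shows only the source-level pattern. It does not use `@Stable` or the annotation proposed in the PR, and it cannot demonstrate constant folding, which happens inside C2.

```java
import java.util.Objects;

final class TrustedFinalDemo {
    /** Hypothetical stand-in for java.util.Optional, reduced to one final field. */
    static final class Box<T> {
        // The shared empty instance stores null, i.e. a default value, in a final field.
        private static final Box<?> EMPTY = new Box<>(null);

        private final T value;

        private Box(T value) {
            this.value = value;
        }

        @SuppressWarnings("unchecked")
        static <T> Box<T> empty() {
            return (Box<T>) EMPTY;
        }

        static <T> Box<T> of(T value) {
            return new Box<>(Objects.requireNonNull(value));
        }

        boolean isPresent() {
            return value != null;
        }
    }

    // The answer is always false at the source level. Whether a JIT can fold it
    // depends on whether it is allowed to trust the final field of the EMPTY instance.
    static boolean emptyIsPresent() {
        return Box.empty().isPresent();
    }

    // All callers observe the same empty instance.
    static boolean sameEmptyInstance() {
        return Box.empty() == Box.empty();
    }

    public static void main(String[] args) {
        System.out.println(emptyIsPresent());
        System.out.println(sameEmptyInstance());
        System.out.println(Box.of("x").isPresent());
    }
}
```

The point of the sketch is that the interesting instance is exactly the one holding a default value, which is the case the description calls fragile under the `@Stable` workaround.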
Chen Liang has updated the pull request incrementally with one additional commit since the last revision: Doc tweaks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28540/files - new: https://git.openjdk.org/jdk/pull/28540/files/712dbf1c..7a1cfa4a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28540&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28540&range=01-02 Stats: 25 lines in 1 file changed: 0 ins; 24 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28540.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28540/head:pull/28540 PR: https://git.openjdk.org/jdk/pull/28540 From roland at openjdk.org Mon Dec 1 15:50:41 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 1 Dec 2025 15:50:41 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations Message-ID:

For this failure memory stats are:

Total Usage: 1095525816
--- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 ---
Phase                     Total        ra     node      comp    type states reglive regsplit regmask superword   cienv ha other
none                    5976032    331560  5402064    197512   33712  10200       0        0     984         0       0  0     0
parse                   2716464     65456  1145480    196408 1112752      0       0        0       0         0  196368  0     0
optimizer                 98184         0    32728         0   65456      0       0        0       0         0       0  0     0
connectionGraph           32728         0        0     32728       0      0       0        0       0         0       0  0     0
iterGVN                   32728         0    32728         0       0      0       0        0       0         0       0  0     0
idealLoop             918189632         0 38687056 872824784  392776      0       0        0       0         0 6285016  0     0
idealLoopVerify         2228144         0        0   2228144       0      0       0        0       0         0       0  0     0
macroExpand               32728         0    32728         0       0      0       0        0       0         0       0  0     0
graphReshape              32728         0    32728         0       0      0       0        0       0         0       0  0     0
matcher                20135944   3369848  9033208   7536400   65456 131032       0        0       0         0       0  0     0
postselect_cleanup       294872    294872        0         0       0      0       0        0       0         0       0  0     0
scheduler                752944    196488   556456         0       0      0       0        0       0         0       0  0     0
regalloc                 388736    388736        0         0       0      0       0        0       0         0       0  0     0
ctorChaitin              160032    160032        0         0       0      0       0        0       0         0       0  0     0
regAllocSplit           4189544     32728  4156816         0       0      0       0        0       0         0       0  0     0
postAllocCopyRemoval      65456         0    65456         0       0      0       0        0       0         0       0  0     0
fixupSpills               32728         0    32728         0       0      0       0        0       0         0       0  0     0
chaitinCoalesce1        1505808    262144  1243664         0       0      0       0        0       0         0       0  0     0
output                138300376 138300376        0         0       0      0       0        0       0         0       0  0     0
shorten branches         360008    196368   163640         0       0      0       0        0       0         0       0  0     0

The noticeable line is:

idealLoop             918189632         0 38687056 872824784  392776      0       0        0       0         0 6285016  0     0

A lot of memory (almost 1 GB) gets allocated in the `comp` arena during `idealLoop`. So even though the compilation goes over the limit in `Compile::Code_Gen()`, the root cause is what happens earlier, during `idealLoop`. `_loop_or_ctrl` and `_body` are both allocated in the `comp` arena. Accumulated over several loop opts passes, they should not use that much memory, but the test is run with `+VerifyLoopOptimizations`: calls to `PhaseIdealLoop::verify()` cause new `PhaseIdealLoop` objects to be allocated and more memory to be used in the `comp` arena. The fix I propose is to allocate `_loop_or_ctrl` and `_body` in a dedicated `ResourceArea` so memory can be reclaimed when a pass of loop opts is over.

With that change:

Total Usage: 227682272
--- Arena Usage by Arena Type and compilation phase, at arena usage peak of 227682272 ---
idealLoop 52278416 0 38687056 6913568 0 392776 0 0 0 0 0 6285016 0 0

that is ~50MB total for `idealLoop` instead of almost 1GB. Total usage peaks around 200MB.
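
[Editor's sketch] The lifetime argument above can be modelled with a toy byte counter. This is a minimal sketch in Java, not HotSpot code: `ToyArena` and both helper methods are hypothetical names, and the model only counts bytes. It contrasts an arena that lives for the whole compilation, so every pass adds to the peak, with an arena that is reset after each pass, so the peak is bounded by one pass.

```java
final class ArenaLifetimeDemo {
    /** Toy arena: counts bytes handed out; nothing is released until reset(). */
    static final class ToyArena {
        private long used;
        private long peak;

        void allocate(long bytes) {
            used += bytes;
            peak = Math.max(peak, used);
        }

        void reset() {
            used = 0; // models reclaiming everything allocated since the last reset
        }

        long peak() {
            return peak;
        }
    }

    // Per-pass data goes into one arena that lives as long as the compilation.
    static long peakWithCompilationLifetime(int passes, long bytesPerPass) {
        ToyArena comp = new ToyArena();
        for (int i = 0; i < passes; i++) {
            comp.allocate(bytesPerPass);
        }
        return comp.peak();
    }

    // Per-pass data goes into a dedicated arena that is reset when the pass ends.
    static long peakWithPassLifetime(int passes, long bytesPerPass) {
        ToyArena perPass = new ToyArena();
        for (int i = 0; i < passes; i++) {
            perPass.allocate(bytesPerPass);
            perPass.reset();
        }
        return perPass.peak();
    }

    public static void main(String[] args) {
        System.out.println(peakWithCompilationLifetime(10, 100));
        System.out.println(peakWithPassLifetime(10, 100));
    }
}
```

In this model the first peak grows linearly with the number of passes (including the extra passes triggered by verification), while the second stays at the size of a single pass.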
------------- Commit messages: - whitespaces - more - test case - more - clean up - fix Changes: https://git.openjdk.org/jdk/pull/28581/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28581&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370519 Stats: 345 lines in 6 files changed: 341 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28581.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28581/head:pull/28581 PR: https://git.openjdk.org/jdk/pull/28581 From coleenp at openjdk.org Mon Dec 1 15:57:10 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 1 Dec 2025 15:57:10 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v3] In-Reply-To: References: Message-ID: <379iBIu0uk_Af-5_RZUQBFNkGyFM7iYpe4B_hg93tn8=.95e6e771-31f5-4b89-8172-aa3d0837de25@github.com> On Mon, 1 Dec 2025 15:40:40 GMT, Chen Liang wrote: >> Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. >> >> They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. >> >> We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. >> >> Paging @minborg who requested Optional folding for review. >> >> I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. 
> > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Doc tweaks With one small change, the runtime part of this change looks good. src/hotspot/share/ci/ciField.cpp line 220: > 218: return false; > 219: // Explicit opt-in from system classes > 220: if (holder->trust_final_fields()) This is missing { } so not sure where it ends, especially that it encloses an if statement, and other code. ------------- Changes requested by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28540#pullrequestreview-3525748039 PR Review Comment: https://git.openjdk.org/jdk/pull/28540#discussion_r2577662841 From mhaessig at openjdk.org Mon Dec 1 16:12:36 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 1 Dec 2025 16:12:36 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 15:40:00 GMT, Roland Westrelin wrote: > For this failure memory stats are: > > > Total Usage: 1095525816 > --- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 --- > Phase Total ra node comp type states reglive regsplit regmask superword cienv ha other > none 5976032 331560 5402064 197512 33712 10200 0 0 984 0 0 0 0 > parse 2716464 65456 1145480 196408 1112752 0 0 0 0 0 196368 0 0 > optimizer 98184 0 32728 0 65456 0 0 0 0 0 0 0 0 > connectionGraph 32728 0 0 32728 0 0 0 0 0 0 0 0 0 > iterGVN 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > idealLoop 918189632 0 38687056 872824784 392776 0 0 0 0 0 6285016 0 0 > idealLoopVerify 2228144 0 0 2228144 0 0 0 0 0 0 0 0 0 > macroExpand 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > graphReshape 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > matcher 20135944 3369848 9033208 7536400 65456 131032 0 0 0 0 0 0 0 > postselect_cleanup 294872 294872 0 0 0 0 0 0 0 0 0 0 0 > scheduler 752944 196488 556456 0 0 0 0 0 0 0 0 0 0 > regalloc 388736 388736 0 0 0 0 0 0 0 0 0 0 0 > ctorChaitin 160032 ... 
Thank you for fixing this, @rwestrel. Your fix looks good to me. I merely have two nitpicky suggestions. I will kick off a run of testing and report back with the results. src/hotspot/share/opto/compile.hpp line 374: > 372: // Compilation environment. > 373: Arena _comp_arena; // Arena with lifetime equivalent to Compile > 374: ResourceArea _idealloop_arena; // For data whose lifetime is a pass of loop optimizations Suggestion: ResourceArea _idealloop_arena; // For data whose lifetime is a single pass of loop optimizations ``` Nit: This makes it abundantly clear that the data is freed after one pass. test/hotspot/jtreg/compiler/c2/TestVerifyLoopOptimizationsHighMemUsage.java line 27: > 25: * @test > 26: * @bug 8370519 > 27: * @summary C2: Hit MemLimit when running with +VerifyLoopOptimizations Unsure, but would this test qualify for `@key stress`? ------------- PR Review: https://git.openjdk.org/jdk/pull/28581#pullrequestreview-3525793585 PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2577699466 PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2577714828 From qamai at openjdk.org Mon Dec 1 16:19:23 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 1 Dec 2025 16:19:23 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 15:40:00 GMT, Roland Westrelin wrote: > For this failure memory stats are: > > > Total Usage: 1095525816 > --- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 --- > Phase Total ra node comp type states reglive regsplit regmask superword cienv ha other > none 5976032 331560 5402064 197512 33712 10200 0 0 984 0 0 0 0 > parse 2716464 65456 1145480 196408 1112752 0 0 0 0 0 196368 0 0 > optimizer 98184 0 32728 0 65456 0 0 0 0 0 0 0 0 > connectionGraph 32728 0 0 32728 0 0 0 0 0 0 0 0 0 > iterGVN 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > idealLoop 918189632 0 38687056 872824784 392776 0 0 
0 0 0 6285016 0 0 > idealLoopVerify 2228144 0 0 2228144 0 0 0 0 0 0 0 0 0 > macroExpand 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > graphReshape 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > matcher 20135944 3369848 9033208 7536400 65456 131032 0 0 0 0 0 0 0 > postselect_cleanup 294872 294872 0 0 0 0 0 0 0 0 0 0 0 > scheduler 752944 196488 556456 0 0 0 0 0 0 0 0 0 0 > regalloc 388736 388736 0 0 0 0 0 0 0 0 0 0 0 > ctorChaitin 160032 ... src/hotspot/share/opto/compile.hpp line 810: > 808: // Compilation environment. > 809: Arena* comp_arena() { return &_comp_arena; } > 810: ResourceArea* idealloop_arena() { return &_idealloop_arena; } Should we make it more idiomatic C++ by having the `ResourceArea` allocated and deallocated together with the `PhaseIdealLoop` instead of attaching it to the `Compile` object? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2577746393 From sviswanathan at openjdk.org Mon Dec 1 16:33:43 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 1 Dec 2025 16:33:43 GMT Subject: RFR: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes [v10] In-Reply-To: References: Message-ID: On Fri, 28 Nov 2025 06:01:26 GMT, Jiangli Zhou wrote: >> Please review the fix in StubGenerator::aesgcm_avx512 and StubGenerator::aesgcm_avx2 to handle some edge cases with input sizes that are not multiple of the block size. >> >> Thanks to Thomas Holenstein and Lukas Zobernig for analyzing the issue and providing the test case! > > Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: > > Change to break before operators. Marked as reviewed by sviswanathan (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/28363#pullrequestreview-3525930778 From bmaillard at openjdk.org Mon Dec 1 16:55:09 2025 From: bmaillard at openjdk.org (=?UTF-8?B?QmVub8OudA==?= Maillard) Date: Mon, 1 Dec 2025 16:55:09 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 15:40:00 GMT, Roland Westrelin wrote: > For this failure memory stats are: > > > Total Usage: 1095525816 > --- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 --- > Phase Total ra node comp type states reglive regsplit regmask superword cienv ha other > none 5976032 331560 5402064 197512 33712 10200 0 0 984 0 0 0 0 > parse 2716464 65456 1145480 196408 1112752 0 0 0 0 0 196368 0 0 > optimizer 98184 0 32728 0 65456 0 0 0 0 0 0 0 0 > connectionGraph 32728 0 0 32728 0 0 0 0 0 0 0 0 0 > iterGVN 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > idealLoop 918189632 0 38687056 872824784 392776 0 0 0 0 0 6285016 0 0 > idealLoopVerify 2228144 0 0 2228144 0 0 0 0 0 0 0 0 0 > macroExpand 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > graphReshape 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > matcher 20135944 3369848 9033208 7536400 65456 131032 0 0 0 0 0 0 0 > postselect_cleanup 294872 294872 0 0 0 0 0 0 0 0 0 0 0 > scheduler 752944 196488 556456 0 0 0 0 0 0 0 0 0 0 > regalloc 388736 388736 0 0 0 0 0 0 0 0 0 0 0 > ctorChaitin 160032 ... Thanks for fixing this @rwestrel, I agree with the fix. I noticed that this could be a problem while working on [JDK-8366990](https://bugs.openjdk.org/browse/JDK-8366990), but there was no reproducer at the time. src/hotspot/share/opto/compile.cpp line 656: > 654: _stress_seed(0), > 655: _comp_arena(mtCompiler, Arena::Tag::tag_comp), > 656: _idealloop_arena(mtCompiler, Arena::Tag::tag_idealloop), To keep the naming consistent with other mentions of `IdealLoop` in variable/field names (such as `_phase_verify_ideal_loop`), I would name this `_ideal_loop_arena`. 
This will make it easier to find in a code editor. Feel free to ignore if you disagree. test/hotspot/jtreg/compiler/c2/TestVerifyLoopOptimizationsHighMemUsage.java line 28: > 26: * @bug 8370519 > 27: * @summary C2: Hit MemLimit when running with +VerifyLoopOptimizations > 28: * @run main/othervm -XX:CompileCommand=compileonly,*TestVerifyLoopOptimizationsHighMemUsage*::* -XX:-TieredCompilation -Xbatch Out of curiosity, have you tried reducing the test with `creduce`? I fixed a similar issue in [JDK-8366990](https://bugs.openjdk.org/browse/JDK-8366990), and initially reviewers were concerned about the long compilation time. I was able to get decent results with `creduce` by using `-XX:CompileCommand=memlimit`. Not sure if it's worth doing here though. ------------- PR Review: https://git.openjdk.org/jdk/pull/28581#pullrequestreview-3525878832 PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2577760668 PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2577804805 From jiangli at openjdk.org Mon Dec 1 17:32:40 2025 From: jiangli at openjdk.org (Jiangli Zhou) Date: Mon, 1 Dec 2025 17:32:40 GMT Subject: Integrated: 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes In-Reply-To: References: Message-ID: On Mon, 17 Nov 2025 22:34:14 GMT, Jiangli Zhou wrote: > Please review the fix in StubGenerator::aesgcm_avx512 and StubGenerator::aesgcm_avx2 to handle some edge cases with input sizes that are not a multiple of the block size. > > Thanks to Thomas Holenstein and Lukas Zobernig for analyzing the issue and providing the test case! This pull request has now been integrated.
Changeset: 6cb1c8f9 Author: Jiangli Zhou URL: https://git.openjdk.org/jdk/commit/6cb1c8f9cfcb797af788ca8fb490f388cc68f525 Stats: 151 lines in 2 files changed: 149 ins; 1 del; 1 mod 8371864: GaloisCounterMode.implGCMCrypt0 AVX512/AVX2 intrinsics stubs cause AES-GCM encryption failure for certain payload sizes Co-authored-by: Thomas Holenstein Co-authored-by: Lukas Zobernig Reviewed-by: shade, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/28363 From liach at openjdk.org Mon Dec 1 18:27:34 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 1 Dec 2025 18:27:34 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v4] In-Reply-To: References: Message-ID: > Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. > > They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. > > We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. > > Paging @minborg who requested Optional folding for review. > > I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. 
Chen Liang has updated the pull request incrementally with one additional commit since the last revision: bracket styles ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28540/files - new: https://git.openjdk.org/jdk/pull/28540/files/7a1cfa4a..d353bdbe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28540&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28540&range=02-03 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28540.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28540/head:pull/28540 PR: https://git.openjdk.org/jdk/pull/28540 From rrich at openjdk.org Mon Dec 1 18:32:57 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 1 Dec 2025 18:32:57 GMT Subject: RFR: 8370473: C2: Better Aligment of Vector Spill Slots [v4] In-Reply-To: <2dAfr3bnYwrmrMwlhDNniaYVQYOrR2ARztDEB4qqBzY=.aaa1b90d-0aa7-4d42-a3eb-c52a6b04cbaf@github.com> References: <2dAfr3bnYwrmrMwlhDNniaYVQYOrR2ARztDEB4qqBzY=.aaa1b90d-0aa7-4d42-a3eb-c52a6b04cbaf@github.com> Message-ID: On Thu, 20 Nov 2025 10:21:34 GMT, Richard Reingruber wrote: >> With this change c2 will allocate spill slots for vectors with sp offsets aligned to the size of the vectors. Maximum alignment is StackAlignmentInBytes. >> >> It also updates comments that have never been changed to describe how register allocation works for sizes larger than 64 bit. >> >> The change helps to produce better spill code on AARCH64 and PPC64 where an additional add instruction is emitted if the offset of a vector un-/spill is not aligned. >> >> The change is rather a cleanup than an optimization. In most cases the sp offsets will already be properly aligned. >> Only with incoming stack arguments unaligned offsets can be generated. But also then alignment padding is only added if vector registers larger than 64 bit are used. >> >> So the costs are effectively zero. 
Especially because extra padding won't enlarge the frame since only virtual registers are allocated which are mapped to the caller frame (see `pad0` in the [diagram](https://github.com/openjdk/jdk/blob/92e380c59c2498b1bc94e26658b07b383deae59a/src/hotspot/cpu/aarch64/aarch64.ad#L3829)) >> >> There's a risk though that with the extra virtual registers allocated for `pad0` the limit of registers a `RegMask` can represent is reached (occurs with excessive spilling). If this happens the compilation would fail. It could be retried with smaller alignment for vector spilling though. I haven't implemented it as I thought the risk is negligible. >> >> Note that the sp offset of the accesses should be aligned rather than the effective address. So it could even be argued that the maximum alignment could be higher than StackAlignmentInBytes. >> >> ##### Testing with fastdebug builds on AARCH64 and PPC64: >> >> hotspot_vector_1 >> hotspot_vector_2 >> jdk_vector >> jdk_vector_sanity >> >> ##### The change passed our CI testing: >> Tier 1-4 of hotspot and jdk. All of langtools and jaxp. Renaissance Suite and SAP specific tests. >> Testing was done on the main platforms and also on Linux/PPC64le and AIX. >> >> C2 compilation of `jdk.internal.vm.vector.VectorSupport::rearrangeOp` has unaligned spill offsets. It is covered by the following tests: >> >> compiler/vectorapi/VectorRearrangeTest.java >> jdk/incubator/vector/Byte128VectorLoadStoreTests.java >> jdk/incubator/vector/Double256VectorLoadStoreTests.java >> jdk/incubator/vector/Float128VectorTests.java >> jdk/incubator/vector/Long256VectorLoadStoreTests.java >> jdk/incubator/vector/Short128VectorLoadStoreTests.java >> jdk/incubator/vector/Vector64ConversionTests.java > > Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 10 additional commits since the last revision: > > - Merge branch 'master' > - Exclude IR check on riscv with rvv > - Enhance comment > - Fix OptoAssembly for Power 8 > - PPC: OptoAssembly for vector spilling > - Assert aligned sp offsets in vector spilling > - Delete TMP and !UseNewCode > - Align Matcher::_new_SP for better vector spilling > - TMP: trace unaligned vector spilling > - Add test Thanks for the review, Götz! ------------- PR Comment: https://git.openjdk.org/jdk/pull/27969#issuecomment-3598210608 From rrich at openjdk.org Mon Dec 1 18:32:59 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 1 Dec 2025 18:32:59 GMT Subject: RFR: 8370473: C2: Better Aligment of Vector Spill Slots [v4] In-Reply-To: <7h2_vOOkP-YCjBQ0dIRbNWg3o4gCjy4zwaAE62K0TkE=.c5a07e88-b55b-4be3-9e9d-7d484a663e98@github.com> References: <2dAfr3bnYwrmrMwlhDNniaYVQYOrR2ARztDEB4qqBzY=.aaa1b90d-0aa7-4d42-a3eb-c52a6b04cbaf@github.com> <7h2_vOOkP-YCjBQ0dIRbNWg3o4gCjy4zwaAE62K0TkE=.c5a07e88-b55b-4be3-9e9d-7d484a663e98@github.com> Message-ID: On Thu, 20 Nov 2025 11:16:52 GMT, Goetz Lindenmaier wrote: > Is this true for oops, too? I think so (see [here](https://github.com/openjdk/jdk/blob/45c0600d3abfa4bcd0338840523c0df69283afe2/src/hotspot/share/opto/chaitin.cpp#L945-L950)). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/27969#discussion_r2578191613 From heidinga at openjdk.org Mon Dec 1 18:55:09 2025 From: heidinga at openjdk.org (Dan Heidinga) Date: Mon, 1 Dec 2025 18:55:09 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v4] In-Reply-To: References: Message-ID: <-9yglNAoD81NuGyLSS0ehpkPZmqK66Qyd7h4UFcztGA=.56a84f5e-a29a-4fc0-b0d7-ce20cac37851@github.com> On Mon, 1 Dec 2025 18:27:34 GMT, Chen Liang wrote: >> Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages.
Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. >> >> They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. >> >> We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. >> >> Paging @minborg who requested Optional folding for review. >> >> I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > bracket styles A bit of meta-question about this PR and JEP 500: does this trust need to be rescinded if the user explicitly adds `--enable-final-field-mutation=` for the modules that contain these classes marked with the annotation? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28540#issuecomment-3598347264 From vlivanov at openjdk.org Mon Dec 1 19:30:22 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 1 Dec 2025 19:30:22 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v2] In-Reply-To: References: Message-ID: <5eysoU9a44W7_cWds1pgbO9cpxQpBbtd54cUglfEW8c=.d0307e92-d9b3-405c-b488-872243af83b1@github.com> > Even though `instanceof` check (and reflective `Class.isInstance` call) narrows operand's type, sharpened type information is not explicitly materialized in the IR. > > There's a `SubTypeCheck` node present, but it is not a substitute for a `CheckCastPP` node with a proper type. 
>
> The difference can be illustrated with the following simple cases:
>
>     class A { void m() {} }
>     class B extends A { void m() {} }
>
>     void testInstanceOf(A obj) {
>         if (obj instanceof B) {
>             obj.m();
>         }
>     }
>
>     InstanceOf::testInstanceOf (12 bytes)
>       @ 8   InstanceOf$A::m (0 bytes)   failed to inline: virtual call
>
> vs
>
>     void testInstanceOfCast(A obj) {
>         if (obj instanceof B) {
>             B b = (B)obj;
>             b.m();
>         }
>     }
>
>     InstanceOf::testInstanceOfCast (17 bytes)
>       @ 13   InstanceOf$B::m (1 bytes)   inline (hot)
>
>
> Proposed fix annotates operands of subtype checks with proper type information which reflects the effects of subtype check. Not-yet-canonicalized IR shape poses some challenges, but I decided to match it early so information is available right away, rather than waiting for IGVN pass and delay inlining to post-parse phase.
>
> FTR it is not a complete fix. It works for trivial cases, but for more complex conditions the IR shape becomes too complex during parsing (as illustrated by some test cases). I experimented with annotating subtype checks after initial parsing pass is over, but the crucial simplification step happens as part of split-if transformation which happens when no more inlining is possible. So, the only possible benefit (without forcing split-if optimization earlier) is virtual-to-direct call strength reduction. I plan to explore it separately.
> > Testing: hs-tier1 - hs-tier5 Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: Test fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28517/files - new: https://git.openjdk.org/jdk/pull/28517/files/1cf6238f..0a5e78c6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28517&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28517&range=00-01 Stats: 15 lines in 1 file changed: 5 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/28517.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28517/head:pull/28517 PR: https://git.openjdk.org/jdk/pull/28517 From vlivanov at openjdk.org Mon Dec 1 19:44:48 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 1 Dec 2025 19:44:48 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v2] In-Reply-To: <5eysoU9a44W7_cWds1pgbO9cpxQpBbtd54cUglfEW8c=.d0307e92-d9b3-405c-b488-872243af83b1@github.com> References: <5eysoU9a44W7_cWds1pgbO9cpxQpBbtd54cUglfEW8c=.d0307e92-d9b3-405c-b488-872243af83b1@github.com> Message-ID: On Mon, 1 Dec 2025 19:30:22 GMT, Vladimir Ivanov wrote: >> Even though `instanceof` check (and reflective `Class.isInstance` call) narrows operand's type, sharpened type information is not explicitly materialized in the IR. >> >> There's a `SubTypeCheck` node present, but it is not a substitute for a `CheckCastPP` node with a proper type. 
>>
>> The difference can be illustrated with the following simple cases:
>>
>>     class A { void m() {} }
>>     class B extends A { void m() {} }
>>
>>     void testInstanceOf(A obj) {
>>         if (obj instanceof B) {
>>             obj.m();
>>         }
>>     }
>>
>>     InstanceOf::testInstanceOf (12 bytes)
>>       @ 8   InstanceOf$A::m (0 bytes)   failed to inline: virtual call
>>
>> vs
>>
>>     void testInstanceOfCast(A obj) {
>>         if (obj instanceof B) {
>>             B b = (B)obj;
>>             b.m();
>>         }
>>     }
>>
>>     InstanceOf::testInstanceOfCast (17 bytes)
>>       @ 13   InstanceOf$B::m (1 bytes)   inline (hot)
>>
>>
>> Proposed fix annotates operands of subtype checks with proper type information which reflects the effects of subtype check. Not-yet-canonicalized IR shape poses some challenges, but I decided to match it early so information is available right away, rather than waiting for IGVN pass and delay inlining to post-parse phase.
>>
>> FTR it is not a complete fix. It works for trivial cases, but for more complex conditions the IR shape becomes too complex during parsing (as illustrated by some test cases). I experimented with annotating subtype checks after initial parsing pass is over, but the crucial simplification step happens as part of split-if transformation which happens when no more inlining is possible. So, the only possible benefit (without forcing split-if optimization earlier) is virtual-to-direct call strength reduction. I plan to explore it separately.
>>
>> Testing: hs-tier1 - hs-tier5
>
> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision:
>
>   Test fix

Thanks, Roland! I slightly reworked the test to make it more robust.
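Editorial sketch, not part of the original message: the two shapes quoted in the PR description can be tried out with a small standalone class. Everything here (the class name `InstanceOfShapes`, the `String` return values, the warm-up loop) is an assumption made for illustration; only the two method shapes come from the quoted text.

```java
// Hypothetical standalone reproduction; the class and its helpers are
// illustrative and are not part of the PR or of the JDK test suite.
final class InstanceOfShapes {
    static class A { String m() { return "A"; } }
    static class B extends A { @Override String m() { return "B"; } }

    // Shape 1: after the instanceof check, the receiver's static type is still A.
    static String testInstanceOf(A obj) {
        if (obj instanceof B) {
            return obj.m();
        }
        return "not a B";
    }

    // Shape 2: the explicit cast adds a checkcast, so the receiver is typed as B.
    static String testInstanceOfCast(A obj) {
        if (obj instanceof B) {
            B b = (B) obj;
            return b.m();
        }
        return "not a B";
    }

    public static void main(String[] args) {
        // Alternate receivers so that both branches are exercised while warming up.
        A[] inputs = { new A(), new B() };
        String last = "";
        for (int i = 0; i < 100_000; i++) {
            A obj = inputs[i & 1];
            last = testInstanceOf(obj) + "/" + testInstanceOfCast(obj);
        }
        System.out.println(last);
    }
}
```

Both methods return the same values for every input; the difference the thread discusses is only in how C2 types the receiver of `m()`. Inlining decisions can be inspected with the diagnostic flag `-XX:+PrintInlining` (it requires `-XX:+UnlockDiagnosticVMOptions`), though the exact log lines depend on the JDK build and on warm-up.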
------------- PR Comment: https://git.openjdk.org/jdk/pull/28517#issuecomment-3598543397 From vlivanov at openjdk.org Mon Dec 1 19:51:48 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 1 Dec 2025 19:51:48 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v2] In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 14:08:03 GMT, Quan Anh Mai wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> Test fix > > src/hotspot/share/opto/parse2.cpp line 1739: > >> 1737: } >> 1738: >> 1739: // Match an instanceof check. > > We seem to require that the input of `SubTypeCheck` is not `null`. What do you think about allowing `SubTypeCheck` to accept `null` and return `false`? Yes, it's a good idea and the right direction to move. While experimenting with a different enhancement, I noticed that a subtype check leaves a null check behind irrespective of whether the check goes away or not. Unfortunately, there are some engineering considerations which complicates the change. `SubTypeCheck` is shared across all the places where subtype checks are performed, but `checkcast` and `instanceof` differ in the way `null` is handled. So, the proper way to fix it is to introduce a higher-level representation which implicitly handles nulls and then eventually lower it to `SubTypeCheck` and materialize null check if needed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28517#discussion_r2578397399 From liach at openjdk.org Mon Dec 1 20:23:49 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 1 Dec 2025 20:23:49 GMT Subject: RFR: 8372696: Allow boot classes to explicitly opt-in for final field trusting [v4] In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 18:27:34 GMT, Chen Liang wrote: >> Currently, the hotspot compiler (as in ciField) trusts final fields in hidden classes, record classes, and selected jdk packages. 
Some classes in the JDK wish to be trusted, but they cannot apply package-wide opt-in due to other legacy classes in the package, such as java.util. >> >> They currently can use `@Stable` as a workaround, but this is fragile because a stable final field may hold a trusted null, zero, or false value, which is currently treated as non-constant by ciField. >> >> We should add an annotation to opt-in for a whole class, mainly for legacy packages. This would benefit greatly some of our classes already using a lot of Stable, such as java.util.Optional, whose empty instance is now constant-foldable, as demonstrated in a new IR test. >> >> Paging @minborg who requested Optional folding for review. >> >> I think we can remove redundant Stable in a few other java.util classes after this patch is integrated. I plan to do that in subsequent patches. > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > bracket styles This PR currently does not interact with JEP 500. However, as specified in `Field.set`, the result of setting a final field may be ignored, as Alan [commented](https://github.com/openjdk/jdk/pull/28540#discussion_r2573494589). So I don't think we need to rescind the current trusting even if users enable mutations. In addition, @DanHeidinga I made the same fault as you when I first saw `--enable-final-field-mutation=` - this actually represents the callers, instead of the target, of `Field.set`. The target of mutation is specified via `--add-opens`, if the target field is not public. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28540#issuecomment-3598676839 From liach at openjdk.org Mon Dec 1 20:34:31 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 1 Dec 2025 20:34:31 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v2] In-Reply-To: References: Message-ID: > Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) Chen Liang has updated the pull request incrementally with one additional commit since the last revision: Logical fallacy ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28585/files - new: https://git.openjdk.org/jdk/pull/28585/files/522cbe9d..886d3918 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28585.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28585/head:pull/28585 PR: https://git.openjdk.org/jdk/pull/28585 From vpaprotski at openjdk.org Mon Dec 1 21:23:14 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Mon, 1 Dec 2025 21:23:14 GMT Subject: RFR: 8372703: Test compiler/arguments/TestCodeEntryAlignment.java failed: assert(allocates2(pc)) failed: not in CodeBuffer memory Message-ID: Requires a Broadwell machine, but was able to reproduce with an emulator: ~/sde-external-9.58.0-2025-06-16-lin/sde64 -follow-subprocess -bdw -- ./build/linux-x86_64-server-fastdebug/images/jdk/bin/java -XX:-UseMulAddIntrinsic -XX:+UseDilithiumIntrinsics -XX:+UnlockExperimentalVMOptions -XX:CodeCacheSegmentSize=1024 -XX:CodeEntryAlignment=1024 -cp
build/linux-x86_64-server-fastdebug/support/test/lib/test-lib.jar test/hotspot/jtreg/compiler/arguments/TestCodeEntryAlignment.java run ------------- Commit messages: - increase compiler code cache size Changes: https://git.openjdk.org/jdk/pull/28588/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28588&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372703 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28588.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28588/head:pull/28588 PR: https://git.openjdk.org/jdk/pull/28588 From dlong at openjdk.org Mon Dec 1 22:26:10 2025 From: dlong at openjdk.org (Dean Long) Date: Mon, 1 Dec 2025 22:26:10 GMT Subject: RFR: 8370766: JVM crashes when running compiler/exceptions/TestAccessErrorInCatch.java fails with -XX:+VerifyStack [v3] In-Reply-To: <5JAu6StX5-r2itXPGiDBgGHjGo0S2mOfGxOpPoMSkIQ=.000500da-a003-403b-9d3b-6df3a53c2b22@github.com> References: <5JAu6StX5-r2itXPGiDBgGHjGo0S2mOfGxOpPoMSkIQ=.000500da-a003-403b-9d3b-6df3a53c2b22@github.com> Message-ID: > The problem is C2 is throwing an exception and then deoptimizing, and the -XX:+VerifyStack logic expects the stack to be empty, match the "before" state if the reexecute flag is set, or match the "after" state. C2 is using the "before" state, so for correctness it also needs to set the reexecute flag. > > I played around with other approaches, like: > 1. setting the stack to empty > 2. adding all the bytecodes that can throw to the list in AbstractInterpreter::bytecode_should_reexecute() > 3. always setting the reexecute flag in add_safepoint_edges() if must_throw is set > but in the end I decided to go with the minimal localized low-risk change. 
Dean Long has updated the pull request incrementally with one additional commit since the last revision: add bugid ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28486/files - new: https://git.openjdk.org/jdk/pull/28486/files/8f89b007..5d577099 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28486&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28486&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28486.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28486/head:pull/28486 PR: https://git.openjdk.org/jdk/pull/28486 From dlong at openjdk.org Mon Dec 1 22:26:11 2025 From: dlong at openjdk.org (Dean Long) Date: Mon, 1 Dec 2025 22:26:11 GMT Subject: RFR: 8370766: JVM crashes when running compiler/exceptions/TestAccessErrorInCatch.java fails with -XX:+VerifyStack [v2] In-Reply-To: References: <5JAu6StX5-r2itXPGiDBgGHjGo0S2mOfGxOpPoMSkIQ=.000500da-a003-403b-9d3b-6df3a53c2b22@github.com> Message-ID: <6JooQy0BdhEorBSCfW_R-v_YmFRnQ4N1hmwRxb0ALdU=.2d079520-51c1-447a-ac47-500daef45a68@github.com> On Mon, 1 Dec 2025 07:11:11 GMT, Christian Hagedorn wrote: > Looks good to me, too. > > > always setting the reexecute flag in add_safepoint_edges() if must_throw is set > > but in the end I decided to go with the minimal localized low-risk change. > > Is this something we should follow up on? Yes, I have several enhancements in this area on my list. I'll file a separate RFE. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28486#issuecomment-3599217605 From dlong at openjdk.org Mon Dec 1 22:26:12 2025 From: dlong at openjdk.org (Dean Long) Date: Mon, 1 Dec 2025 22:26:12 GMT Subject: RFR: 8370766: JVM crashes when running compiler/exceptions/TestAccessErrorInCatch.java fails with -XX:+VerifyStack [v3] In-Reply-To: References: <5JAu6StX5-r2itXPGiDBgGHjGo0S2mOfGxOpPoMSkIQ=.000500da-a003-403b-9d3b-6df3a53c2b22@github.com> Message-ID: <5gpqqUssgER1MM5K4nbgGHl5e2Uu5TUjPwZAx9Nsdkc=.fc776158-d1b5-451e-aa88-db41611e4f21@github.com> On Fri, 28 Nov 2025 12:13:39 GMT, Manuel Hässig wrote: >> Dean Long has updated the pull request incrementally with one additional commit since the last revision: >> >> add bugid > > test/hotspot/jtreg/compiler/exceptions/TestAccessErrorInCatch.java line 26: > >> 24: /* >> 25: * @test >> 26: * @bug 8367002 > > Suggestion: > > * @bug 8367002 8370766 > > Perhaps we should add this bug to the test, since you modified it. Good idea. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28486#discussion_r2578925529 From dlong at openjdk.org Mon Dec 1 22:30:50 2025 From: dlong at openjdk.org (Dean Long) Date: Mon, 1 Dec 2025 22:30:50 GMT Subject: RFR: 8370766: JVM crashes when running compiler/exceptions/TestAccessErrorInCatch.java fails with -XX:+VerifyStack [v3] In-Reply-To: References: <5JAu6StX5-r2itXPGiDBgGHjGo0S2mOfGxOpPoMSkIQ=.000500da-a003-403b-9d3b-6df3a53c2b22@github.com> Message-ID: On Mon, 1 Dec 2025 22:26:10 GMT, Dean Long wrote: >> The problem is C2 is throwing an exception and then deoptimizing, and the -XX:+VerifyStack logic expects the stack to be empty, match the "before" state if the reexecute flag is set, or match the "after" state. C2 is using the "before" state, so for correctness it also needs to set the reexecute flag. >> >> I played around with other approaches, like: >> 1. setting the stack to empty >> 2.
adding all the bytecodes that can throw to the list in AbstractInterpreter::bytecode_should_reexecute() >> 3. always setting the reexecute flag in add_safepoint_edges() if must_throw is set >> but in the end I decided to go with the minimal localized low-risk change. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > add bugid I added the bugid to the test, so I'll need a quick re-review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28486#issuecomment-3599232577 From dlong at openjdk.org Mon Dec 1 22:30:53 2025 From: dlong at openjdk.org (Dean Long) Date: Mon, 1 Dec 2025 22:30:53 GMT Subject: RFR: 8370766: JVM crashes when running compiler/exceptions/TestAccessErrorInCatch.java fails with -XX:+VerifyStack [v2] In-Reply-To: References: <5JAu6StX5-r2itXPGiDBgGHjGo0S2mOfGxOpPoMSkIQ=.000500da-a003-403b-9d3b-6df3a53c2b22@github.com> Message-ID: On Fri, 28 Nov 2025 12:14:39 GMT, Manuel Hässig wrote: >> Dean Long has updated the pull request incrementally with one additional commit since the last revision: >> >> remove extra spaces > > Thank you for fixing this, @dean-long. It looks good to me. @mhaessig and @chhagedorn, thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28486#issuecomment-3599233961 From vlivanov at openjdk.org Mon Dec 1 22:49:45 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 1 Dec 2025 22:49:45 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v2] In-Reply-To: <82Ddhg3yXemMeyKmZUCWZIPUVOTkdCbXiOcl8LO_Su0=.47680bc7-526d-4c15-9b84-dd9c7d27728d@github.com> References: <82Ddhg3yXemMeyKmZUCWZIPUVOTkdCbXiOcl8LO_Su0=.47680bc7-526d-4c15-9b84-dd9c7d27728d@github.com> Message-ID: On Thu, 27 Nov 2025 14:56:51 GMT, ExE Boss wrote: >> There are corresponding test cases (`testInstanceOfCondPre` et al) where conditions are embedded.
>>
>> The idea of `testInstanceOfCondLate` and similar test cases is to check how inlining works when condition improves receiver type during incremental inlining phase.
>
> What I meant was where the `instanceof` is in the called method, the `testInstanceOfCondPre` all have the `instanceof` checks as part of the `if` statement.
>
> --------------------------------------------------------------------------------
>
> Something like:
>
>     static void testInstanceOfCondDefaultInlinePre(A a, boolean cond) {
>         if (defaultInlineInstanceOfCondPre(a, cond)) {
>             a.m();
>         }
>     }
>     static void testInstanceOfCondDefaultInlinePost(A a, boolean cond) {
>         if (defaultInlineInstanceOfCondPost(a, cond)) {
>             a.m();
>         }
>     }
>
>     static void testIsInstanceCondDefaultInlinePre(A a, boolean cond) {
>         if (defaultInlineIsInstanceCondPre(a, cond)) {
>             a.m();
>         }
>     }
>     static void testIsInstanceCondDefaultInlinePost(A a, boolean cond) {
>         if (defaultInlineIsInstanceCondPost(a, cond)) {
>             a.m();
>         }
>     }
>
>
> --------------------------------------------------------------------------------
>
> I suggest adding such a test because of real-world code which use different internal implementation classes but expose their public API as only a single common supertype, like `java.lang.constant.ClassDesc` and its `isPrimitive()`/`isArray()`/`isClassOrInterface()` methods (which currently don't do the `instanceof` check, but they probably should so that they can be reliably inlined).

The test is intended as a white-box test. It focuses on bytecode shapes which result in different IR representations and exercise different optimizations. From compiler perspective, there's no difference between `if (defaultInlineInstanceOfCond(a)) { ... }` and `if (a instanceof B) {...}` when inlining happens during parsing. Both test cases produce the very same IR after parsing is over.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28517#discussion_r2578972141 From vlivanov at openjdk.org Mon Dec 1 23:29:03 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 1 Dec 2025 23:29:03 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v6] In-Reply-To: <2ifEaoGuZU4duyckWchgOnnqfH6AgAcrqsiqBZH1Nx4=.1df7af8d-41ac-43a1-90ab-964eb80f155b@github.com> References: <2ifEaoGuZU4duyckWchgOnnqfH6AgAcrqsiqBZH1Nx4=.1df7af8d-41ac-43a1-90ab-964eb80f155b@github.com> Message-ID: On Mon, 1 Dec 2025 13:04:08 GMT, Aleksey Shipilev wrote: >> See the bug for discussion what issues current machinery has. >> >> This PR executes the plan outlined in the bug: >> 1. Common the receiver type profiling code in interpreter and C1 >> 2. Rewrite receiver type profiling code to only do atomic receiver slot installations >> 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed >> >> This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision: > > - Simplify third case: no need to loop, just restart the search > - Actually have a second "fast" case: receiver is not found in the table, and the table is full > - Pushing/popping for rare CAS path is counter-productive src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4826: > 4824: // and never duplicate the receivers in the list. > 4825: // > 4826: // It is tempting to combine these cases into a single loop, and claim the first Can you elaborate, please, why it is the case? Is it a result of class unloading or something else? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25305#discussion_r2579069740 From liach at openjdk.org Mon Dec 1 23:41:04 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 1 Dec 2025 23:41:04 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: Message-ID: > Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) Chen Liang has updated the pull request incrementally with one additional commit since the last revision: Tweak VH usage in some classes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28585/files - new: https://git.openjdk.org/jdk/pull/28585/files/886d3918..7bcdcbf3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=01-02 Stats: 9 lines in 1 file changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/28585.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28585/head:pull/28585 PR: https://git.openjdk.org/jdk/pull/28585 From liach at openjdk.org Mon Dec 1 23:53:46 2025 From: liach at openjdk.org (Chen Liang) Date: Mon, 1 Dec 2025 23:53:46 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 23:41:04 GMT, Chen Liang wrote: >> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. 
Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Tweak VH usage in some classes Since I removed the return type dropping VarHandle bypass, TestGetAndAdd became affected because it can no longer access the x86 assembly. Updated the Java calling convention to fix it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28585#issuecomment-3599477724 From liach at openjdk.org Tue Dec 2 00:16:14 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 2 Dec 2025 00:16:14 GMT Subject: RFR: 8372845: Fold identity hash code if object is constant Message-ID: Folding identity hash as constant if the incoming argument is constant would be useful for quick map lookups, such as for the [Classifier proposal](https://openjdk.org/jeps/8357674). Currently, identity hash is not constant because it loads the object header/mark word. We can add an explicit bypass to load an existing hash eagerly instead. ------------- Commit messages: - Move around - Constant fold identity hash Changes: https://git.openjdk.org/jdk/pull/28589/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28589&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372845 Stats: 88 lines in 4 files changed: 87 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28589/head:pull/28589 PR: https://git.openjdk.org/jdk/pull/28589 From dholmes at openjdk.org Tue Dec 2 00:16:48 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 2 Dec 2025 00:16:48 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v3] In-Reply-To: References: Message-ID: On Mon, 24 Nov 2025 18:10:50 GMT, Quan Anh Mai wrote: >> Hi, >> >> This fixes the crash in `Load/StoreVectorMaskedNode::Ideal`. 
The issue here is that the graph is not canonical during idealization, which leads to us processing a dead node. The fix I propose is to bail-out when that happens. >> >> To be more specific, for this issue, we have the graph that looks like: >> >> ConI -> ConvI2L -> CastLL(0..32) -> VectorMaskGen >> >> with `ConI` being 45 and `MaxVectorSize` being 32. In this instance, `CastLL` is processed before `ConvI2L`, and when it is processed, it sees that the type of `ConvI2L` being its bottom type. As a result, it does not know that it is top, and since we are after macro expansion, which is after loop opts, the `CastLL` goes away, leaving us with: >> >> ConI -> ConvI2L -> VectorMaskGen >> >> After `ConvI2L` is processed, we know that the input of `VectorMaskGen` is a constant 45, which is larger than `MaxVectorSize`, leading to the assert failure. >> >> Please take a look and leave your thoughts, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > reviews We either need this fix or a backout of whatever caused the problem. The fork is this week and this causes a lot of failures in testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28410#issuecomment-3599537313 From vlivanov at openjdk.org Tue Dec 2 00:25:49 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 2 Dec 2025 00:25:49 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: Message-ID: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> On Mon, 1 Dec 2025 23:41:04 GMT, Chen Liang wrote: >> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. 
Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Tweak VH usage in some classes src/java.base/share/classes/java/lang/invoke/VarHandle.java line 2033: > 2031: > 2032: @ForceInline > 2033: MethodHandle adaptedMethodHandle(VarHandle vh) { Can you elaborate, please, how this method is intended to behave? test/hotspot/jtreg/compiler/c2/irTests/TestGetAndAdd.java line 78: > 76: @IR(counts = {IRNode.X86_LOCK_XADDB, "3"}, phase = CompilePhase.FINAL_CODE) > 77: public static void addB() { > 78: var _ = (byte) B.getAndAdd(b2); > Since I removed the return type dropping VarHandle bypass, TestGetAndAdd became affected because it can no longer access the x86 assembly. It has performance implications for user code, doesn't it? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2579149358 PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2579150006 From liach at openjdk.org Tue Dec 2 01:12:48 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 2 Dec 2025 01:12:48 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> Message-ID: <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> On Tue, 2 Dec 2025 00:20:21 GMT, Vladimir Ivanov wrote: >> Chen Liang has updated the pull request incrementally with one additional commit since the last revision: >> >> Tweak VH usage in some classes > > src/java.base/share/classes/java/lang/invoke/VarHandle.java line 2033: > >> 2031: >> 2032: @ForceInline >> 2033: MethodHandle 
adaptedMethodHandle(VarHandle vh) { > > Can you elaborate, please, how this method is intended to behave? When this is compiled, `constant` will become either `1` for constant VH and `2` for non-constant VH. So for constant VH, this becomes a stable read. For a non-constant VH, this becomes `getMethodHandle(mode).asType(...)`, equivalent to before. > test/hotspot/jtreg/compiler/c2/irTests/TestGetAndAdd.java line 78: > >> 76: @IR(counts = {IRNode.X86_LOCK_XADDB, "3"}, phase = CompilePhase.FINAL_CODE) >> 77: public static void addB() { >> 78: var _ = (byte) B.getAndAdd(b2); > >> Since I removed the return type dropping VarHandle bypass, TestGetAndAdd became affected because it can no longer access the x86 assembly. > > It has performance implications for user code, doesn't it? The performance is measured by the existing `org.openjdk.bench.java.lang.invoke.VarHandleExact` benchmark, which originally expects `generic_genericInvocation` to be much slower. Now it instead has a performance on par with the exact invocations. The constant folding ability is verified with the new `VarHandleMismatchedTypeFold` IR test. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2579218324 PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2579221253 From vlivanov at openjdk.org Tue Dec 2 01:45:49 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 2 Dec 2025 01:45:49 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> Message-ID: On Tue, 2 Dec 2025 01:08:19 GMT, Chen Liang wrote: >> src/java.base/share/classes/java/lang/invoke/VarHandle.java line 2033: >> >>> 2031: >>> 2032: @ForceInline >>> 2033: MethodHandle adaptedMethodHandle(VarHandle vh) { >> >> Can you elaborate, please, how this method is intended to behave? > > When this is compiled, `constant` will become either `1` for constant VH and `2` for non-constant VH. So for constant VH, this becomes a stable read. For a non-constant VH, this becomes `getMethodHandle(mode).asType(...)`, equivalent to before. What's the purpose of `constant == MethodHandleImpl.CONSTANT_YES ` and `constant != MethodHandleImpl.CONSTANT_NO` checks then? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2579287707 From liach at openjdk.org Tue Dec 2 01:51:47 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 2 Dec 2025 01:51:47 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> Message-ID: <3OE37qXGHhLAhnRQM188hhygrLYBtI3FLBMK0tGVH30=.5d1b4406-3bb3-4788-8059-e78260b79ec1@github.com> On Tue, 2 Dec 2025 01:42:50 GMT, Vladimir Ivanov wrote: >> When this is compiled, `constant` will become either `1` for constant VH and `2` for non-constant VH. So for constant VH, this becomes a stable read. For a non-constant VH, this becomes `getMethodHandle(mode).asType(...)`, equivalent to before. > > What's the purpose of `constant == MethodHandleImpl.CONSTANT_YES ` and `constant != MethodHandleImpl.CONSTANT_NO` checks then? Indeed, I should move the adaptedMh read into `constant == MethodHandleImpl.CONSTANT_YES` block. `constant != MethodHandleImpl.CONSTANT_NO` prevents capturing any further if the VH is known non-constant; we keep this branch in constant case in case the adapted MH is not ready when we know the VH is constant. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2579302480 From vlivanov at openjdk.org Tue Dec 2 01:51:49 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 2 Dec 2025 01:51:49 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> Message-ID: On Tue, 2 Dec 2025 01:09:59 GMT, Chen Liang wrote: >> test/hotspot/jtreg/compiler/c2/irTests/TestGetAndAdd.java line 78: >> >>> 76: @IR(counts = {IRNode.X86_LOCK_XADDB, "3"}, phase = CompilePhase.FINAL_CODE) >>> 77: public static void addB() { >>> 78: var _ = (byte) B.getAndAdd(b2); >> >>> Since I removed the return type dropping VarHandle bypass, TestGetAndAdd became affected because it can no longer access the x86 assembly. >> >> It has performance implications for user code, doesn't it? > > The performance is measured by the existing `org.openjdk.bench.java.lang.invoke.VarHandleExact` benchmark, which originally expects `generic_genericInvocation` to be much slower. Now it instead has a performance on par with the exact invocations. > > The constant folding ability is verified with the new `VarHandleMismatchedTypeFold` IR test. If I understand the IR test logic correctly, C2 was able to compile `(void) B.getAndAdd(b2)` call down to the desired instruction sequence. Is it still the case after the fix? What happens if you keep `TestGetAndAdd.java ` intact? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2579300293 From liach at openjdk.org Tue Dec 2 01:54:46 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 2 Dec 2025 01:54:46 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> Message-ID: On Tue, 2 Dec 2025 01:48:13 GMT, Vladimir Ivanov wrote: >> The performance is measured by the existing `org.openjdk.bench.java.lang.invoke.VarHandleExact` benchmark, which originally expects `generic_genericInvocation` to be much slower. Now it instead has a performance on par with the exact invocations. >> >> The constant folding ability is verified with the new `VarHandleMismatchedTypeFold` IR test. > > If I understand the IR test logic correctly, C2 was able to compile `(void) B.getAndAdd(b2)` call down to the desired instruction sequence. Is it still the case after the fix? What happens if you keep `TestGetAndAdd.java` intact? No. The old code worked because it implicitly depended on the backdoor path present in the now removed `GUARD_METHOD_TEMPLATE_V` in `VarHandleGuardMethodGenerator`. If this test is intact, now its IR compiles to doing something in adaptedMethodHandle and calling a MethodHandle. Not sure why it doesn't inline through that MethodHandle. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2579310315 From vlivanov at openjdk.org Tue Dec 2 02:02:46 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 2 Dec 2025 02:02:46 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: <3OE37qXGHhLAhnRQM188hhygrLYBtI3FLBMK0tGVH30=.5d1b4406-3bb3-4788-8059-e78260b79ec1@github.com> References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> <3OE37qXGHhLAhnRQM188hhygrLYBtI3FLBMK0tGVH30=.5d1b4406-3bb3-4788-8059-e78260b79ec1@github.com> Message-ID: On Tue, 2 Dec 2025 01:49:04 GMT, Chen Liang wrote: >> What's the purpose of `constant == MethodHandleImpl.CONSTANT_YES ` and `constant != MethodHandleImpl.CONSTANT_NO` checks then? > > Indeed, I should move the adaptedMh read into `constant == MethodHandleImpl.CONSTANT_YES` block. > > `constant != MethodHandleImpl.CONSTANT_NO` prevents capturing any further if the VH is known non-constant; we keep this branch in constant case in case the adapted MH is not ready when we know the VH is constant. I still have a hard time reasoning about state transitions of the cache. 1) Why do you limit successful cache read (`cache != null`) to constant `vh` case (`constant == MethodHandleImpl.CONSTANT_YES`)? 2) Why do you avoid cache update in non-constant case (`constant != MethodHandleImpl.CONSTANT_NO`)? What happens if it runs compiled `adaptedMethodHandle` method? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2579329673 From vlivanov at openjdk.org Tue Dec 2 02:08:44 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 2 Dec 2025 02:08:44 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> Message-ID: <53uGo7JI87pm-cZmvxBiHniURB_bKryyfrWpewgZLP8=.bb97af75-8a6d-4aa8-8a90-e8c4cbc77ec8@github.com> On Tue, 2 Dec 2025 01:52:04 GMT, Chen Liang wrote: >> If I understand the IR test logic correctly, C2 was able to compile `(void) B.getAndAdd(b2)` call down to the desired instruction sequence. Is it still the case after the fix? What happens if you keep `TestGetAndAdd.java` intact? > > No. The old code worked because it implicitly depended on the backdoor path present in the now removed `GUARD_METHOD_TEMPLATE_V` in `VarHandleGuardMethodGenerator`. If this test is intact, now its IR compiles to doing something in adaptedMethodHandle and calling a MethodHandle. Not sure why it doesn't inline through that MethodHandle. Ok, so you eliminated a fast-path check for void-return case and now JIT can't fully optimize it anymore. Do I get it right? Since this particular bytecode shape is exposed through public API, I don't see why user code can't step on it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2579341027 From liach at openjdk.org Tue Dec 2 02:16:52 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 2 Dec 2025 02:16:52 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> <3OE37qXGHhLAhnRQM188hhygrLYBtI3FLBMK0tGVH30=.5d1b4406-3bb3-4788-8059-e78260b79ec1@github.com> Message-ID: <7WF8DlorrU_B2__G2wr43w1PZwJh8mEhD5dY10YDIOo=.ec416c38-1aff-4dd6-8792-d6a0e01f91ce@github.com> On Tue, 2 Dec 2025 02:00:08 GMT, Vladimir Ivanov wrote: >> Indeed, I should move the adaptedMh read into `constant == MethodHandleImpl.CONSTANT_YES` block. >> >> `constant != MethodHandleImpl.CONSTANT_NO` prevents capturing any further if the VH is known non-constant; we keep this branch in constant case in case the adapted MH is not ready when we know the VH is constant. > > I still have a hard time reasoning about state transitions of the cache. > > 1) Why do you limit successful cache read (`cache != null`) to constant `vh` case (`constant == MethodHandleImpl.CONSTANT_YES`)? > > 2) Why do you avoid cache update in non-constant case (`constant != MethodHandleImpl.CONSTANT_NO`)? What happens if it runs compiled `adaptedMethodHandle` method? So an `AccessDescriptor` is created for each sigpoly VH site in the source code. Usually it is `VH.operation()`, but it is legal to use a non-constant VarHandle variable and call an operation on that. If `constant == MethodHandleImpl.CONSTANT_NO`, we are sure that we have the non-constant case, so we cannot trust that cached method handle, and there is no point further caching. 
We can only read that previous MH conversion cache if `constant == MethodHandleImpl.CONSTANT_YES` because this means our cache is always correct. >> No. The old code worked because it implicitly depended on the backdoor path present in the now removed `GUARD_METHOD_TEMPLATE_V` in `VarHandleGuardMethodGenerator`. If this test is intact, now its IR compiles to doing something in adaptedMethodHandle and calling a MethodHandle. Not sure why it doesn't inline through that MethodHandle. > > Ok, so you eliminated a fast-path check for void-return case and now JIT can't fully optimize it anymore. Do I get it right? Since this particular bytecode shape is exposed through public API, I don't see why user code can't step on it. JIT can fully optimize it in JMH benchmarks. I don't know why the IR in this test can't optimize it - I couldn't reproduce this CI failure locally on my linux-x64-debug profile, but this modified test passes on CI. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2579352226 PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2579353722 From vlivanov at openjdk.org Tue Dec 2 02:26:46 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 2 Dec 2025 02:26:46 GMT Subject: RFR: 8372845: Fold identity hash code if object is constant In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 23:01:08 GMT, Chen Liang wrote: > Folding identity hash as constant if the incoming argument is constant would be useful for quick map lookups, such as for the [Classifier proposal](https://openjdk.org/jeps/8357674). Currently, identity hash is not constant because it loads the object header/mark word. We can add an explicit bypass to load an existing hash eagerly instead. @liach Thanks for taking care of the fix. 
Here's a more polished version: https://github.com/openjdk/jdk/commit/c6c4e9f23a1bdf801d0cc8e36f343543b8bfccda ------------- PR Comment: https://git.openjdk.org/jdk/pull/28589#issuecomment-3599884931 From vlivanov at openjdk.org Tue Dec 2 02:32:47 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 2 Dec 2025 02:32:47 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: <7WF8DlorrU_B2__G2wr43w1PZwJh8mEhD5dY10YDIOo=.ec416c38-1aff-4dd6-8792-d6a0e01f91ce@github.com> References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> <3OE37qXGHhLAhnRQM188hhygrLYBtI3FLBMK0tGVH30=.5d1b4406-3bb3-4788-8059-e78260b79ec1@github.com> <7WF8DlorrU_B2__G2wr43w1PZwJh8mEhD5dY10YDIOo=.ec416c38-1aff-4dd6-8792-d6a0e01f91ce@github.com> Message-ID: On Tue, 2 Dec 2025 02:13:15 GMT, Chen Liang wrote: >> I still have a hard time reasoning about state transitions of the cache. >> >> 1) Why do you limit successful cache read (`cache != null`) to constant `vh` case (`constant == MethodHandleImpl.CONSTANT_YES`)? >> >> 2) Why do you avoid cache update in non-constant case (`constant != MethodHandleImpl.CONSTANT_NO`)? What happens if it runs compiled `adaptedMethodHandle` method? > > So an `AccessDescriptor` is created for each sigpoly VH site in the source code. Usually it is `VH.operation()`, but it is legal to use a non-constant VarHandle variable and call an operation on that. If `constant == MethodHandleImpl.CONSTANT_NO`, we are sure that we have the non-constant case, so we cannot trust that cached method handle, and there is no point further caching. We can only read that previous MH conversion cache if `constant == MethodHandleImpl.CONSTANT_YES` because this means our cache is always correct. 
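To make the cache discipline quoted above easier to follow, here is a minimal sketch of it in plain Java. It is not the code from the PR: the class `AccessSiteSketch`, the field `cached`, the `UNKNOWN` state, and the `compute` helper are all hypothetical stand-ins, and plain `Object`s take the place of `VarHandle` and `MethodHandle`. Only the values `1` (constant) and `2` (non-constant) and the two guards `constant == CONSTANT_YES` and `constant != CONSTANT_NO` are taken from the thread.

```java
// Hypothetical model of a per-call-site cache; not the actual JDK code.
final class AccessSiteSketch {
    static final int UNKNOWN = 0;      // assumed third state (e.g. not yet compiled)
    static final int CONSTANT_YES = 1; // per the thread: VH is a compile-time constant
    static final int CONSTANT_NO = 2;  // per the thread: VH is known non-constant

    Object cached;        // stands in for the cached adapted method handle
    int computeCount = 0; // counts how often the slow path ran

    Object adapted(Object vh, int constant) {
        // The cache is read only when the VH is known constant, because only
        // then is the cached value guaranteed to belong to this very VH.
        if (constant == CONSTANT_YES) {
            Object cache = cached;
            if (cache != null) {
                return cache;
            }
        }
        Object result = compute(vh);
        // No store when the VH is known non-constant: the value would never
        // be read back, so the update would only add generated code.
        if (constant != CONSTANT_NO) {
            cached = result;
        }
        return result;
    }

    // Stand-in for the (expensive) adaptation of the method handle.
    private Object compute(Object vh) {
        computeCount++;
        return "adapted:" + vh;
    }
}
```

In this model a non-constant call never reads the cache, which is the property Chen Liang describes; whether the real implementation preserves it in every state transition is exactly what the review is probing.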
So, it seems like what you are trying to achieve is a 1-1 mapping from `AccessDescriptor` to `vh` through `adaptedMh`. So, once `cache != null` you can trust that it corresponds to the `vh` instance passed as a constant. But cache pollution can easily break the invariant, so you try to eliminate the pollution by avoiding cache updates when vh is not constant. Do I get it right? >> Ok, so you eliminated a fast-path check for void-return case and now JIT can't fully optimize it anymore. Do I get it right? Since this particular bytecode shape is exposed through public API, I don't see why user code can't step on it. > > JIT can fully optimize it in JMH benchmarks. I don't know why the IR in this test can't optimize it - I couldn't reproduce this CI failure locally on my linux-x64-debug profile, but this modified test passes on CI. I'd say it's a bad sign. Intermittent bugs manifest exactly in such a way. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2579374286 PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2579375565 From liach at openjdk.org Tue Dec 2 02:52:50 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 2 Dec 2025 02:52:50 GMT Subject: RFR: 8372845: Fold identity hash code if object is constant In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 23:01:08 GMT, Chen Liang wrote: > Folding identity hash as constant if the incoming argument is constant would be useful for quick map lookups, such as for the [Classifier proposal](https://openjdk.org/jeps/8357674). Currently, identity hash is not constant because it loads the object header/mark word. We can add an explicit bypass to load an existing hash eagerly instead. I have one question: would it be safer for us to move the constant detection after generate_virtual_guard in the `is_virtual` if block? 
I think it may be possible for users to create a `Object::hashCode` site with a constant receiver that is of a specialized class that overrides `hashCode`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28589#issuecomment-3599934103 From liach at openjdk.org Tue Dec 2 02:54:46 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 2 Dec 2025 02:54:46 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> <3OE37qXGHhLAhnRQM188hhygrLYBtI3FLBMK0tGVH30=.5d1b4406-3bb3-4788-8059-e78260b79ec1@github.com> <7WF8DlorrU_B2__G2wr43w1PZwJh8mEhD5dY10YDIOo=.ec416c38-1aff-4dd6-8792-d6a0e01f91ce@github.com> Message-ID: On Tue, 2 Dec 2025 02:29:28 GMT, Vladimir Ivanov wrote: >> So an `AccessDescriptor` is created for each sigpoly VH site in the source code. Usually it is `VH.operation()`, but it is legal to use a non-constant VarHandle variable and call an operation on that. If `constant == MethodHandleImpl.CONSTANT_NO`, we are sure that we have the non-constant case, so we cannot trust that cached method handle, and there is no point further caching. We can only read that previous MH conversion cache if `constant == MethodHandleImpl.CONSTANT_YES` because this means our cache is always correct. > > So, it seems like what you are trying to achieve is a 1-1 mapping from `AccessDescriptor` to `vh` through `adaptedMh`. So, once `cache != null` you can trust that it corresponds to the `vh` instance passed as a constant. But cache pollution can easily break the invariant, so you try to eliminate the pollution by avoiding cache updates when vh is not constant. Do I get it right? No. The avoidance of cache update simply trims down the generated code by throwing away the meaningless cache update. 
The access to cache is already safeguarded by `constant == MethodHandleImpl.CONSTANT_YES`. I should have moved `var cache = adaptedMh;` into the if block of `constant == CONSTANT_YES`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2579405388 From wenanjian at openjdk.org Tue Dec 2 06:36:53 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Tue, 2 Dec 2025 06:36:53 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v26] In-Reply-To: References: <5HbBb-mjtZWqWTu-HQe7KrRyHG5z-UK4rbVhMzLv4bw=.b1b7e986-dbcf-4ab0-86b4-513f3f1f91ae@github.com> Message-ID: <7J3oLdDF73T7tFgpg2yZvAZGVxcmxskCXw7ugnA5gMs=.e1f55b91-0825-4895-8009-c880c668d4c6@github.com> On Wed, 19 Nov 2025 09:55:47 GMT, Hamlin Li wrote: >>> Some more comments and questions. >> >> Thanks for the careful reviews! I will check the comments and reply one by one later > >> > Some more comments and questions. >> >> Thanks for the careful reviews! I will check the comments and reply one by one later > > > Thanks! Overall looks good, I'll have another by this weekend. Thanks for your patience! @Hamlin-Li Thanks, I have modified some code according to your suggestions and replied to all the comments. Could you please help review it again when you have time? : ) 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/25281#issuecomment-3600415249

From thartmann at openjdk.org Tue Dec 2 06:44:49 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Tue, 2 Dec 2025 06:44:49 GMT
Subject: RFR: 8372845: Fold identity hash code if object is constant
In-Reply-To:
References:
Message-ID:

On Tue, 2 Dec 2025 02:49:52 GMT, Chen Liang wrote:

> I think it may be possible for users to create a Object::hashCode site with a constant receiver that is of a specialized class that overrides hashCode.

Yes, I think so too. We need a test for this scenario.

Just an observation: This patch will only allow folding during parsing. I would expect that often, opportunities only arise after other optimizations already took place. For example, something like this would not be optimized if we run with `-XX:+AlwaysIncrementalInline`, right?

    static final Object a = new Object();

    @ForceInline
    public Object getter(Object obj) {
        return obj;
    }

    public long test() {
        return getter(a).hashCode();
    }

Another example:

    Object val = new Object();
    int limit = 2;
    for (; limit < 4; limit *= 2);
    for (int i = 2; i < limit; i++) {
        val = a;
    }
    return val.hashCode(); // After loop opts, C2 knows that val == a

So ideally, we would move this optimization to IGVN. This would also help Valhalla, where we need to (re-)compute the hashcode for a scalarized value object and would therefore like to fold the computation as aggressively as possible.
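The overriding-receiver scenario mentioned at the top of this message could be exercised along the following lines. This is only a sketch of the shape such a test might take, under the assumption that a plain correctness check is the goal: the names `OverridingReceiverSketch`, `Fixed`, and `viaObjectHashCode` are invented here, and a real regression test would presumably live in jtreg with the IR framework rather than in a standalone class.

```java
// Hypothetical test shape; names are illustrative and not from the PR.
final class OverridingReceiverSketch {
    // A class that overrides hashCode with a fixed, recognizable value.
    static final class Fixed {
        @Override
        public int hashCode() {
            return 42;
        }
    }

    // A constant receiver whose static type is Object but whose dynamic
    // type overrides hashCode.
    static final Object RECEIVER = new Fixed();

    // The call site resolves to Object::hashCode; virtual dispatch must
    // still reach Fixed::hashCode.
    static int viaObjectHashCode(Object o) {
        return o.hashCode();
    }

    static int test() {
        return viaObjectHashCode(RECEIVER);
    }
}
```

If a fold to the identity hash were applied here without the virtual guard, `test()` would stop returning the overridden value once compiled, so calling it in a warm-up loop and checking the result each time is one way to catch the problem.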
------------- PR Comment: https://git.openjdk.org/jdk/pull/28589#issuecomment-3600438904 From amitkumar at openjdk.org Tue Dec 2 07:10:47 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 2 Dec 2025 07:10:47 GMT Subject: RFR: 8372641: [s390x] Test failure TestMergeStores.java [v3] In-Reply-To: <6iaWuz5X4ol8NmIvbWoQBxmceux35b3529t1sONwCZA=.08c49f3a-87dc-4030-a5a7-1a83f4209fe0@github.com> References: <6iaWuz5X4ol8NmIvbWoQBxmceux35b3529t1sONwCZA=.08c49f3a-87dc-4030-a5a7-1a83f4209fe0@github.com> Message-ID: On Thu, 27 Nov 2025 08:59:09 GMT, Harshit470250 wrote: >> [JDK-8347405](https://bugs.openjdk.org/browse/JDK-8347405) introduced a mergeStores optimisation which requires ReverseBytesS opcode and as it was not implemented for s390 the test case is failing. >> I also implemented ReverseBytesUS. > > Harshit470250 has updated the pull request incrementally with one additional commit since the last revision: > > Added whitespace LGTM, I ran tier1 test and it fixed the testcase without new regression. @RealLucy you want to take a look at this one ? ------------- Marked as reviewed by amitkumar (Committer). PR Review: https://git.openjdk.org/jdk/pull/28523#pullrequestreview-3528539709 From chagedorn at openjdk.org Tue Dec 2 07:39:57 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 2 Dec 2025 07:39:57 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v3] In-Reply-To: References: Message-ID: On Mon, 24 Nov 2025 18:10:50 GMT, Quan Anh Mai wrote: >> Hi, >> >> This fixes the crash in `Load/StoreVectorMaskedNode::Ideal`. The issue here is that the graph is not canonical during idealization, which leads to us processing a dead node. The fix I propose is to bail-out when that happens. >> >> To be more specific, for this issue, we have the graph that looks like: >> >> ConI -> ConvI2L -> CastLL(0..32) -> VectorMaskGen >> >> with `ConI` being 45 and `MaxVectorSize` being 32. 
In this instance, `CastLL` is processed before `ConvI2L`, and when it is processed, it sees that the type of `ConvI2L` being its bottom type. As a result, it does not know that it is top, and since we are after macro expansion, which is after loop opts, the `CastLL` goes away, leaving us with: >> >> ConI -> ConvI2L -> VectorMaskGen >> >> After `ConvI2L` is processed, we know that the input of `VectorMaskGen` is a constant 45, which is larger than `MaxVectorSize`, leading to the assert failure. >> >> Please take a look and leave your thoughts, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > reviews Otherwise, looks good to me. test/hotspot/jtreg/compiler/arraycopy/TestArrayCopyDisjoint.java line 29: > 27: /** > 28: * @test > 29: * @bug 8251871 8285301 You can add the bug number here: Suggestion: * @bug 8251871 8285301 8371964 ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28410#pullrequestreview-3528660084 PR Review Comment: https://git.openjdk.org/jdk/pull/28410#discussion_r2580001963 From epeter at openjdk.org Tue Dec 2 07:43:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 07:43:48 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v3] In-Reply-To: References: Message-ID: On Mon, 24 Nov 2025 18:10:50 GMT, Quan Anh Mai wrote: >> Hi, >> >> This fixes the crash in `Load/StoreVectorMaskedNode::Ideal`. The issue here is that the graph is not canonical during idealization, which leads to us processing a dead node. The fix I propose is to bail-out when that happens. >> >> To be more specific, for this issue, we have the graph that looks like: >> >> ConI -> ConvI2L -> CastLL(0..32) -> VectorMaskGen >> >> with `ConI` being 45 and `MaxVectorSize` being 32. 
In this instance, `CastLL` is processed before `ConvI2L`, and when it is processed, it sees that the type of `ConvI2L` being its bottom type. As a result, it does not know that it is top, and since we are after macro expansion, which is after loop opts, the `CastLL` goes away, leaving us with: >> >> ConI -> ConvI2L -> VectorMaskGen >> >> After `ConvI2L` is processed, we know that the input of `VectorMaskGen` is a constant 45, which is larger than `MaxVectorSize`, leading to the assert failure. >> >> Please take a look and leave your thoughts, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > reviews Looks good to me, thanks for fixing this @merykitty ! src/hotspot/share/opto/vectornode.cpp line 1177: > 1175: int load_sz = type2aelembytes(mask_bt) * ty->get_con(); > 1176: if (load_sz > MaxVectorSize) { > 1177: // See LoadVectorMaskedNode::Ideal Suggestion: // After loop opts, cast nodes are aggressively removed, if the input is then transformed // into a constant that is outside the range of the removed cast, we may encounter it here. // This should be a dead node then. Optional: Might as well just repeat the explanation. If the code in `LoadVectorMaskedNode::Ideal` changes it is unlikely that we would notice here, and then we'd have a dead link. ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28410#pullrequestreview-3528671396 PR Review Comment: https://git.openjdk.org/jdk/pull/28410#discussion_r2580010355 From chagedorn at openjdk.org Tue Dec 2 07:47:50 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 2 Dec 2025 07:47:50 GMT Subject: RFR: 8370766: JVM crashes when running compiler/exceptions/TestAccessErrorInCatch.java fails with -XX:+VerifyStack [v3] In-Reply-To: References: <5JAu6StX5-r2itXPGiDBgGHjGo0S2mOfGxOpPoMSkIQ=.000500da-a003-403b-9d3b-6df3a53c2b22@github.com> Message-ID: On Mon, 1 Dec 2025 22:26:10 GMT, Dean Long wrote: >> The problem is C2 is throwing an exception and then deoptimizing, and the -XX:+VerifyStack logic expects the stack to be empty, match the "before" state if the reexecute flag is set, or match the "after" state. C2 is using the "before" state, so for correctness it also needs to set the reexecute flag. >> >> I played around with other approaches, like: >> 1. setting the stack to empty >> 2. adding all the bytecodes that can throw to the list in AbstractInterpreter::bytecode_should_reexecute() >> 3. always setting the reexecute flag in add_safepoint_edges() if must_throw is set >> but in the end I decided to go with the minimal localized low-risk change. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > add bugid Still good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28486#pullrequestreview-3528688158

From epeter at openjdk.org Tue Dec 2 07:53:57 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 2 Dec 2025 07:53:57 GMT
Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v3]
In-Reply-To:
References: <0df3H15uO96P1n3zLpKl5y_RKrAgc1h_V91bGB5mCr8=.06942d05-f66d-442f-a754-8135ac0eec30@github.com>
Message-ID:

On Tue, 25 Nov 2025 17:46:28 GMT, Quan Anh Mai wrote:

>> Is this issue at all related to https://github.com/openjdk/jdk/pull/24575?
>>
>> It seems we remove a `CastLL` from the graph, because the input type is wider than the Cast's type, right?
>>
>> If I remember correctly from https://github.com/openjdk/jdk/pull/24575, if a CastLL is narrowing, we don't want to remove it, see `ConstraintCastNode::Identity`.
>>
>> Can you elaborate a bit more on where the `CastLL` came from, and what it is supposed to do?
>
> @eme64 Yes, it is indeed similar. The issue here is that after loop opts, we try to remove almost all `CastNode`s so that the graph can be GVN-ed better (think of `x = a + b` and `y = cast(a) + b`).
>
>> Can you elaborate a bit more on where the `CastLL` came from, and what it is supposed to do?
>
> Macro expansion tries to be smart for an array copy and does this:
>
>     byte[] dst;
>     byte[] src;
>     int len;
>     if (len <= 32) {
>         int casted_len = cast(len, 0, 32);
>         vectormask mask = VectorMaskGen(casted_len);
>         vector v = LoadVectorMasked(src, 0, mask);
>         StoreVectorMasked(dst, 0, v, mask);
>     } else {
>         // do the copy normally;
>     }
>
> As you can see, the masked accesses are only meaningful if `len <= 32`. But after loop opts, the cast is gone, leaving us with a len which happens to be larger than `32`. The path should be dead, but IGVN reaches the `LoadVectorMaskedNode` first, which triggers the assert.

@merykitty Hold off with integration for a few hours, @chhagedorn just launched some internal testing.
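The guarded masked copy quoted above can be modeled in plain Java to show why the oversized length is only a problem when the guard is lost. This is a scalar emulation written for illustration, not C2 code: `MaskedCopyModel`, `maskGen`, and the constant `MAX_VECTOR_SIZE = 32` are assumptions that mirror the numbers used in the thread, and the exception stands in for the assert.

```java
// Hypothetical scalar model of the pseudo-code in the quoted message.
final class MaskedCopyModel {
    // Assumed to match the MaxVectorSize of 32 bytes used in the thread.
    static final int MAX_VECTOR_SIZE = 32;

    static void copy(byte[] src, byte[] dst, int len) {
        if (len <= MAX_VECTOR_SIZE) {
            // Masked path: only meaningful while len is within 0..32.
            boolean[] mask = maskGen(len);
            for (int lane = 0; lane < MAX_VECTOR_SIZE; lane++) {
                if (mask[lane]) {
                    dst[lane] = src[lane];
                }
            }
        } else {
            // Do the copy normally.
            System.arraycopy(src, 0, dst, 0, len);
        }
    }

    // Stand-in for VectorMaskGen: sets the first len lanes. The range check
    // plays the role of the assert that fired in the bug report.
    static boolean[] maskGen(int len) {
        if (len < 0 || len > MAX_VECTOR_SIZE) {
            throw new IllegalArgumentException("mask length out of range: " + len);
        }
        boolean[] mask = new boolean[MAX_VECTOR_SIZE];
        for (int i = 0; i < len; i++) {
            mask[i] = true;
        }
        return mask;
    }
}
```

With the guard in place, `maskGen` never sees 45. Calling it directly with 45 corresponds to the situation described in the thread, where the cast has been removed and the node on the dead path is visited before the path itself is pruned.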
------------- PR Comment: https://git.openjdk.org/jdk/pull/28410#issuecomment-3600692116 From hgreule at openjdk.org Tue Dec 2 07:54:01 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Tue, 2 Dec 2025 07:54:01 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> <3OE37qXGHhLAhnRQM188hhygrLYBtI3FLBMK0tGVH30=.5d1b4406-3bb3-4788-8059-e78260b79ec1@github.com> <7WF8DlorrU_B2__G2wr43w1PZwJh8mEhD5dY10YDIOo=.ec416c38-1aff-4dd6-8792-d6a0e01f91ce@github.com> Message-ID: On Tue, 2 Dec 2025 02:30:28 GMT, Vladimir Ivanov wrote: >> JIT can fully optimize it in JMH benchmarks. I don't know why the IR in this test can't optimize it - I couldn't reproduce this CI failure locally on my linux-x64-debug profile, but this modified test passes on CI. > > I'd say it's a bad sign. Intermittent bugs manifest exactly in such a way. > The performance is measured by the existing `org.openjdk.bench.java.lang.invoke.VarHandleExact` benchmark, which originally expects `generic_genericInvocation` to be much slower. Now it instead has a performance on par with the exact invocations. > > The constant folding ability is verified with the new `VarHandleMismatchedTypeFold` IR test. The benchmark doesn't consider such inexact getAndAdd calls (with a void return type), I think it should cover that too. This is a very common pattern that really must not regress. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2580041626 From chagedorn at openjdk.org Tue Dec 2 08:01:50 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 2 Dec 2025 08:01:50 GMT Subject: RFR: 8372451: C2 SuperWord: "endless loop" assert. 
Need to implement proper worklist mechanism [v2] In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 15:42:25 GMT, Emanuel Peter wrote: >> **Context**: `VTransform::optimize`. Works a bit like IGVN, it allows each node to perform optimizations. Recently introduced during JDK26. >> >> **Problem**: I made the assumption that we don't need a worklist mechanism, we can just do multiple passes over all nodes. The assumption was that there would not be any "trickling" of updates over the graph. But that is wrong: for example we can have a long chain of dead nodes, and we need to progressively remove the last node and mark it as dead. >> >> **Solution**: Implement proper worklist mechanism, so that updates can trickle over the graph. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > limit steps of optimize, for Manuel Looks good to me, too! src/hotspot/share/opto/vtransform.cpp line 45: > 43: void VTransformOptimize::worklist_push(VTransformNode* vtn) { > 44: if (_worklist_set.test_set(vtn->_idx)) { return; } > 45: _worklist.push(vtn); I would flip this since it's only one line: Suggestion: if (!_worklist_set.test_set(vtn->_idx)) { _worklist.push(vtn); } test/hotspot/jtreg/compiler/loopopts/superword/TestLongReductionChain.java line 38: > 36: * -XX:CompileCommand=compileonly,${test.main.class}::test > 37: * ${test.main.class} > 38: * @run driver ${test.main.class} Suggestion: * @run main ${test.main.class} ------------- PR Review: https://git.openjdk.org/jdk/pull/28512#pullrequestreview-3528694733 PR Review Comment: https://git.openjdk.org/jdk/pull/28512#discussion_r2580027890 PR Review Comment: https://git.openjdk.org/jdk/pull/28512#discussion_r2580040060 From qamai at openjdk.org Tue Dec 2 08:09:30 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 2 Dec 2025 08:09:30 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v4] In-Reply-To: References: Message-ID: 
<-SdGoKVu9WpxzbLyqrLt7duH-qK_Bbm6ErrWdDfxJUg=.95c14f7f-b940-4dc6-a63d-055419625a36@github.com> > Hi, > > This fixes the crash in `Load/StoreVectorMaskedNode::Ideal`. The issue here is that the graph is not canonical during idealization, which leads to us processing a dead node. The fix I propose is to bail-out when that happens. > > To be more specific, for this issue, we have the graph that looks like: > > ConI -> ConvI2L -> CastLL(0..32) -> VectorMaskGen > > with `ConI` being 45 and `MaxVectorSize` being 32. In this instance, `CastLL` is processed before `ConvI2L`, and when it is processed, it sees that the type of `ConvI2L` being its bottom type. As a result, it does not know that it is top, and since we are after macro expansion, which is after loop opts, the `CastLL` goes away, leaving us with: > > ConI -> ConvI2L -> VectorMaskGen > > After `ConvI2L` is processed, we know that the input of `VectorMaskGen` is a constant 45, which is larger than `MaxVectorSize`, leading to the assert failure. > > Please take a look and leave your thoughts, thanks a lot. 
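The type arithmetic behind the description above can be sketched as follows. This is only an illustration: `Range`, `intersect` and `mask_length_is_usable` are hypothetical names and not C2 code. As long as the input still has its bottom type, intersecting it with the cast range `0..32` is non-empty, so the cast does not look dead. Once the input is known to be the constant 45, the intersection is empty (top), and a length of 45 exceeds a `MaxVectorSize` of 32.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical closed integer range standing in for a C2 long type.
struct Range {
  int64_t lo;
  int64_t hi;
  // An empty range plays the role of "top": the value cannot exist.
  bool empty() const { return lo > hi; }
};

// Type of a cast: the input type narrowed to the cast's range.
Range intersect(Range input, Range cast) {
  return Range{std::max(input.lo, cast.lo), std::min(input.hi, cast.hi)};
}

// Hypothetical bail-out check: treat the mask length as usable only when it
// is a known constant that fits into the maximum vector size.
bool mask_length_is_usable(Range len, int64_t max_vector_size) {
  return !len.empty() && len.lo == len.hi && len.lo >= 0 &&
         len.hi <= max_vector_size;
}
```

With these assumptions, an idealization step that finds the length unusable would give up instead of asserting.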
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: bug number in test, comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28410/files - new: https://git.openjdk.org/jdk/pull/28410/files/ec7298ef..c462f0ba Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28410&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28410&range=02-03 Stats: 4 lines in 2 files changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28410.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28410/head:pull/28410 PR: https://git.openjdk.org/jdk/pull/28410 From qamai at openjdk.org Tue Dec 2 08:09:31 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 2 Dec 2025 08:09:31 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v3] In-Reply-To: References: Message-ID: On Mon, 24 Nov 2025 18:10:50 GMT, Quan Anh Mai wrote: >> Hi, >> >> This fixes the crash in `Load/StoreVectorMaskedNode::Ideal`. The issue here is that the graph is not canonical during idealization, which leads to us processing a dead node. The fix I propose is to bail-out when that happens. >> >> To be more specific, for this issue, we have the graph that looks like: >> >> ConI -> ConvI2L -> CastLL(0..32) -> VectorMaskGen >> >> with `ConI` being 45 and `MaxVectorSize` being 32. In this instance, `CastLL` is processed before `ConvI2L`, and when it is processed, it sees that the type of `ConvI2L` being its bottom type. As a result, it does not know that it is top, and since we are after macro expansion, which is after loop opts, the `CastLL` goes away, leaving us with: >> >> ConI -> ConvI2L -> VectorMaskGen >> >> After `ConvI2L` is processed, we know that the input of `VectorMaskGen` is a constant 45, which is larger than `MaxVectorSize`, leading to the assert failure. >> >> Please take a look and leave your thoughts, thanks a lot. 
> > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > reviews Thanks a lot for your reviews, please reapprove when the tests pass. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28410#issuecomment-3600742616 From epeter at openjdk.org Tue Dec 2 08:13:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 08:13:47 GMT Subject: RFR: 8372451: C2 SuperWord: "endless loop" assert. Need to implement proper worklist mechanism [v3] In-Reply-To: References: Message-ID: <0ZncRodHJbWfRFLrUCqQn5JPHilDPQA8e7dcrwARsOI=.7fe44906-eeed-49bb-8472-5a264391468b@github.com> > **Context**: `VTransform::optimize`. Works a bit like IGVN, it allows each node to perform optimizations. Recently introduced during JDK26. > > **Problem**: I made the assumption that we don't need a worklist mechanism, we can just do multiple passes over all nodes. The assumption was that there would not be any "trickling" of updates over the graph. But that is wrong: for example we can have a long chain of dead nodes, and we need to progressively remove the last node and mark it as dead. > > **Solution**: Implement proper worklist mechanism, so that updates can trickle over the graph. Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: - Merge branch 'master' into JDK-8372451-too-many-dead-vector-reduction-vtnodes - Apply suggestions from code review Co-authored-by: Christian Hagedorn - limit steps of optimize, for Manuel - Merge branch 'master' into JDK-8372451-too-many-dead-vector-reduction-vtnodes - rm old documentation - git move to new test - streamline - refactor and verify - unique worklist - wip solution - ... 
and 1 more: https://git.openjdk.org/jdk/compare/a4a5fdb0...54881ff4 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28512/files - new: https://git.openjdk.org/jdk/pull/28512/files/9f5bf837..54881ff4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28512&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28512&range=01-02 Stats: 12411 lines in 291 files changed: 6379 ins; 5081 del; 951 mod Patch: https://git.openjdk.org/jdk/pull/28512.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28512/head:pull/28512 PR: https://git.openjdk.org/jdk/pull/28512 From epeter at openjdk.org Tue Dec 2 08:13:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 08:13:49 GMT Subject: RFR: 8372451: C2 SuperWord: "endless loop" assert. Need to implement proper worklist mechanism [v2] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 07:59:25 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> limit steps of optimize, for Manuel > > Looks good to me, too! @chhagedorn Thanks for having a look. I applied the changes! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28512#issuecomment-3600744825 From mhaessig at openjdk.org Tue Dec 2 08:46:54 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 2 Dec 2025 08:46:54 GMT Subject: RFR: 8372451: C2 SuperWord: "endless loop" assert. Need to implement proper worklist mechanism [v3] In-Reply-To: <0ZncRodHJbWfRFLrUCqQn5JPHilDPQA8e7dcrwARsOI=.7fe44906-eeed-49bb-8472-5a264391468b@github.com> References: <0ZncRodHJbWfRFLrUCqQn5JPHilDPQA8e7dcrwARsOI=.7fe44906-eeed-49bb-8472-5a264391468b@github.com> Message-ID: On Tue, 2 Dec 2025 08:13:47 GMT, Emanuel Peter wrote: >> **Context**: `VTransform::optimize`. Works a bit like IGVN, it allows each node to perform optimizations. Recently introduced during JDK26. 
>> >> **Problem**: I made the assumption that we don't need a worklist mechanism, we can just do multiple passes over all nodes. The assumption was that there would not be any "trickling" of updates over the graph. But that is wrong: for example we can have a long chain of dead nodes, and we need to progressively remove the last node and mark it as dead. >> >> **Solution**: Implement proper worklist mechanism, so that updates can trickle over the graph. > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Merge branch 'master' into JDK-8372451-too-many-dead-vector-reduction-vtnodes > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn > - limit steps of optimize, for Manuel > - Merge branch 'master' into JDK-8372451-too-many-dead-vector-reduction-vtnodes > - rm old documentation > - git move to new test > - streamline > - refactor and verify > - unique worklist > - wip solution > - ... and 1 more: https://git.openjdk.org/jdk/compare/69927660...54881ff4 Marked as reviewed by mhaessig (Committer). 
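
The trickling described in the quoted problem statement can be sketched with a small stand-alone model. It is a minimal sketch under stated assumptions: `Node`, `UniqueWorklist` and `remove_dead_chain` are hypothetical names and do not mirror the actual `VTransformOptimize` code. Each node has at most one input, and a node without uses is dead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node: one input (-1 for none), a use count, and a dead flag.
struct Node {
  int input;
  int uses;
  bool dead;
};

// Worklist that holds each node index at most once.
class UniqueWorklist {
 public:
  explicit UniqueWorklist(std::size_t size) : _on_list(size, false) {}

  void push(int idx) {
    if (!_on_list[idx]) {
      _on_list[idx] = true;
      _stack.push_back(idx);
    }
  }

  int pop() {
    int idx = _stack.back();
    _stack.pop_back();
    _on_list[idx] = false;
    return idx;
  }

  bool empty() const { return _stack.empty(); }

 private:
  std::vector<bool> _on_list;
  std::vector<int> _stack;
};

// Remove nodes without uses. Killing a node may leave its input without
// uses, so the input is pushed again and the update trickles up the chain.
int remove_dead_chain(std::vector<Node>& nodes) {
  UniqueWorklist worklist(nodes.size());
  for (std::size_t i = 0; i < nodes.size(); i++) {
    worklist.push(static_cast<int>(i));
  }
  int removed = 0;
  while (!worklist.empty()) {
    int idx = worklist.pop();
    Node& n = nodes[idx];
    if (n.dead || n.uses > 0) {
      continue;
    }
    n.dead = true;
    removed++;
    if (n.input >= 0) {
      nodes[n.input].uses--;
      worklist.push(n.input);
    }
  }
  return removed;
}
```

Because killing a node re-queues its input, a chain of any length is cleaned up in a single run, without relying on a fixed number of passes over all nodes.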
------------- PR Review: https://git.openjdk.org/jdk/pull/28512#pullrequestreview-3528909043 From shade at openjdk.org Tue Dec 2 08:47:02 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 2 Dec 2025 08:47:02 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v6] In-Reply-To: References: <2ifEaoGuZU4duyckWchgOnnqfH6AgAcrqsiqBZH1Nx4=.1df7af8d-41ac-43a1-90ab-964eb80f155b@github.com> Message-ID: On Mon, 1 Dec 2025 23:25:42 GMT, Vladimir Ivanov wrote: >> Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision: >> >> - Simplify third case: no need to loop, just restart the search >> - Actually have a second "fast" case: receiver is not found in the table, and the table is full >> - Pushing/popping for rare CAS path is counter-productive > > src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4826: > >> 4824: // and never duplicate the receivers in the list. >> 4825: // >> 4826: // It is tempting to combine these cases into a single loop, and claim the first > > Can you elaborate, please, why it is the case? Is it a result of class unloading or something else? Yes, we are clearing MDOs for unloaded classes. I initially thought this kind of cleanup happens only during `ciReceiverTypeData::translate_receiver_data_from` translation to `ciReceiverTypeData`. If that was the only path, we would probably not care about this; although I would, for defensive programming reasons. *But* it looks like the cleanup happens during "normal" GC class unloading, which also makes sense: you do not want to have unloaded classes referenced from any runtime datastructure, including MDO. 
The path I saw was: ReceiverTypeData::clear_row ReceiverTypeData::clean_weak_klass_links MethodData::clean_method_data InstanceKlass::clean_method_data InstanceKlass::clean_weak_instanceklass_links Klass::clean_weak_instanceklass_links KlassCleaningTask::work ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25305#discussion_r2580205471 From mhaessig at openjdk.org Tue Dec 2 08:48:54 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 2 Dec 2025 08:48:54 GMT Subject: RFR: 8370766: JVM crashes when running compiler/exceptions/TestAccessErrorInCatch.java fails with -XX:+VerifyStack [v3] In-Reply-To: References: <5JAu6StX5-r2itXPGiDBgGHjGo0S2mOfGxOpPoMSkIQ=.000500da-a003-403b-9d3b-6df3a53c2b22@github.com> Message-ID: On Mon, 1 Dec 2025 22:26:10 GMT, Dean Long wrote: >> The problem is C2 is throwing an exception and then deoptimizing, and the -XX:+VerifyStack logic expects the stack to be empty, match the "before" state if the reexecute flag is set, or match the "after" state. C2 is using the "before" state, so for correctness it also needs to set the reexecute flag. >> >> I played around with other approaches, like: >> 1. setting the stack to empty >> 2. adding all the bytecodes that can throw to the list in AbstractInterpreter::bytecode_should_reexecute() >> 3. always setting the reexecute flag in add_safepoint_edges() if must_throw is set >> but in the end I decided to go with the minimal localized low-risk change. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > add bugid Thank you for addressing my comments and the credit. Looks good. ------------- Marked as reviewed by mhaessig (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/28486#pullrequestreview-3528915361 From mhaessig at openjdk.org Tue Dec 2 08:52:45 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 2 Dec 2025 08:52:45 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 15:40:00 GMT, Roland Westrelin wrote: > For this failure memory stats are: > > > Total Usage: 1095525816 > --- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 --- > Phase Total ra node comp type states reglive regsplit regmask superword cienv ha other > none 5976032 331560 5402064 197512 33712 10200 0 0 984 0 0 0 0 > parse 2716464 65456 1145480 196408 1112752 0 0 0 0 0 196368 0 0 > optimizer 98184 0 32728 0 65456 0 0 0 0 0 0 0 0 > connectionGraph 32728 0 0 32728 0 0 0 0 0 0 0 0 0 > iterGVN 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > idealLoop 918189632 0 38687056 872824784 392776 0 0 0 0 0 6285016 0 0 > idealLoopVerify 2228144 0 0 2228144 0 0 0 0 0 0 0 0 0 > macroExpand 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > graphReshape 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > matcher 20135944 3369848 9033208 7536400 65456 131032 0 0 0 0 0 0 0 > postselect_cleanup 294872 294872 0 0 0 0 0 0 0 0 0 0 0 > scheduler 752944 196488 556456 0 0 0 0 0 0 0 0 0 0 > regalloc 388736 388736 0 0 0 0 0 0 0 0 0 0 0 > ctorChaitin 160032 ... Fwiw, testing passed up to tier3 on linux-x64, linux-aarch64, macosx-aarch64, mac-x64, windows-x64. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28581#issuecomment-3600902694 From mhaessig at openjdk.org Tue Dec 2 08:58:48 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 2 Dec 2025 08:58:48 GMT Subject: RFR: 8372703: Test compiler/arguments/TestCodeEntryAlignment.java failed: assert(allocates2(pc)) failed: not in CodeBuffer memory In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 21:16:18 GMT, Volodymyr Paprotski wrote: > Requires a Broadwell machine, but was able to reproduce with an emulator: > > > ~/sde-external-9.58.0-2025-06-16-lin/sde64 -follow-subprocess -bdw -- ./build/linux-x86_64-server-fastdebug/images/jdk/bin/java -XX:-UseMulAddIntrinsic -XX:+UseDilithiumIntrinsics -XX:+UnlockExperimentalVMOptions -XX:CodeCacheSegmentSi > ze=1024 -XX:CodeEntryAlignment=1024 -cp build/linux-x86_64-server-fastdebug/support/test/lib/test-lib.jar test/hotspot/jtreg/compiler/arguments/TestCodeEntryAlignment.java run Thank you for fixing this, @vpaprotsk. Please also remove the problem listing of the `compiler/arguments/TestCodeEntryAlignment.java`: https://github.com/openjdk/jdk/blob/84ffe87260753973835ea6b88443e28bcaf0122f/test/hotspot/jtreg/ProblemList.txt#L82 Meanwhile, I will run testing on our side and report back with the results. ------------- Changes requested by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/28588#pullrequestreview-3528955992 From pminborg at openjdk.org Tue Dec 2 09:04:53 2025 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 2 Dec 2025 09:04:53 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 23:41:04 GMT, Chen Liang wrote: >> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. 
Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Tweak VH usage in some classes src/hotspot/share/opto/library_call.cpp line 8926: > 8924: bool LibraryCallKit::inline_isCompileConstant() { > 8925: Node* n = argument(0); > 8926: set_result(n->is_Con() ? intcon(1) : intcon(2)); Can we get constants for these magic numbers on the C side as well? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2580266341 From roland at openjdk.org Tue Dec 2 09:09:16 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 09:09:16 GMT Subject: RFR: 8371464: C2: assert(no_dead_loop) failed: dead loop detected [v2] In-Reply-To: References: Message-ID: <2fn6Bj9HrMiWa4K01CuW-vnL7XKWwkLcTapeVkHqWUo=.9d0e24a5-dc6c-4abe-ae45-10b93479071e@github.com> > Crash occurs because a `MergeMem` node references itself: > > > 608 MergeMem === _ 1 608 1 1 1 1 1 1 1 1 1 1 878 [[ 877 878 608 420 597 ]] { - - - - - - - - - - N878:java/lang/Throwable (java/io/Serializable)+20 * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !orig=[524] !jvms: TestDeadLoopAtMergeMem::test @ bci:94 (line 62) > ``` > > Before IGVN, that part of the stream is: > > > 522 Region === 522 604 521 [[ 522 538 523 524 525 526 527 528 529 530 531 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:75 (line 59) > 524 Phi === 522 608 464 [[ 588 581 564 546 564 559 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:75 (line 59) > > 538 If === 522 535 [[ 539 540 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 539 IfTrue === 538 [[ 553 547 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 540 IfFalse === 538 [[ 548 546 ]] #0 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 553 If === 539 
535 [[ 554 555 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) > 554 IfTrue === 553 [[ 562 560 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) > 555 IfFalse === 553 [[ 548 559 ]] #0 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) > > 548 Region === 548 _ 540 555 [[ 548 562 561 563 564 565 566 567 568 569 570 571 572 573 574 575 576 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:88 (line 60) > 564 Phi === 548 _ 524 524 [[ 581 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:85 (line 61) > > 562 Region === 562 548 554 [[ 562 600 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 596 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:90 (line 62) > 581 Phi === 562 564 524 [[ 420 597 610 608 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:90 (line 62) > > 608 MergeMem === _ 1 581 1 1 1 1 1 1 1 1 1 1 588 [[ 524 ]] { - - - - - - - - - - N588:java/lang/Throwable (java/io/Serializable)+20 * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:94 (line 62) > > > 522 is a loop head, 604 is the backedge. The loop becomes unreachable > during IGVN. The loop body above is transformed to: > > > 538 If === 604 535 [[ 539 540 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 539 IfTrue === 538 [[ 562 547 560 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (l... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - review - Merge branch 'master' into JDK-8371464 - test - fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28554/files - new: https://git.openjdk.org/jdk/pull/28554/files/052b7a46..825c9dd5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28554&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28554&range=00-01 Stats: 19717 lines in 576 files changed: 12800 ins; 3715 del; 3202 mod Patch: https://git.openjdk.org/jdk/pull/28554.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28554/head:pull/28554 PR: https://git.openjdk.org/jdk/pull/28554 From roland at openjdk.org Tue Dec 2 09:09:18 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 09:09:18 GMT Subject: RFR: 8371464: C2: assert(no_dead_loop) failed: dead loop detected In-Reply-To: <5g0wstbmWC5gv_OG3sTv5Lb0eYCR4Cq3zQb1PJiWA6w=.efbee9c2-a9db-428b-8aa7-1c3d198d05e9@github.com> References: <5g0wstbmWC5gv_OG3sTv5Lb0eYCR4Cq3zQb1PJiWA6w=.efbee9c2-a9db-428b-8aa7-1c3d198d05e9@github.com> Message-ID: On Mon, 1 Dec 2025 07:57:06 GMT, Aleksey Shipilev wrote: > GHA failures in [com/sun/crypto/provider/Cipher/HPKE/KAT9180](https://github.com/rwestrel/jdk/actions/runs/19761317022#user-content-com_sun_crypto_provider_cipher_hpke_kat9180) would disappear if you merge from master. Actually, this might mean the PR base is quite old, and there might be other bugs on the intersection with this one. Merge from master and pass the GHA, maybe? 
I merged with latest ------------- PR Comment: https://git.openjdk.org/jdk/pull/28554#issuecomment-3600960580 From roland at openjdk.org Tue Dec 2 09:09:19 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 09:09:19 GMT Subject: RFR: 8371464: C2: assert(no_dead_loop) failed: dead loop detected [v2] In-Reply-To: References: Message-ID: <3t5HBU3tkgGioH6r3THy2oBLYGZ1JzOOWBKM-8lEeuc=.749bada1-f8c3-4872-8f64-21abfc4b5707@github.com> On Mon, 1 Dec 2025 08:02:47 GMT, Damon Fenacci wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - review >> - Merge branch 'master' into JDK-8371464 >> - test >> - fix > > src/hotspot/share/opto/cfgnode.cpp line 1404: > >> 1402: Node* other_phi_input = in(j); >> 1403: if (other_phi_input != nullptr && other_phi_input == merge_mem->base_memory() && !is_data_loop(region, phi_input, igvn)) { >> 1404: // merge_mem is a successor memory to other_phi_input, and is not pinned inside the diamond, so push it out. > > Do you think it might be worth adding an additional reason for `!is_data_loop` in the comment? I added a comment in the new commit. Can you have a look? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28554#discussion_r2580274083 From roland at openjdk.org Tue Dec 2 09:13:38 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 09:13:38 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v7] In-Reply-To: References: Message-ID: > In test cases, `mh` is initially not constant so the method handle > invoke can't be inlined. It is later found to be constant, so it can > be turned into a direct call by > `Compile::process_late_inline_calls_no_inline()`. 
In the meantime, the > `CallNode` for the mh invoke is cloned (by loop switching). In the > process, only a shallow copy of the `JVMState` for the call is > made. The initial `CallNode` is the first to be processed by > `Compile::process_late_inline_calls_no_inline()` and that causes that > `CallNode` to become dead. The cloned `CallNode` is then > processed. The `JVMState` for that one references the initial > `CallNode` in its caller's `JVMState`. Because that node is dead, that > causes a crash. The fix I propose is to make a deep copy of the > `JVMState` when a `CallNode` is cloned, if a `CallGenerator` is > assigned to the node. > > The other failure I see with these tests is: > > > # Internal Error (/home/roland/jdk-jdk/src/hotspot/share/opto/compile.hpp:1091), pid=3319164, tid=3319186 > # assert(_number_of_mh_late_inlines > 0) failed: _number_of_mh_late_inlines < 0 ! > > > because even though the `CallNode` is cloned, there's still only one > late inline recorded. The fix here is to increment > `_number_of_mh_late_inlines` when the node is cloned. > > This was reported by the netty developers. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: - Merge branch 'master' into JDK-8370939 - Merge branch 'master' into JDK-8370939 - review - Merge branch 'master' into JDK-8370939 - review - more - more - more - more - test - ... 
and 1 more: https://git.openjdk.org/jdk/compare/1b191400...64b11e6e ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28088/files - new: https://git.openjdk.org/jdk/pull/28088/files/bf46ba3e..64b11e6e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28088&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28088&range=05-06 Stats: 19716 lines in 575 files changed: 12799 ins; 3715 del; 3202 mod Patch: https://git.openjdk.org/jdk/pull/28088.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28088/head:pull/28088 PR: https://git.openjdk.org/jdk/pull/28088 From roland at openjdk.org Tue Dec 2 09:13:41 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 09:13:41 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v6] In-Reply-To: <7nY7QRkkFjOtOuBXID1I4GluA0vnFRLy_UnRICfVkR4=.99ec7fe1-af27-4ab7-ac63-27aa12bec4ef@github.com> References: <7nY7QRkkFjOtOuBXID1I4GluA0vnFRLy_UnRICfVkR4=.99ec7fe1-af27-4ab7-ac63-27aa12bec4ef@github.com> Message-ID: On Fri, 21 Nov 2025 11:33:42 GMT, Roland Westrelin wrote: >> In test cases, `mh` is initially not constant so the method handle >> invoke can't be inlined. It is later found to be constant, so it can >> be turned into a direct call by >> `Compile::process_late_inline_calls_no_inline()`. In the meantime, the >> `CallNode` for the mh invoke is cloned (by loop switching). In the >> process, only a shallow copy of the `JVMState` for the call is >> made. The initial `CallNode` is the first to be processed by >> `Compile::process_late_inline_calls_no_inline()` and that causes that >> `CallNode` to become dead. The cloned `CallNode` is then >> processed. The `JVMState` for that one references the initial >> `CallNode` in its caller's `JVMState`. Because that node is dead, that >> causes a crash. 
The fix I propose is to make a deep copy of the >> `JVMState` when a `CallNode` is cloned, if a `CallGenerator` is >> assigned to the node. >> >> The other failure I see with these tests is: >> >> >> # Internal Error (/home/roland/jdk-jdk/src/hotspot/share/opto/compile.hpp:1091), pid=3319164, tid=3319186 >> # assert(_number_of_mh_late_inlines > 0) failed: _number_of_mh_late_inlines < 0 ! >> >> >> because even though the `CallNode` is cloned, there's still only one >> late inline recorded. The fix here is to increment >> `_number_of_mh_late_inlines` when the node is cloned. >> >> This was reported by the netty developers. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Merge branch 'master' into JDK-8370939 > - review > - Merge branch 'master' into JDK-8370939 > - review > - more > - more > - more > - more > - test > - fix Anyone for another review? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28088#issuecomment-3600983968 From roland at openjdk.org Tue Dec 2 09:20:41 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 09:20:41 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: Message-ID: > This is a variant of 8332827. In 8332827, an array access becomes > dependent on a range check `CastII` for another array access. When, > after loop opts are over, that RC `CastII` was removed, the array > access could float and an out of bound access happened. With the fix > for 8332827, RC `CastII`s are no longer removed. > > With this one what happens is that some transformations applied after > loop opts are over widen the type of the RC `CastII`. 
As a result, the > type of the RC `CastII` is no longer narrower than that of its input, > the `CastII` is removed and the dependency is lost. > > There are 2 transformations that cause this to happen: > > - after loop opts are over, the type of the `CastII` nodes is widened > so nodes that have the same inputs but a slightly different type can > common. > > - When pushing a `CastII` through an `Add`, if the types of both inputs > of the `Add` are non constant, then we end up widening the type > (the resulting `Add` has a type that's wider than that of the > initial `CastII`). > > There are already 3 types of `Cast` nodes depending on the > optimizations that are allowed. Either the `Cast` is floating > (`depends_only_test()` returns `true`) or pinned. Either the `Cast` > can be removed if it no longer narrows the type of its input or > not. We already have variants of the `CastII`: > > - if the Cast can float and can be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can't be removed when it doesn't narrow > the type of its input. > > What we need here, I think, is the 4th combination: > > - if the Cast can float and can't be removed when it doesn't narrow > the type of its input. > > Anyway, things are becoming confusing with all these different > variants named in ways that don't always help figure out what > constraints one of them operates under. So I refactored this and that's > the biggest part of this change. The fix consists in marking `Cast` > nodes when their type is widened in a way that prevents them from being > optimized out. > > Tobias ran performance testing with a slightly different version of > this change and there was no regression. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 17 commits: - Merge branch 'master' into JDK-8354282 - whitespace - review - review - Update src/hotspot/share/opto/castnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/castnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/castnode.cpp Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java Co-authored-by: Christian Hagedorn - review - review - ... and 7 more: https://git.openjdk.org/jdk/compare/ef5e744a...93b8b0c5 ------------- Changes: https://git.openjdk.org/jdk/pull/24575/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24575&range=07 Stats: 365 lines in 13 files changed: 264 ins; 27 del; 74 mod Patch: https://git.openjdk.org/jdk/pull/24575.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24575/head:pull/24575 PR: https://git.openjdk.org/jdk/pull/24575 From pminborg at openjdk.org Tue Dec 2 09:31:03 2025 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 2 Dec 2025 09:31:03 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 23:41:04 GMT, Chen Liang wrote: >> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. 
Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Tweak VH usage in some classes src/java.base/share/classes/java/lang/invoke/VarHandle.java line 2036: > 2034: var constant = MethodHandleImpl.isCompileConstant(vh); > 2035: var cache = adaptedMh; > 2036: if (constant == MethodHandleImpl.CONSTANT_YES && cache != null) { Rookie question: Are there multi-threading considerations here? How about visibility across threads? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2580353068 From shade at openjdk.org Tue Dec 2 09:43:17 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 2 Dec 2025 09:43:17 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v7] In-Reply-To: References: Message-ID: > See the bug for a discussion of the issues the current machinery has. > > This PR executes the plan outlined in the bug: > 1. Common the receiver type profiling code in interpreter and C1 > 2. Rewrite receiver type profiling code to only do atomic receiver slot installations > 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed > > This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral.
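
A rough stand-alone model of "atomic receiver slot installations" might look like the sketch below. It is not the x86 assembler code from the PR: `ReceiverTable`, `kRows`, `bump` and `profile_receiver` are hypothetical names, and a receiver value of 0 is assumed to mark an empty slot. The counter update deliberately stays a plain load-then-store, in line with the statement that counters are not updated atomically.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical number of receiver rows in the profile.
constexpr std::size_t kRows = 2;

struct ReceiverTable {
  std::atomic<std::uintptr_t> receiver[kRows];  // 0 marks an empty slot
  std::atomic<std::uint64_t> count[kRows];
  std::atomic<std::uint64_t> polymorphic;

  ReceiverTable() : polymorphic(0) {
    for (std::size_t i = 0; i < kRows; i++) {
      receiver[i].store(0);
      count[i].store(0);
    }
  }
};

// Non-atomic style increment: concurrent updates may be lost.
inline void bump(std::atomic<std::uint64_t>& c) {
  c.store(c.load(std::memory_order_relaxed) + 1, std::memory_order_relaxed);
}

void profile_receiver(ReceiverTable& t, std::uintptr_t recv) {
  for (;;) {
    std::size_t empty = kRows;
    // Scan the whole table first: slots may have been cleared, so an empty
    // slot can precede the slot that already holds this receiver.
    for (std::size_t i = 0; i < kRows; i++) {
      std::uintptr_t cur = t.receiver[i].load(std::memory_order_acquire);
      if (cur == recv) {
        bump(t.count[i]);  // Case 1: receiver already installed.
        return;
      }
      if (cur == 0 && empty == kRows) {
        empty = i;
      }
    }
    if (empty == kRows) {
      bump(t.polymorphic);  // Case 2: not found and the table is full.
      return;
    }
    // Case 3: try to claim the empty slot with a CAS, then restart the
    // search regardless of the outcome. If another thread won the race,
    // the rescan either finds the receiver or picks another slot.
    std::uintptr_t expected = 0;
    t.receiver[empty].compare_exchange_strong(expected, recv);
  }
}
```

Scanning every row before claiming an empty one is what keeps a receiver from being installed twice after an earlier row has been cleared.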
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/` > - [x] Linux x86_64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: - More comments - Tighten up the comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25305/files - new: https://git.openjdk.org/jdk/pull/25305/files/f3e0fa4d..39cc4dfe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25305&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25305&range=05-06 Stats: 13 lines in 1 file changed: 2 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/25305.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25305/head:pull/25305 PR: https://git.openjdk.org/jdk/pull/25305 From shade at openjdk.org Tue Dec 2 09:43:19 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 2 Dec 2025 09:43:19 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v6] In-Reply-To: References: <2ifEaoGuZU4duyckWchgOnnqfH6AgAcrqsiqBZH1Nx4=.1df7af8d-41ac-43a1-90ab-964eb80f155b@github.com> Message-ID: On Tue, 2 Dec 2025 08:44:21 GMT, Aleksey Shipilev wrote: >> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4826: >> >>> 4824: // and never duplicate the receivers in the list. >>> 4825: // >>> 4826: // It is tempting to combine these cases into a single loop, and claim the first >> >> Can you elaborate, please, why it is the case? Is it a result of class unloading or something else? > > Yes, we are clearing MDOs for unloaded classes. > > I initially thought this kind of cleanup happens only during `ciReceiverTypeData::translate_receiver_data_from` translation to `ciReceiverTypeData`. If that was the only path, we would probably not care about this; although I would, for defensive programming reasons. 
*But* it looks like the cleanup happens during "normal" GC class unloading, which also makes sense: you do not want to have unloaded classes referenced from any runtime datastructure, including MDO. So this forces our hand to deal with empty slots. Old code also did this, AFAICS: it scanned everything at least once. > > The path to receiver cleanup I saw in the code was: > > > ReceiverTypeData::clear_row > ReceiverTypeData::clean_weak_klass_links > MethodData::clean_method_data > InstanceKlass::clean_method_data > InstanceKlass::clean_weak_instanceklass_links > Klass::clean_weak_instanceklass_links > KlassCleaningTask::work > I tightened up the comments a bit to mention that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25305#discussion_r2580384110 From roland at openjdk.org Tue Dec 2 09:49:29 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 09:49:29 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v4] In-Reply-To: References: Message-ID: > The test case has an out of loop `Store` with an `AddP` address > expression that has other uses and is in the loop body. Schematically, > only showing the address subgraph and the bases for the `AddP`s: > > > Store#195 -> AddP#133 -> AddP#134 -> CastPP#110 > -> CastPP#110 > > > Both `AddP`s have the same base, a `CastPP` that's also in the loop > body. > > That loop is a counted loop and only has 3 iterations so is fully > unrolled. First, one iteration is peeled: > > > /-> CastPP#110 > Store#195 -> Phi#360 -> AddP#133 -> AddP#134 -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > The `AddP`s and `CastPP` are cloned (because in the loop body). As > part of peeling, `PhaseIdealLoop::peeled_dom_test_elim()` is > called. 
It finds the test that guards `CastPP#283` in the peeled > iteration dominates and replaces the test that guards `CastPP#110` > (the test in the peeled iteration is the clone of the test in the > loop). That causes `CastPP#110`'s control to be updated to that of the > test in the peeled iteration and to be yanked from the loop. So now > `CastPP#283` and `CastPP#110` have the same inputs. > > Next unrolling happens: > > > /-> CastPP#110 > /-> AddP#400 -> AddP#401 -> CastPP#110 > Store#195 -> Phi#360 -> Phi#477 -> AddP#133 -> AddP#134 -> CastPP#110 > \ -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > `AddP`s are cloned once more but not the `CastPP`s because they are > both in the peeled iteration now. A new `Phi` is added. > > Next igvn runs. It's going to push the `AddP`s through the `Phi`s. > > Through `Phi#477`: > > > > /-> CastPP#110 > Store#195 -> Phi#360 -> AddP#510 -> Phi#509 -> AddP#401 -> CastPP#110 > \ -> AddP#134 -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > Through `Phi#360`: > > > /-> AddP#134 -> CastPP#110 > /-> Phi#509 -> AddP#401 -> CastPP#110 > Store#195 -> AddP#516 -> Phi#515 -> AddP#278 -> CastPP#283 > -> Phi#514 -> CastPP#283 > -> CastP#110 > > > Then `Phi#514` which has 2 `CastPP`s as input with identical inputs is > transformed into anot... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: - more - review - Merge branch 'master' into JDK-8351889 - exp - Merge branch 'master' into JDK-8351889 - verif - Merge branch 'master' into JDK-8351889 - test seed - more - Merge branch 'master' into JDK-8351889 - ... 
and 4 more: https://git.openjdk.org/jdk/compare/0419511c...15c17bb1 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25386/files - new: https://git.openjdk.org/jdk/pull/25386/files/d52f2ded..15c17bb1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25386&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25386&range=02-03 Stats: 19726 lines in 577 files changed: 12800 ins; 3715 del; 3211 mod Patch: https://git.openjdk.org/jdk/pull/25386.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25386/head:pull/25386 PR: https://git.openjdk.org/jdk/pull/25386 From roland at openjdk.org Tue Dec 2 09:49:37 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 09:49:37 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v3] In-Reply-To: References: Message-ID: On Mon, 24 Nov 2025 11:53:16 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8351889 >> - verif >> - Merge branch 'master' into JDK-8351889 >> - test seed >> - more >> - Merge branch 'master' into JDK-8351889 >> - Merge branch 'master' into JDK-8351889 >> - more >> - test >> - fix > > src/hotspot/share/opto/phaseX.cpp line 2085: > >> 2083: } >> 2084: return false; >> 2085: } > > Why not call it `verify_node_invariants_for`? > > You should also assert immediately. @benoitmaillard Is about to make that change for everything: https://github.com/openjdk/jdk/pull/28295 That one is not integrated. Shouldn't I do that change only if it/when integrates? 
> src/hotspot/share/opto/phaseX.hpp line 623: > >> 621: // '-XX:VerifyIterativeGVN=10000' >> 622: return ((VerifyIterativeGVN % 100000) / 10000) == 1; >> 623: } > > You will need to add extra documentation to the flag. And also there is a test that uses the flag. You should adjust it to enable this bit as well. Done in new commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25386#discussion_r2580416861 PR Review Comment: https://git.openjdk.org/jdk/pull/25386#discussion_r2580411292 From roland at openjdk.org Tue Dec 2 09:58:18 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 09:58:18 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v2] In-Reply-To: References: Message-ID: On Mon, 24 Nov 2025 09:46:31 GMT, Emanuel Peter wrote: > > The duplication comes from loop body cloning so I'm not sure how we could prevent the duplication. We could try to common the CastPP nodes once PhaseIdealLoop::peeled_dom_test_elim() is called. > > Right, that could be an option. Do you think that is worth it? `IfNode::Ideal` looks for a dominating `If` that can replace the current `If`. It's not clear to me that that transformation can't trigger a similar failure which is why I think a fix during igvn is more robust. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25386#issuecomment-3601177439 From roland at openjdk.org Tue Dec 2 10:06:57 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 10:06:57 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v2] In-Reply-To: References: Message-ID: > For this failure memory stats are: > > > Total Usage: 1095525816 > --- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 --- > Phase Total ra node comp type states reglive regsplit regmask superword cienv ha other > none 5976032 331560 5402064 197512 33712 10200 0 0 984 0 0 0 0 > parse 2716464 65456 1145480 196408 1112752 0 0 0 0 0 196368 0 0 > optimizer 98184 0 32728 0 65456 0 0 0 0 0 0 0 0 > connectionGraph 32728 0 0 32728 0 0 0 0 0 0 0 0 0 > iterGVN 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > idealLoop 918189632 0 38687056 872824784 392776 0 0 0 0 0 6285016 0 0 > idealLoopVerify 2228144 0 0 2228144 0 0 0 0 0 0 0 0 0 > macroExpand 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > graphReshape 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > matcher 20135944 3369848 9033208 7536400 65456 131032 0 0 0 0 0 0 0 > postselect_cleanup 294872 294872 0 0 0 0 0 0 0 0 0 0 0 > scheduler 752944 196488 556456 0 0 0 0 0 0 0 0 0 0 > regalloc 388736 388736 0 0 0 0 0 0 0 0 0 0 0 > ctorChaitin 160032 ... 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/compile.hpp Co-authored-by: Manuel Hässig ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28581/files - new: https://git.openjdk.org/jdk/pull/28581/files/27524015..eb7bd9ac Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28581&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28581&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28581.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28581/head:pull/28581 PR: https://git.openjdk.org/jdk/pull/28581 From roland at openjdk.org Tue Dec 2 10:06:59 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 10:06:59 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v2] In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 16:07:47 GMT, Manuel Hässig wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/opto/compile.hpp >> >> Co-authored-by: Manuel Hässig > > test/hotspot/jtreg/compiler/c2/TestVerifyLoopOptimizationsHighMemUsage.java line 27: > >> 25: * @test >> 26: * @bug 8370519 >> 27: * @summary C2: Hit MemLimit when running with +VerifyLoopOptimizations > > Unsure, but would this test qualify for `@key stress`? I'm not sure either what does.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2580480945 From vklang at openjdk.org Tue Dec 2 10:08:39 2025 From: vklang at openjdk.org (Viktor Klang) Date: Tue, 2 Dec 2025 10:08:39 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: Message-ID: <8tnePBnWu5w86zsXUOVMd7R_oHsTVl_Gjug0QP7N_vw=.5ce0106a-a7bf-40a2-b6a4-76e5d816150e@github.com> On Mon, 1 Dec 2025 23:41:04 GMT, Chen Liang wrote: >> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Tweak VH usage in some classes src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 632: > 630: @Hidden > 631: @jdk.internal.vm.annotation.IntrinsicCandidate > 632: static int isCompileConstant(Object obj) { nit: an "is"-question tends to indicate a yes/no answer, but in this case it is more of a compileConstantStatus. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2580489368 From dfenacci at openjdk.org Tue Dec 2 10:18:50 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 2 Dec 2025 10:18:50 GMT Subject: RFR: 8371464: C2: assert(no_dead_loop) failed: dead loop detected [v2] In-Reply-To: <3t5HBU3tkgGioH6r3THy2oBLYGZ1JzOOWBKM-8lEeuc=.749bada1-f8c3-4872-8f64-21abfc4b5707@github.com> References: <3t5HBU3tkgGioH6r3THy2oBLYGZ1JzOOWBKM-8lEeuc=.749bada1-f8c3-4872-8f64-21abfc4b5707@github.com> Message-ID: On Tue, 2 Dec 2025 09:04:52 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/cfgnode.cpp line 1404: >> >>> 1402: Node* other_phi_input = in(j); >>> 1403: if (other_phi_input != nullptr && other_phi_input == merge_mem->base_memory() && !is_data_loop(region, phi_input, igvn)) { >>> 1404: // merge_mem is a successor memory to other_phi_input, and is not pinned inside the diamond, so push it out. >> >> Do you think it might be worth adding an additional reason for `!is_data_loop` in the comment? > > I added a comment in the new commit. Can you have a look? ? Thank you Roland. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28554#discussion_r2580530978 From krk at openjdk.org Tue Dec 2 10:22:48 2025 From: krk at openjdk.org (Kerem Kat) Date: Tue, 2 Dec 2025 10:22:48 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v4] In-Reply-To: References: Message-ID: On Thu, 27 Nov 2025 19:14:05 GMT, Kerem Kat wrote: >> Do not try to replace `fallthrough_memproj` when it is null, fixes crash. >> >> Test case is simplified from the ticket. Verified that the case crashes without the fix. 
> > Kerem Kat has updated the pull request incrementally with one additional commit since the last revision: > > fix rename `gc/TestAllocHumongousFragment_generational` failed, seems unrelated: # # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x000055fe3a2ce64e, pid=7345, tid=7349 # # JRE version: OpenJDK Runtime Environment (26.0) (build 26-internal-krk-a0f0ecb951a83c5069995130cfd803ad9165295f) # Java VM: OpenJDK 64-Bit Server VM (26-internal-krk-a0f0ecb951a83c5069995130cfd803ad9165295f, mixed mode, static, sharing, tiered, compressed oops, compressed class ptrs, shenandoah gc, linux-amd64) # Problematic frame: # V [java+0x149b64e] void ShenandoahMark::mark_loop_work, (ShenandoahGenerationType)1, false, (StringDedupMode)0>(ShenandoahMarkRefsClosure<(ShenandoahGenerationType)1>*, unsigned short*, unsigned int, TaskTerminator*, StringDedup::Requests*) [clone .isra.0]+0x25e ------------- PR Comment: https://git.openjdk.org/jdk/pull/28432#issuecomment-3601288584 From mhaessig at openjdk.org Tue Dec 2 10:29:29 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Tue, 2 Dec 2025 10:29:29 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v2] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 10:03:52 GMT, Roland Westrelin wrote: >> test/hotspot/jtreg/compiler/c2/TestVerifyLoopOptimizationsHighMemUsage.java line 27: >> >>> 25: * @test >>> 26: * @bug 8370519 >>> 27: * @summary C2: Hit MemLimit when running with +VerifyLoopOptimizations >> >> Unsure, but would this test qualify for `@key stress`? > > I'm not sure either what does. It is a marker to filter resource intensive tests.
https://github.com/openjdk/jdk/blob/7278d2e8e5835f090672f7625d391a1b4c1a6626/test/hotspot/jtreg/TEST.ROOT#L29-L30 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2580570634 From shade at openjdk.org Tue Dec 2 10:31:22 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 2 Dec 2025 10:31:22 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v8] In-Reply-To: References: Message-ID: > See the bug for discussion what issues current machinery has. > > This PR executes the plan outlined in the bug: > 1. Common the receiver type profiling code in interpreter and C1 > 2. Rewrite receiver type profiling code to only do atomic receiver slot installations > 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed > > This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/` > - [x] Linux x86_64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls - More comments - Tighten up the comments - Simplify third case: no need to loop, just restart the search - Actually have a second "fast" case: receiver is not found in the table, and the table is full - Pushing/popping for rare CAS path is counter-productive - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls - Tighten up some more - Offset is always rscratch1, no need to save it - Grossly simplify register shuffling - ... 
and 11 more: https://git.openjdk.org/jdk/compare/7278d2e8...3c5019d9 ------------- Changes: https://git.openjdk.org/jdk/pull/25305/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25305&range=07 Stats: 418 lines in 8 files changed: 202 ins; 197 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/25305.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25305/head:pull/25305 PR: https://git.openjdk.org/jdk/pull/25305 From qamai at openjdk.org Tue Dec 2 10:36:42 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 2 Dec 2025 10:36:42 GMT Subject: RFR: 8350208: CTW: GraphKit::add_safepoint_edges asserts "not enough operands for reexecution" Message-ID: Hi, This PR fixes the issue of the compiler crashing with "not enough operands for reexecution". The issue here is that during `Parse::catch_inline_exceptions`, the old stack is gone, and we cannot reexecute the current bytecode anymore. However, there are some places where we try to insert safepoints into the graph, such as if the handler is a backward jump, or if one of the exceptions in the handlers is not loaded. Since the `_reexecute` state of the current jvms is "undefined", it is inferred automatically that it should reexecute for some bytecodes such as `putfield`. The solution then is to explicitly set `_reexecute` to false. I managed to write a unit test for the case of a backward handler. For the other cases, the exceptions that can be thrown for a bytecode that is inferred to reexecute are `NullPointerException`, `ArrayIndexOutOfBoundsException`, and `ArrayStoreException`, so I find it hard to construct a test in which one of them is not loaded. Please kindly review, thanks a lot.
------------- Commit messages: - Set jvms()->_reexecute to false during Parse::catch_inline_exceptions Changes: https://git.openjdk.org/jdk/pull/28597/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28597&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350208 Stats: 153 lines in 3 files changed: 152 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28597.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28597/head:pull/28597 PR: https://git.openjdk.org/jdk/pull/28597 From shade at openjdk.org Tue Dec 2 10:51:01 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 2 Dec 2025 10:51:01 GMT Subject: RFR: 8372862: AArch64: Fix GetAndSet-acquire costs after JDK-8372188 Message-ID: I just noticed (while looking at [JDK-8372800](https://bugs.openjdk.org/browse/JDK-8372800)) that I made a little error in the [JDK-8372188](https://bugs.openjdk.org/browse/JDK-8372188) refactoring, which made the GetAndSet-acquire instruction cost twice as high. The usual cost for acquire versions is half as high, so that they are likely to be selected instead of the non-acquire versions. This bug happened as I "simplified" stencils at some point by dropping some arguments and renumbering the remaining ones. This is one place where I apparently forgot to renumber one usage. See other checks for `ifelse($3,Acq,...` in that stencil, all of them are `$3` (correct), not `$4` (incorrect). I have seen no real bugs because of this mishap, but it would be good to fix it in case we see issues later. I also looked at the stencils again, and I think there are no other argument-index problems like this anywhere else. The real change is in `aarch64_atomic_ad.m4`; the `.ad` file is re-generated from that stencil.
Additional testing: - [ ] Linux AArch64 server fastdebug, `all` - [ ] Linux AArch64 server fastdebug, quick jcstress run ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/28598/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28598&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372862 Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/28598.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28598/head:pull/28598 PR: https://git.openjdk.org/jdk/pull/28598 From chagedorn at openjdk.org Tue Dec 2 11:02:49 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 2 Dec 2025 11:02:49 GMT Subject: RFR: 8372451: C2 SuperWord: "endless loop" assert. Need to implement proper worklist mechanism [v3] In-Reply-To: <0ZncRodHJbWfRFLrUCqQn5JPHilDPQA8e7dcrwARsOI=.7fe44906-eeed-49bb-8472-5a264391468b@github.com> References: <0ZncRodHJbWfRFLrUCqQn5JPHilDPQA8e7dcrwARsOI=.7fe44906-eeed-49bb-8472-5a264391468b@github.com> Message-ID: On Tue, 2 Dec 2025 08:13:47 GMT, Emanuel Peter wrote: >> **Context**: `VTransform::optimize`. Works a bit like IGVN, it allows each node to perform optimizations. Recently introduced during JDK26. >> >> **Problem**: I made the assumption that we don't need a worklist mechanism, we can just do multiple passes over all nodes. The assumption was that there would not be any "trickling" of updates over the graph. But that is wrong: for example we can have a long chain of dead nodes, and we need to progressively remove the last node and mark it as dead. >> >> **Solution**: Implement proper worklist mechanism, so that updates can trickle over the graph. > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
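The worklist idea from the "Solution" paragraph above can be sketched in a few lines of C++. This is only an illustration under simplifying assumptions, not the `VTransform::optimize` code: `Node`, `live_uses`, `is_root` and `remove_dead` are hypothetical names, and the only optimization modelled is removing nodes that have no remaining users. When a node is removed, its inputs are put back on the worklist, so a long chain of dead nodes is removed progressively without re-scanning the whole graph.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical graph node; not a HotSpot VTransformNode.
struct Node {
  std::vector<std::size_t> inputs;  // indices of the nodes this node reads
  std::size_t live_uses = 0;        // number of live nodes reading this node
  bool is_root = false;             // has side effects, must never be removed
  bool dead = false;
};

// Marks every node without live users as dead and returns how many were removed.
std::size_t remove_dead(std::vector<Node>& graph) {
  std::deque<std::size_t> worklist;
  std::vector<bool> queued(graph.size(), true);  // keeps worklist entries unique
  for (std::size_t i = 0; i < graph.size(); i++) {
    worklist.push_back(i);
  }
  std::size_t removed = 0;
  while (!worklist.empty()) {
    std::size_t n = worklist.front();
    worklist.pop_front();
    queued[n] = false;
    Node& node = graph[n];
    if (node.dead || node.is_root || node.live_uses > 0) {
      continue;  // nothing to do for this node right now
    }
    node.dead = true;
    removed++;
    // Removing this node may leave its inputs without users, so revisit them.
    for (std::size_t in : node.inputs) {
      assert(graph[in].live_uses > 0);
      graph[in].live_uses--;
      if (!queued[in]) {
        worklist.push_back(in);
        queued[in] = true;
      }
    }
  }
  return removed;
}
```

With a single pass over the nodes in index order, a chain such as `0 <- 1 <- 2 <- 3` with only node 3 unused would lose just node 3; the re-queueing of inputs is what lets the update trickle back to nodes 2 and 1.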
The pull request contains 11 additional commits since the last revision: > > - Merge branch 'master' into JDK-8372451-too-many-dead-vector-reduction-vtnodes > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn > - limit steps of optimize, for Manuel > - Merge branch 'master' into JDK-8372451-too-many-dead-vector-reduction-vtnodes > - rm old documentation > - git move to new test > - streamline > - refactor and verify > - unique worklist > - wip solution > - ... and 1 more: https://git.openjdk.org/jdk/compare/01e88711...54881ff4 Looks good, thanks for the update! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28512#pullrequestreview-3529498827 From chagedorn at openjdk.org Tue Dec 2 11:05:35 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 2 Dec 2025 11:05:35 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v4] In-Reply-To: <-SdGoKVu9WpxzbLyqrLt7duH-qK_Bbm6ErrWdDfxJUg=.95c14f7f-b940-4dc6-a63d-055419625a36@github.com> References: <-SdGoKVu9WpxzbLyqrLt7duH-qK_Bbm6ErrWdDfxJUg=.95c14f7f-b940-4dc6-a63d-055419625a36@github.com> Message-ID: On Tue, 2 Dec 2025 08:09:30 GMT, Quan Anh Mai wrote: >> Hi, >> >> This fixes the crash in `Load/StoreVectorMaskedNode::Ideal`. The issue here is that the graph is not canonical during idealization, which leads to us processing a dead node. The fix I propose is to bail-out when that happens. >> >> To be more specific, for this issue, we have the graph that looks like: >> >> ConI -> ConvI2L -> CastLL(0..32) -> VectorMaskGen >> >> with `ConI` being 45 and `MaxVectorSize` being 32. In this instance, `CastLL` is processed before `ConvI2L`, and when it is processed, it sees that the type of `ConvI2L` being its bottom type. 
As a result, it does not know that it is top, and since we are after macro expansion, which is after loop opts, the `CastLL` goes away, leaving us with: >> >> ConI -> ConvI2L -> VectorMaskGen >> >> After `ConvI2L` is processed, we know that the input of `VectorMaskGen` is a constant 45, which is larger than `MaxVectorSize`, leading to the assert failure. >> >> Please take a look and leave your thoughts, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > bug number in test, comment Testing passed! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28410#pullrequestreview-3529507626 From jbhateja at openjdk.org Tue Dec 2 11:17:43 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 2 Dec 2025 11:17:43 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v3] In-Reply-To: References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> <5bV8t0Bo16-WVON8_AJLfcPDDqWVHDxIjmdGPPNazE8=.51d5a17d-1b87-44d4-ad41-e9d346e6b9f7@github.com> Message-ID: <3fvWzoSiBb5iYddxX90qvM7Vzhf9Nb218fc_dHWgS-E=.31e63450-14e2-4674-b966-93cb8bcbfb20@github.com> On Thu, 27 Nov 2025 16:12:59 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Fine tune matcher check > > Ok, that's fine with me too. > > It would be nice if you could also attach a regression test, or maybe add an additional run to the existing test, with the required flags for reproducing this issue. Hi @eme64 , kindly verify latest changes. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28533#issuecomment-3601503186 From jbhateja at openjdk.org Tue Dec 2 11:18:18 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 2 Dec 2025 11:18:18 GMT Subject: RFR: 8351844: C2 x64 AVX2 vpminmax assertion failure with equivalent inputs Message-ID: Bug fix PR fixes an incorrect register equivalence in macro assembler. MaxV/MinV IR with equivalent inputs should ideally be removed from ideal graph before reaching to macro assembler. [JDK-8372797](https://bugs.openjdk.org/browse/JDK-8372797) is filed to add relevant identity transformations. Best Regards, Jatin ------------- Commit messages: - 8351844: C2 x64 AVX2 vpminmax assertion failure with equivalent inputs Changes: https://git.openjdk.org/jdk/pull/28600/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28600&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351844 Stats: 70 lines in 2 files changed: 68 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28600.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28600/head:pull/28600 PR: https://git.openjdk.org/jdk/pull/28600 From roland at openjdk.org Tue Dec 2 11:21:05 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 11:21:05 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v3] In-Reply-To: References: Message-ID: > For this failure memory stats are: > > > Total Usage: 1095525816 > --- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 --- > Phase Total ra node comp type states reglive regsplit regmask superword cienv ha other > none 5976032 331560 5402064 197512 33712 10200 0 0 984 0 0 0 0 > parse 2716464 65456 1145480 196408 1112752 0 0 0 0 0 196368 0 0 > optimizer 98184 0 32728 0 65456 0 0 0 0 0 0 0 0 > connectionGraph 32728 0 0 32728 0 0 0 0 0 0 0 0 0 > iterGVN 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > idealLoop 918189632 0 38687056 872824784 392776 0 0 0 0 0 6285016 0 0 > 
idealLoopVerify 2228144 0 0 2228144 0 0 0 0 0 0 0 0 0 > macroExpand 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > graphReshape 32728 0 32728 0 0 0 0 0 0 0 0 0 0 > matcher 20135944 3369848 9033208 7536400 65456 131032 0 0 0 0 0 0 0 > postselect_cleanup 294872 294872 0 0 0 0 0 0 0 0 0 0 0 > scheduler 752944 196488 556456 0 0 0 0 0 0 0 0 0 0 > regalloc 388736 388736 0 0 0 0 0 0 0 0 0 0 0 > ctorChaitin 160032 ... Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: - review - review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28581/files - new: https://git.openjdk.org/jdk/pull/28581/files/eb7bd9ac..36fb3a6f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28581&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28581&range=01-02 Stats: 49 lines in 5 files changed: 22 ins; 24 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28581.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28581/head:pull/28581 PR: https://git.openjdk.org/jdk/pull/28581 From roland at openjdk.org Tue Dec 2 11:21:05 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 11:21:05 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v3] In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 16:17:03 GMT, Quan Anh Mai wrote: >> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: >> >> - review >> - review > > src/hotspot/share/opto/compile.hpp line 810: > >> 808: // Compilation environment. >> 809: Arena* comp_arena() { return &_comp_arena; } >> 810: ResourceArea* idealloop_arena() { return &_idealloop_arena; } > > Should we make it more idiomatic C++ by having the `ResourceArea` allocated and deallocated together with the `PhaseIdealLoop` instead of attaching it to the `Compile` object? Right, that makes sense. Done in new commits. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2580735137 From roland at openjdk.org Tue Dec 2 11:21:06 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 11:21:06 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v3] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 10:26:22 GMT, Manuel Hässig wrote: >> I'm not sure either what does. > > It is a marker to filter resource intensive tests. > > https://github.com/openjdk/jdk/blob/7278d2e8e5835f090672f7625d391a1b4c1a6626/test/hotspot/jtreg/TEST.ROOT#L29-L30 I added it in the new commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2580735997 From roland at openjdk.org Tue Dec 2 11:21:08 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 11:21:08 GMT Subject: RFR: 8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations [v3] In-Reply-To: References: Message-ID: <_HueHgU8Ha0yoG9cckWMGfms8D0WC6zGWKykIQkCeZM=.3f929996-99ee-4535-8973-b23ccf6b291e@github.com> On Mon, 1 Dec 2025 16:33:20 GMT, Benoît Maillard wrote: >> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: >> >> - review >> - review > > test/hotspot/jtreg/compiler/c2/TestVerifyLoopOptimizationsHighMemUsage.java line 28: > >> 26: * @bug 8370519 >> 27: * @summary C2: Hit MemLimit when running with +VerifyLoopOptimizations >> 28: * @run main/othervm -XX:CompileCommand=compileonly,*TestVerifyLoopOptimizationsHighMemUsage*::* -XX:-TieredCompilation -Xbatch > > Out of curiosity, have you tried reducing the test with `creduce`? I fixed a similar issue in [JDK-8366990](https://bugs.openjdk.org/browse/JDK-8366990), and initially reviewers were concerned about the long compilation time. I was able to get decent results with `creduce` by using `-XX:CompileCommand=memlimit`. Not sure if it's worth doing here though.
I don't have `creduce` set up. I tried minimizing the test case by hand but it was fairly time consuming. It currently runs in 30s on a fairly fast machine. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28581#discussion_r2580733874 From thartmann at openjdk.org Tue Dec 2 12:51:05 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 2 Dec 2025 12:51:05 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v7] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 09:13:38 GMT, Roland Westrelin wrote: >> In test cases, `mh` is initially not constant so the method handle >> invoke can't be inlined. It is later found to be constant, so it can >> be turned into a direct call by >> `Compile::process_late_inline_calls_no_inline()`. In the meantime, the >> `CallNode` for the mh invoke is cloned (by loop switching). In the >> process, only a shallow copy of the `JVMState` for the call is >> made. The initial `CallNode` is the first to be processed by >> `Compile::process_late_inline_calls_no_inline()` and that causes that >> `CallNode` to become dead. The cloned `CallNode` is then >> processed. The `JVMState` for that one references the initial >> `CallNode` in its caller's `JVMState`. Because that node is dead, that >> causes a crash. The fix I propose is to make a deep copy of the >> `JVMState` when a `CallNode` is cloned, if a `CallGenerator` is >> assigned to the node. >> >> The other failure I see with these tests is: >> >> >> # Internal Error (/home/roland/jdk-jdk/src/hotspot/share/opto/compile.hpp:1091), pid=3319164, tid=3319186 >> # assert(_number_of_mh_late_inlines > 0) failed: _number_of_mh_late_inlines < 0 ! >> >> >> because even though the `CallNode` is cloned, there's still only one >> late inline recorded. The fix here is to increment >> `_number_of_mh_late_inlines` when the node is cloned. 
>> >> This was reported by the netty developers. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Merge branch 'master' into JDK-8370939 > - Merge branch 'master' into JDK-8370939 > - review > - Merge branch 'master' into JDK-8370939 > - review > - more > - more > - more > - more > - test > - ... and 1 more: https://git.openjdk.org/jdk/compare/8558ffcd...64b11e6e Looks good to me. I submitted some testing and will report back once it passed. src/hotspot/share/opto/compile.hpp line 1102: > 1100: > 1101: void mark_has_mh_late_inlines() { _has_mh_late_inlines = true; } > 1102: bool has_mh_late_inlines() const { return _has_mh_late_inlines; } Suggestion: bool has_mh_late_inlines() const { return _has_mh_late_inlines; } ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28088#pullrequestreview-3529910175 PR Review Comment: https://git.openjdk.org/jdk/pull/28088#discussion_r2581026773 From epeter at openjdk.org Tue Dec 2 13:13:40 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 13:13:40 GMT Subject: RFR: 8372451: C2 SuperWord: "endless loop" assert. Need to implement proper worklist mechanism [v3] In-Reply-To: References: <0ZncRodHJbWfRFLrUCqQn5JPHilDPQA8e7dcrwARsOI=.7fe44906-eeed-49bb-8472-5a264391468b@github.com> Message-ID: On Tue, 2 Dec 2025 08:44:25 GMT, Manuel H?ssig wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 11 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8372451-too-many-dead-vector-reduction-vtnodes >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn >> - limit steps of optimize, for Manuel >> - Merge branch 'master' into JDK-8372451-too-many-dead-vector-reduction-vtnodes >> - rm old documentation >> - git move to new test >> - streamline >> - refactor and verify >> - unique worklist >> - wip solution >> - ... and 1 more: https://git.openjdk.org/jdk/compare/195c2f9b...54881ff4 > > Marked as reviewed by mhaessig (Committer). @mhaessig @chhagedorn Thanks for the reviews and suggestions! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28512#issuecomment-3601959763 From epeter at openjdk.org Tue Dec 2 13:13:43 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 13:13:43 GMT Subject: Integrated: 8372451: C2 SuperWord: "endless loop" assert. Need to implement proper worklist mechanism In-Reply-To: References: Message-ID: On Wed, 26 Nov 2025 16:02:20 GMT, Emanuel Peter wrote: > **Context**: `VTransform::optimize`. Works a bit like IGVN, it allows each node to perform optimizations. Recently introduced during JDK26. > > **Problem**: I made the assumption that we don't need a worklist mechanism, we can just do multiple passes over all nodes. The assumption was that there would not be any "trickling" of updates over the graph. But that is wrong: for example we can have a long chain of dead nodes, and we need to progressively remove the last node and mark it as dead. > > **Solution**: Implement proper worklist mechanism, so that updates can trickle over the graph. This pull request has now been integrated. Changeset: 6c01d3b0 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/6c01d3b08862447983b96daaf34a4c62daf54101 Stats: 208 lines in 3 files changed: 164 ins; 1 del; 43 mod 8372451: C2 SuperWord: "endless loop" assert. 
Need to implement proper worklist mechanism Reviewed-by: mhaessig, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/28512 From krk at openjdk.org Tue Dec 2 13:32:22 2025 From: krk at openjdk.org (Kerem Kat) Date: Tue, 2 Dec 2025 13:32:22 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v5] In-Reply-To: References: Message-ID: > Do not try to replace `fallthrough_memproj` when it is null, fixes crash. > > Test case is simplified from the ticket. Verified that the case crashes without the fix. Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: - Merge branch 'master' into fix-c2-segfault-unlocknode - address comments - fix rename - rename test file - Merge branch 'master' into fix-c2-segfault-unlocknode - fix test spacing - Update src/hotspot/share/opto/macro.cpp Co-authored-by: Manuel Hässig - Update src/hotspot/share/opto/macro.cpp Co-authored-by: Manuel Hässig - copyright format fix?
- 8370502: C2: segfault while adding node to IGVN worklist ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28432/files - new: https://git.openjdk.org/jdk/pull/28432/files/a0f0ecb9..21018290 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28432&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28432&range=03-04 Stats: 13228 lines in 281 files changed: 6943 ins; 5064 del; 1221 mod Patch: https://git.openjdk.org/jdk/pull/28432.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28432/head:pull/28432 PR: https://git.openjdk.org/jdk/pull/28432 From krk at openjdk.org Tue Dec 2 13:32:26 2025 From: krk at openjdk.org (Kerem Kat) Date: Tue, 2 Dec 2025 13:32:26 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v4] In-Reply-To: <1Agol3OtcCV7ilUBseuyB3DMWXfinb4bTBnRafLtfS0=.d4081ee2-4495-471e-85e2-ffcc2f825d21@github.com> References: <1Agol3OtcCV7ilUBseuyB3DMWXfinb4bTBnRafLtfS0=.d4081ee2-4495-471e-85e2-ffcc2f825d21@github.com> Message-ID: On Fri, 28 Nov 2025 10:23:06 GMT, Emanuel Peter wrote: >> Kerem Kat has updated the pull request incrementally with one additional commit since the last revision: >> >> fix rename > > test/hotspot/jtreg/compiler/c2/TestUnlockNodeNullMemprof.java line 29: > >> 27: * @summary Do not segfault while adding node to IGVN worklist >> 28: * >> 29: * @run main/othervm -Xbatch compiler.c2.TestUnlockNodeNullMemprof > > Suggestion: > > * @run main/othervm -Xbatch ${test.main.class} > > > Possible since a recent JTREG update. Makes wrongly copying class name go away ;) > > Also: I wonder if we should also have a run without any flags? Removing `-Xbatch` makes the test non-deterministic in this case. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28432#discussion_r2581178352 From epeter at openjdk.org Tue Dec 2 13:35:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 13:35:29 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v8] In-Reply-To: References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: On Mon, 1 Dec 2025 13:39:09 GMT, Jatin Bhateja wrote: >> This bug patch fixes a crash seen while querying the bottom type of MachTempNode corresponding to [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of blend pattern during late scheduling. Here, MaxVectorSize is constrained to 8 bytes, so during C2 type system initialization, [TypeVect::VECTX](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719), which is guarded by the target-supported vector size, remains uninitialized. >> >> It's better to reject matching of VectorBlend in such a scenario. >> >> All existing VectorAPI jtreg tests are passing with -XX:UseAVX=0 and -XX:MaxVectorSize=8 >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Review comments resolutions Testing submitted!
Code looks good to me :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28533#issuecomment-3602086228 From chagedorn at openjdk.org Tue Dec 2 13:49:05 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 2 Dec 2025 13:49:05 GMT Subject: RFR: 8371464: C2: assert(no_dead_loop) failed: dead loop detected [v2] In-Reply-To: <2fn6Bj9HrMiWa4K01CuW-vnL7XKWwkLcTapeVkHqWUo=.9d0e24a5-dc6c-4abe-ae45-10b93479071e@github.com> References: <2fn6Bj9HrMiWa4K01CuW-vnL7XKWwkLcTapeVkHqWUo=.9d0e24a5-dc6c-4abe-ae45-10b93479071e@github.com> Message-ID: On Tue, 2 Dec 2025 09:09:16 GMT, Roland Westrelin wrote: >> Crash occurs because a `MergeMem` node references itself: >> >> >> 608 MergeMem === _ 1 608 1 1 1 1 1 1 1 1 1 1 878 [[ 877 878 608 420 597 ]] { - - - - - - - - - - N878:java/lang/Throwable (java/io/Serializable)+20 * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !orig=[524] !jvms: TestDeadLoopAtMergeMem::test @ bci:94 (line 62) >> ``` >> >> Before IGVN, that part of the stream is: >> >> >> 522 Region === 522 604 521 [[ 522 538 523 524 525 526 527 528 529 530 531 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:75 (line 59) >> 524 Phi === 522 608 464 [[ 588 581 564 546 564 559 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:75 (line 59) >> >> 538 If === 522 535 [[ 539 540 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) >> 539 IfTrue === 538 [[ 553 547 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) >> 540 IfFalse === 538 [[ 548 546 ]] #0 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) >> 553 If === 539 535 [[ 554 555 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) >> 554 IfTrue === 553 [[ 562 560 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) >> 555 IfFalse === 553 [[ 548 559 ]] #0 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) >> >> 548 Region === 548 _ 540 555 [[ 548 562 561 563 564 565 566 567 568 569 570 
571 572 573 574 575 576 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:88 (line 60) >> 564 Phi === 548 _ 524 524 [[ 581 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:85 (line 61) >> >> 562 Region === 562 548 554 [[ 562 600 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 596 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:90 (line 62) >> 581 Phi === 562 564 524 [[ 420 597 610 608 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:90 (line 62) >> >> 608 MergeMem === _ 1 581 1 1 1 1 1 1 1 1 1 1 588 [[ 524 ]] { - - - - - - - - - - N588:java/lang/Throwable (java/io/Serializable)+20 * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:94 (line 62) >> >> >> 522 is a loop head, 604 is the backedge. The loop becomes unreachable >> during IGVN. The loop body above is transformed to: >> >> >> 538 If === 604 535 [[ 539 540 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) >> 539 IfTrue === 538 ... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8371464 > - test > - fix Testing passed! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28554#pullrequestreview-3530234212 From chagedorn at openjdk.org Tue Dec 2 13:54:26 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 2 Dec 2025 13:54:26 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: Message-ID: <2xxjKX6hMeKDfS9SGBEvll8yadDthCoUjCIRpaE8ObA=.b567ec00-7dad-4b57-82a4-db1149fc8942@github.com> On Tue, 2 Dec 2025 09:20:41 GMT, Roland Westrelin wrote: >> This is a variant of 8332827. In 8332827, an array access becomes >> dependent on a range check `CastII` for another array access. When, >> after loop opts are over, that RC `CastII` was removed, the array >> access could float and an out of bound access happened. With the fix >> for 8332827, RC `CastII`s are no longer removed. >> >> With this one what happens is that some transformations applied after >> loop opts are over widen the type of the RC `CastII`. As a result, the >> type of the RC `CastII` is no longer narrower than that of its input, >> the `CastII` is removed and the dependency is lost. >> >> There are 2 transformations that cause this to happen: >> >> - after loop opts are over, the type of the `CastII` nodes are widen >> so nodes that have the same inputs but a slightly different type can >> common. >> >> - When pushing a `CastII` through an `Add`, if of the type both inputs >> of the `Add`s are non constant, then we end up widening the type >> (the resulting `Add` has a type that's wider than that of the >> initial `CastII`). >> >> There are already 3 types of `Cast` nodes depending on the >> optimizations that are allowed. Either the `Cast` is floating >> (`depends_only_test()` returns `true`) or pinned. Either the `Cast` >> can be removed if it no longer narrows the type of its input or >> not. We already have variants of the `CastII`: >> >> - if the Cast can float and be removed when it doesn't narrow the type >> of its input. 
>> >> - if the Cast is pinned and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can't be removed when it doesn't narrow >> the type of its input. >> >> What we need here, I think, is the 4th combination: >> >> - if the Cast can float and can't be removed when it doesn't narrow >> the type of its input. >> >> Anyway, things are becoming confusing with all these different >> variants named in ways that don't always help figure out what >> constraints one of them operate under. So I refactored this and that's >> the biggest part of this change. The fix consists in marking `Cast` >> nodes when their type is widen in a way that prevents them from being >> optimized out. >> >> Tobias ran performance testing with a slightly different version of >> this change and there was no regression. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: > > - Merge branch 'master' into JDK-8354282 > - whitespace > - review > - review > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java > > Co-authored-by: Christian Hagedorn > - review > - review > - ... and 7 more: https://git.openjdk.org/jdk/compare/ef5e744a...93b8b0c5 Thanks for the update, it looks good to me! If @eme64 also agrees with the latest patch, we can submit some testing and then hopefully get it in right before the fork. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24575#pullrequestreview-3530251375 From chagedorn at openjdk.org Tue Dec 2 13:54:29 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 2 Dec 2025 13:54:29 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v4] In-Reply-To: References: <6qShqR-Ohv7vamoJ_B4Ev-poU8SB96eTBo4HFJrylcI=.dac5a26f-c9f0-445b-8f1c-a7c719fa27ae@github.com> <4QQp7C7iIVfVs1MoUMC56KCgVGpXu5ziTHfZ-f2pk6o=.4ca7e1a8-3f31-44d3-aaec-30429ed7e2b0@github.com> Message-ID: On Thu, 27 Nov 2025 12:29:10 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/castnode.hpp line 101: >> >>> 99: } >>> 100: return NonFloatingNonNarrowing; >>> 101: } >> >> Just a side note: We seem to mix the terms "(non-)pinned" with "(non-)floating" freely. Should we stick to just one? But maybe it's justified to use both depending on the situation/code context. > > The patch as it is now adds some extra uses of "pinned" and "floating". What could make sense, I suppose, would be to try to use "floating"/"non floating" instead but there are so many uses of "pinned" in the code base already, and I don't see us getting rid of them, that I wonder if it would make a difference. So, I'm not too sure what to do. Yes, that's true. I was also unsure about whether we should stick with one or just allow both interchangeably. I guess since there are so many uses, we can just move forward with what you have now and still come back to clean it up if necessary - we can always do that. 
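A minimal sketch of the four `Cast` dependency combinations discussed in this thread, modelled as two independent booleans. The variant names follow the ones quoted in the review (`FloatingNarrowing`, `FloatingNonNarrowing`, `NonFloatingNarrowing`, `NonFloatingNonNarrowing`); everything else (`ToyDependency`, `ToyRange`, `cast_narrows`, `may_remove_cast`) is hypothetical and is not the code in `castnode.hpp`. It assumes that "narrowing" means the Cast may be dropped once its type no longer narrows the type of its input.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the two independent properties from the thread.
struct ToyDependency {
  bool floating;   // the Cast may float (it depends only on a single test)
  bool narrowing;  // the Cast may be removed once it no longer narrows its input
};

// Maps the two flags to the variant names used in the review.
std::string name_of(ToyDependency d) {
  std::string result = d.floating ? "Floating" : "NonFloating";
  result += d.narrowing ? "Narrowing" : "NonNarrowing";
  return result;
}

// Closed integer range standing in for a C2 integer type.
struct ToyRange {
  long lo;
  long hi;
};

// A cast narrows its input if its type excludes values the input type allows.
bool cast_narrows(ToyRange cast_type, ToyRange input_type) {
  assert(cast_type.lo <= cast_type.hi && input_type.lo <= input_type.hi);
  return cast_type.lo > input_type.lo || cast_type.hi < input_type.hi;
}

// Only "narrowing" variants may be optimized away, and only when the cast has
// stopped narrowing. A "non-narrowing" variant stays in the graph so that the
// dependency it carries is not lost.
bool may_remove_cast(ToyDependency d, ToyRange cast_type, ToyRange input_type) {
  return d.narrowing && !cast_narrows(cast_type, input_type);
}
```

In this toy model, widening a cast's type from `[0, 10]` to `[-100, 300]` over an input of type `[0, 200]` makes a `FloatingNarrowing` cast removable, while marking it `FloatingNonNarrowing` keeps it in place, which is the behavior the fix description asks for when a type is widened after loop opts.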
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2581285955 From chagedorn at openjdk.org Tue Dec 2 13:54:34 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 2 Dec 2025 13:54:34 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v4] In-Reply-To: <4QQp7C7iIVfVs1MoUMC56KCgVGpXu5ziTHfZ-f2pk6o=.4ca7e1a8-3f31-44d3-aaec-30429ed7e2b0@github.com> References: <6qShqR-Ohv7vamoJ_B4Ev-poU8SB96eTBo4HFJrylcI=.dac5a26f-c9f0-445b-8f1c-a7c719fa27ae@github.com> <4QQp7C7iIVfVs1MoUMC56KCgVGpXu5ziTHfZ-f2pk6o=.4ca7e1a8-3f31-44d3-aaec-30429ed7e2b0@github.com> Message-ID: On Wed, 26 Nov 2025 13:24:05 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: >> >> - review >> - review >> - Merge branch 'master' into JDK-8354282 >> - review >> - infinite loop in gvn fix >> - renaming >> - merge >> - Merge branch 'master' into JDK-8354282 >> - fix & test > > src/hotspot/share/opto/castnode.hpp line 120: > >> 118: // be removed in any case otherwise the sunk node floats back into the loop. >> 119: static const DependencyType NonFloatingNonNarrowing; >> 120: > > I needed a moment to completely understand all these combinations. I rewrote the definitions in this process a little bit. Feel free to take some of it over: > > > // All the possible combinations of floating/narrowing with example use cases: > > // Use case example: Range Check CastII > // Floating: The Cast is only dependent on the single range check. > // Narrowing: The Cast narrows the type to a positive index. If the input to the Cast is narrower, we can safely > // remove the cast because the array access will be safe. > static const DependencyType FloatingNarrowing; > > // Use case example: Widening Cast nodes' types after loop opts: We want to common Casts with slightly different types. 
> // Floating: These Casts only depend on the single control. > // NonNarrowing: Even when the input type is narrower, we are not removing the Cast. Otherwise, the dependency > // to the single control is lost, and an array access could float above its range check because we > // just removed the dependency to the range check by removing the Cast. This could lead to an > // out-of-bounds access. > static const DependencyType FloatingNonNarrowing; > > // Use case example: An array accesses that is no longer dependent on a single range check (e.g. range check smearing). > // NonFloating: The array access must be pinned below all the checks it depends on. If the check it directly depends > // on with a control input is hoisted, we do hoist the Cast as well. If we allowed the Cast to float, > // we risk that the array access ends up above another check it depends on (we cannot model two control > // dependencies for a node in the IR). This could lead to an out-of-bounds access. > // Narrowing: If the Cast does not narrow the input type, then it's safe to remove the cast because the array access > // will be safe. > static const DependencyType NonFloatingNarrowing; > > // Use case example: Sinking nodes out of a loop > // Non-Floating & Non-Narrowing: We don't want the Cast that forces the node to be out of loop to be removed in any > // case. Otherwise, the sunk node could float back into the l... 
Thanks for taking it over :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2581287358 From roland at openjdk.org Tue Dec 2 14:03:07 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 14:03:07 GMT Subject: RFR: 8371464: C2: assert(no_dead_loop) failed: dead loop detected [v2] In-Reply-To: References: <2fn6Bj9HrMiWa4K01CuW-vnL7XKWwkLcTapeVkHqWUo=.9d0e24a5-dc6c-4abe-ae45-10b93479071e@github.com> Message-ID: On Tue, 2 Dec 2025 13:46:48 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - review >> - Merge branch 'master' into JDK-8371464 >> - test >> - fix > > Testing passed! @chhagedorn @dafedafe thanks for the reviews and testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28554#issuecomment-3602214864 From roland at openjdk.org Tue Dec 2 14:03:09 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 2 Dec 2025 14:03:09 GMT Subject: Integrated: 8371464: C2: assert(no_dead_loop) failed: dead loop detected In-Reply-To: References: Message-ID: On Fri, 28 Nov 2025 10:50:44 GMT, Roland Westrelin wrote: > Crash occurs because a `MergeMem` node references itself: > > > 608 MergeMem === _ 1 608 1 1 1 1 1 1 1 1 1 1 878 [[ 877 878 608 420 597 ]] { - - - - - - - - - - N878:java/lang/Throwable (java/io/Serializable)+20 * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !orig=[524] !jvms: TestDeadLoopAtMergeMem::test @ bci:94 (line 62) > ``` > > Before IGVN, that part of the stream is: > > > 522 Region === 522 604 521 [[ 522 538 523 524 525 526 527 528 529 530 531 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:75 (line 59) > 524 Phi === 522 608 464 [[ 588 581 564 546 564 559 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: 
TestDeadLoopAtMergeMem::test @ bci:75 (line 59) > > 538 If === 522 535 [[ 539 540 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 539 IfTrue === 538 [[ 553 547 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 540 IfFalse === 538 [[ 548 546 ]] #0 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 553 If === 539 535 [[ 554 555 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) > 554 IfTrue === 553 [[ 562 560 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) > 555 IfFalse === 553 [[ 548 559 ]] #0 !jvms: TestDeadLoopAtMergeMem::test @ bci:82 (line 59) > > 548 Region === 548 _ 540 555 [[ 548 562 561 563 564 565 566 567 568 569 570 571 572 573 574 575 576 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:88 (line 60) > 564 Phi === 548 _ 524 524 [[ 581 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:85 (line 61) > > 562 Region === 562 548 554 [[ 562 600 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 596 ]] #reducible !jvms: TestDeadLoopAtMergeMem::test @ bci:90 (line 62) > 581 Phi === 562 564 524 [[ 420 597 610 608 ]] #memory Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:90 (line 62) > > 608 MergeMem === _ 1 581 1 1 1 1 1 1 1 1 1 1 588 [[ 524 ]] { - - - - - - - - - - N588:java/lang/Throwable (java/io/Serializable)+20 * [narrow] } Memory: @BotPTR *+bot, idx=Bot; !jvms: TestDeadLoopAtMergeMem::test @ bci:94 (line 62) > > > 522 is a loop head, 604 is the backedge. The loop becomes unreachable > during IGVN. The loop body above is transformed to: > > > 538 If === 604 535 [[ 539 540 ]] P=0.999000, C=-1.000000 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (line 59) > 539 IfTrue === 538 [[ 562 547 560 ]] #1 !jvms: TestDeadLoopAtMergeMem::test @ bci:79 (l... This pull request has now been integrated. 
Changeset: a62296d8 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/a62296d8a0858d63a930e91168254a9927f06783 Stats: 91 lines in 3 files changed: 84 ins; 0 del; 7 mod 8371464: C2: assert(no_dead_loop) failed: dead loop detected Reviewed-by: chagedorn, dfenacci ------------- PR: https://git.openjdk.org/jdk/pull/28554 From dfenacci at openjdk.org Tue Dec 2 14:44:10 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 2 Dec 2025 14:44:10 GMT Subject: RFR: 8372703: Test compiler/arguments/TestCodeEntryAlignment.java failed: assert(allocates2(pc)) failed: not in CodeBuffer memory In-Reply-To: References: Message-ID: <3SFe0aKR8DW5SKjr375S78OWgJS7g2pLZfepb43yISI=.958eda85-ca1a-4f85-a9a2-c7ad60dcc025@github.com> On Mon, 1 Dec 2025 21:16:18 GMT, Volodymyr Paprotski wrote: > Requires a Broadwell machine, but was able to reproduce with an emulator: > > > ~/sde-external-9.58.0-2025-06-16-lin/sde64 -follow-subprocess -bdw -- ./build/linux-x86_64-server-fastdebug/images/jdk/bin/java -XX:-UseMulAddIntrinsic -XX:+UseDilithiumIntrinsics -XX:+UnlockExperimentalVMOptions -XX:CodeCacheSegmentSize=1024 -XX:CodeEntryAlignment=1024 -cp build/linux-x86_64-server-fastdebug/support/test/lib/test-lib.jar test/hotspot/jtreg/compiler/arguments/TestCodeEntryAlignment.java run Thanks @vpaprotsk for fixing this. Looks good to me (if tests are OK). src/hotspot/cpu/x86/stubDeclarations_x86.hpp line 76: > 74: do_arch_entry, \ > 75: do_arch_entry_init) \ > 76: do_arch_blob(compiler, 120000 WINDOWS_ONLY(+2000)) \ I was wondering if there is any reason for this value, apart from it being enough for the test to pass. (I just noticed that it has already been increased in the past.) ------------- Marked as reviewed by dfenacci (Committer).
PR Review: https://git.openjdk.org/jdk/pull/28588#pullrequestreview-3530481034 PR Review Comment: https://git.openjdk.org/jdk/pull/28588#discussion_r2581474386 From rrich at openjdk.org Tue Dec 2 14:45:09 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 2 Dec 2025 14:45:09 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v7] In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 10:27:36 GMT, Martin Doerr wrote: >> This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. We can save the `genInvRoundKeys` computation when we only do encryption. >> >> The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64. > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: > > - Minor simplification. > - Merge remote-tracking branch 'origin' into 8371820_AES_Crypt > - Fix missing whitespace. > - Address review comments. > - Merge remote-tracking branch 'origin' into 8371820_AES_Crypt > - Remove K from AES_Crypt > - More minor cleanup. > - Improve comment and minor cleanup. > - 8371820: Further AES performance improvements for key schedule generation The changes look good to me. Thanks, Richard. ------------- Marked as reviewed by rrich (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28299#pullrequestreview-3530496843 From mdoerr at openjdk.org Tue Dec 2 14:50:09 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 2 Dec 2025 14:50:09 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v7] In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 10:27:36 GMT, Martin Doerr wrote: >> This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. We can save the `genInvRoundKeys` computation when we only do encryption. >> >> The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64. > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: > > - Minor simplification. > - Merge remote-tracking branch 'origin' into 8371820_AES_Crypt > - Fix missing whitespace. > - Address review comments. > - Merge remote-tracking branch 'origin' into 8371820_AES_Crypt > - Remove K from AES_Crypt > - More minor cleanup. > - Improve comment and minor cleanup. > - 8371820: Further AES performance improvements for key schedule generation Thanks for the review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28299#issuecomment-3602426006 From bulasevich at openjdk.org Tue Dec 2 15:02:35 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Tue, 2 Dec 2025 15:02:35 GMT Subject: RFR: 8280283: Dead compiler code found during the JDK-8272058 code review In-Reply-To: References: Message-ID: <2DyhWZxKPAXQbCsjHhoSUQZ80Em0931LE2LRjLNRdHA=.cc61d9bd-fc90-40ea-88e9-ac76c21b5756@github.com> On Mon, 24 Nov 2025 09:26:13 GMT, Anton Seoane Ampudia wrote: > This PR removes some dead code that was found during review for [JDK-8272058](https://bugs.openjdk.org/browse/JDK-8272058). > > `target_addr_for_insn_or_null` is never run with a `ldrw` to `zr` (i.e. a safepoint poll). This is just a remnant from global safepointing, before we moved to using thread-local handshakes. No safepoint polling code reaches this function. More information can be read in the [original code review](https://github.com/openjdk/jdk18/pull/51#discussion_r774922087). Additionally, I have run tiers 1-6 to make sure this path is not exercised. > > This changeset also cleans up the unused `is_nop` function, following the comments in the issue. Other dead code mentioned there has long since disappeared. > > **Testing:** passes tiers 1-4 Nice cleanup. Cleaning up dead code always helps reduce technical debt. Are you sure there isn't more to clean up? Have you tried building with GCC's -Wunused options to catch additional unused symbols? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28473#issuecomment-3602485639 From mli at openjdk.org Tue Dec 2 15:16:59 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 2 Dec 2025 15:16:59 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v29] In-Reply-To: References: Message-ID: On Mon, 24 Nov 2025 03:11:07 GMT, Anjian Wen wrote: >> Hi everyone, please help review this patch, which implements the _counterMode_AESCrypt with Zvkned.
On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ passed. > > Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: > > modify label L_EXIT to L_exit src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2636: > 2634: void counterMode_AESCrypt(int round, Register in, Register out, Register key, Register counter, > 2635: Register input_len, Register saved_encrypted_ctr, Register used_ptr) { > 2636: // Algorithm: This should be my last comment :) Where is this "Algorithm" from? Can you put a link here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2581629384 From mli at openjdk.org Tue Dec 2 15:17:01 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 2 Dec 2025 15:17:01 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v26] In-Reply-To: References: <9oPWTWflnwws0wxHBP58IiQRIZz4Tt5bthr7RiC3BE0=.94d60901-8fad-4597-9e55-c669de73a8e6@github.com> Message-ID: On Thu, 20 Nov 2025 02:48:23 GMT, Anjian Wen wrote: >> There is a `mv` before exit of `generate_counterMode_AESCrypt`, is this one still necessary? > > Yes, about the `mv` before `generate_counterMode_AESCrypt`, it is for a different branch when input_len is zero the first time. To avoid an additional jump, each code exit from `counterMode_AESCrypt` is an independent exit, so I think we need to keep this `mv` here. I see, although it looks a bit strange to me to return in this way.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2581628935 From shade at openjdk.org Tue Dec 2 15:26:22 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 2 Dec 2025 15:26:22 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v4] In-Reply-To: <-SdGoKVu9WpxzbLyqrLt7duH-qK_Bbm6ErrWdDfxJUg=.95c14f7f-b940-4dc6-a63d-055419625a36@github.com> References: <-SdGoKVu9WpxzbLyqrLt7duH-qK_Bbm6ErrWdDfxJUg=.95c14f7f-b940-4dc6-a63d-055419625a36@github.com> Message-ID: On Tue, 2 Dec 2025 08:09:30 GMT, Quan Anh Mai wrote: >> Hi, >> >> This fixes the crash in `Load/StoreVectorMaskedNode::Ideal`. The issue here is that the graph is not canonical during idealization, which leads to us processing a dead node. The fix I propose is to bail-out when that happens. >> >> To be more specific, for this issue, we have the graph that looks like: >> >> ConI -> ConvI2L -> CastLL(0..32) -> VectorMaskGen >> >> with `ConI` being 45 and `MaxVectorSize` being 32. In this instance, `CastLL` is processed before `ConvI2L`, and when it is processed, it sees that the type of `ConvI2L` being its bottom type. As a result, it does not know that it is top, and since we are after macro expansion, which is after loop opts, the `CastLL` goes away, leaving us with: >> >> ConI -> ConvI2L -> VectorMaskGen >> >> After `ConvI2L` is processed, we know that the input of `VectorMaskGen` is a constant 45, which is larger than `MaxVectorSize`, leading to the assert failure. >> >> Please take a look and leave your thoughts, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > bug number in test, comment Let's go then? 
I am eager to try and enable deeper CTW testing again :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28410#issuecomment-3602590259 From vpaprotski at openjdk.org Tue Dec 2 15:30:43 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Tue, 2 Dec 2025 15:30:43 GMT Subject: RFR: 8372703: Test compiler/arguments/TestCodeEntryAlignment.java failed: assert(allocates2(pc)) failed: not in CodeBuffer memory In-Reply-To: <3SFe0aKR8DW5SKjr375S78OWgJS7g2pLZfepb43yISI=.958eda85-ca1a-4f85-a9a2-c7ad60dcc025@github.com> References: <3SFe0aKR8DW5SKjr375S78OWgJS7g2pLZfepb43yISI=.958eda85-ca1a-4f85-a9a2-c7ad60dcc025@github.com> Message-ID: On Tue, 2 Dec 2025 14:39:15 GMT, Damon Fenacci wrote: >> Requires a Broadwell machine, but was able to reproduce with an emulator: >> >> >> ~/sde-external-9.58.0-2025-06-16-lin/sde64 -follow-subprocess -bdw -- ./build/linux-x86_64-server-fastdebug/images/jdk/bin/java -XX:-UseMulAddIntrinsic -XX:+UseDilithiumIntrinsics -XX:+UnlockExperimentalVMOptions -XX:CodeCacheSegmentSize=1024 -XX:CodeEntryAlignment=1024 -cp build/linux-x86_64-server-fastdebug/support/test/lib/test-lib.jar test/hotspot/jtreg/compiler/arguments/TestCodeEntryAlignment.java run > > src/hotspot/cpu/x86/stubDeclarations_x86.hpp line 76: > >> 74: do_arch_entry, \ >> 75: do_arch_entry_init) \ >> 76: do_arch_blob(compiler, 120000 WINDOWS_ONLY(+2000)) \ > > I was wondering if there is any reason for this value (apart from the fact that it is enough for the test to pass. I just noticed that it has been increased already in the past). The assert was suggesting 119k (and change..) so I rounded slightly up. I was going to ask (i.e. @TobiHartmann ?) if that's enough.. (Similarly, I am concerned that I am contributing to a larger JVM footprint, with my changes.. but I suppose 11k is comparatively insignificant in the grand scheme of things...) Thanks for the review!
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28588#discussion_r2581685779 From epeter at openjdk.org Tue Dec 2 15:32:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 15:32:55 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 09:20:41 GMT, Roland Westrelin wrote: >> This is a variant of 8332827. In 8332827, an array access becomes >> dependent on a range check `CastII` for another array access. When, >> after loop opts are over, that RC `CastII` was removed, the array >> access could float and an out of bound access happened. With the fix >> for 8332827, RC `CastII`s are no longer removed. >> >> With this one what happens is that some transformations applied after >> loop opts are over widen the type of the RC `CastII`. As a result, the >> type of the RC `CastII` is no longer narrower than that of its input, >> the `CastII` is removed and the dependency is lost. >> >> There are 2 transformations that cause this to happen: >> >> - after loop opts are over, the type of the `CastII` nodes are widen >> so nodes that have the same inputs but a slightly different type can >> common. >> >> - When pushing a `CastII` through an `Add`, if of the type both inputs >> of the `Add`s are non constant, then we end up widening the type >> (the resulting `Add` has a type that's wider than that of the >> initial `CastII`). >> >> There are already 3 types of `Cast` nodes depending on the >> optimizations that are allowed. Either the `Cast` is floating >> (`depends_only_test()` returns `true`) or pinned. Either the `Cast` >> can be removed if it no longer narrows the type of its input or >> not. We already have variants of the `CastII`: >> >> - if the Cast can float and be removed when it doesn't narrow the type >> of its input. 
>> >> - if the Cast is pinned and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can't be removed when it doesn't narrow >> the type of its input. >> >> What we need here, I think, is the 4th combination: >> >> - if the Cast can float and can't be removed when it doesn't narrow >> the type of its input. >> >> Anyway, things are becoming confusing with all these different >> variants named in ways that don't always help figure out what >> constraints one of them operate under. So I refactored this and that's >> the biggest part of this change. The fix consists in marking `Cast` >> nodes when their type is widen in a way that prevents them from being >> optimized out. >> >> Tobias ran performance testing with a slightly different version of >> this change and there was no regression. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: > > - Merge branch 'master' into JDK-8354282 > - whitespace > - review > - review > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java > > Co-authored-by: Christian Hagedorn > - review > - review > - ... and 7 more: https://git.openjdk.org/jdk/compare/ef5e744a...93b8b0c5 src/hotspot/share/opto/castnode.hpp line 108: > 106: // Floating: The Cast is only dependent on the single range check. > 107: // Narrowing: The Cast narrows the type to a positive index. If the input to the Cast is narrower, we can safely > 108: // remove the cast because the array access will be safe. The "Floating" part is a bit counter intuitive here, because the ctrl of the CastII is the RangeCheck, right? So is it not therefore already pinned? 
Maybe we can add some detail about what the "floating" explicitly means here. Is it that we can later move the CastII up in an optimization? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2581630546 From epeter at openjdk.org Tue Dec 2 15:32:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 15:32:56 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: Message-ID: <0An6wz0QZZxtVg-lP4IyqWTekcYkSmvosrVWkI7cH70=.86c07374-2127-4892-a369-ceefa82dd0b7@github.com> On Tue, 2 Dec 2025 15:14:28 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: >> >> - Merge branch 'master' into JDK-8354282 >> - whitespace >> - review >> - review >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java >> >> Co-authored-by: Christian Hagedorn >> - review >> - review >> - ... and 7 more: https://git.openjdk.org/jdk/compare/ef5e744a...93b8b0c5 > > src/hotspot/share/opto/castnode.hpp line 108: > >> 106: // Floating: The Cast is only dependent on the single range check. >> 107: // Narrowing: The Cast narrows the type to a positive index. If the input to the Cast is narrower, we can safely >> 108: // remove the cast because the array access will be safe. > > The "Floating" part is a bit counter intuitive here, because the ctrl of the CastII is the RangeCheck, right? > So is it not therefore already pinned? > > Maybe we can add some detail about what the "floating" explicitly means here. Is it that we can later move the CastII up in an optimization? 
Actually, I'm wondering if the term `hoistable` and `non-hoistable` would not be better terms... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2581642290 From epeter at openjdk.org Tue Dec 2 15:32:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 15:32:58 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: <_rBmTvf064PXyVEAX4zqk43DNgVr0gQDPzPcdQ4XI1A=.660e7e89-0a49-47e0-9639-972cbfbac5f0@github.com> References: <0An6wz0QZZxtVg-lP4IyqWTekcYkSmvosrVWkI7cH70=.86c07374-2127-4892-a369-ceefa82dd0b7@github.com> <_rBmTvf064PXyVEAX4zqk43DNgVr0gQDPzPcdQ4XI1A=.660e7e89-0a49-47e0-9639-972cbfbac5f0@github.com> Message-ID: <4qc5jJ1KA09yko5rWioBGstpuuRNxOiNWXRdRdh9h_E=.17c8ace8-c672-4451-bd15-247d66d92cef@github.com> On Tue, 2 Dec 2025 15:19:26 GMT, Emanuel Peter wrote: >> Actually, I'm wondering if the term `hoistable` and `non-hoistable` would not be better terms... > > At least we could say that it is allowed to hoist the RangeCheck, and the CastII could float up to where the RC is hoisted. Suggestion: // Use case example: Range Check CastII // Floating: The Cast is only dependent on the single range check. If the range check was ever to be hoisted // it would be safe to let the Cast float to where the range check is hoisted up to. // Narrowing: The Cast narrows the type to a positive index. If the input to the Cast is narrower, we can safely // remove the cast because the array access will be safe.
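The two properties discussed in this sub-thread (floating vs. pinned, and narrowing vs. non-narrowing) are independent, so they form a 2x2 matrix. A minimal sketch of that classification follows, assuming a simplified model in which a type is just a closed integer range. All names here (`CastDependency`, `Range`, `narrows`, `can_remove`) are hypothetical and do not mirror the actual declarations in `castnode.hpp`.

```cpp
// Hypothetical model of the two independent Cast properties from this thread.
// Names are illustrative only; they are not the HotSpot declarations.
struct CastDependency {
    bool floating;   // may the Cast move up together with the check it depends on?
    bool narrowing;  // may the Cast be removed once it no longer narrows its input's type?
};

// The four combinations mentioned in the PR description.
constexpr CastDependency kFloatingNarrowing{true, true};
constexpr CastDependency kFloatingNonNarrowing{true, false};  // the "4th combination"
constexpr CastDependency kPinnedNarrowing{false, true};
constexpr CastDependency kPinnedNonNarrowing{false, false};

// Simplified stand-in for a C2 integer type: the closed range [lo, hi].
struct Range { int lo; int hi; };

// True if the Cast's type excludes at least one value the input may take.
constexpr bool narrows(Range cast, Range input) {
    return cast.lo > input.lo || cast.hi < input.hi;
}

// A Cast is removable only if its kind permits removal and it has stopped
// adding type information over its input.
constexpr bool can_remove(CastDependency dep, Range cast, Range input) {
    return dep.narrowing && !narrows(cast, input);
}
```

In this model, widening a Cast's type after loop opts makes `narrows` return false, so a narrowing Cast would be dropped and the dependency lost, while a non-narrowing one survives.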
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2581692285 From epeter at openjdk.org Tue Dec 2 15:32:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 15:32:57 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: <0An6wz0QZZxtVg-lP4IyqWTekcYkSmvosrVWkI7cH70=.86c07374-2127-4892-a369-ceefa82dd0b7@github.com> References: <0An6wz0QZZxtVg-lP4IyqWTekcYkSmvosrVWkI7cH70=.86c07374-2127-4892-a369-ceefa82dd0b7@github.com> Message-ID: <_rBmTvf064PXyVEAX4zqk43DNgVr0gQDPzPcdQ4XI1A=.660e7e89-0a49-47e0-9639-972cbfbac5f0@github.com> On Tue, 2 Dec 2025 15:17:38 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/castnode.hpp line 108: >> >>> 106: // Floating: The Cast is only dependent on the single range check. >>> 107: // Narrowing: The Cast narrows the type to a positive index. If the input to the Cast is narrower, we can safely >>> 108: // remove the cast because the array access will be safe. >> >> The "Floating" part is a bit counter intuitive here, because the ctrl of the CastII is the RangeCheck, right? >> So is it not therefore already pinned? >> >> Maybe we can add some detail about what the "floating" explicitly means here. Is it that we can later move the CastII up in an optimization? > > Actually, I'm wondering if the term `hoistable` and `non-hoistable` would not be better terms... At least we could say that it is allowed to hoist the RangeCheck, and the CastII could float up to where the RC is hoisted. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2581649395 From shade at openjdk.org Tue Dec 2 15:39:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 2 Dec 2025 15:39:54 GMT Subject: RFR: 8360557: CTW: Inline cold methods to reach more code [v5] In-Reply-To: References: Message-ID: > We use CTW testing for making sure compilers behave well. 
But we compile the code that is not executed at all, and since our inlining heuristics often look back at profiles, we end up not actually inlining all too much! This means CTW testing likely misses lots of bugs that normal code is exposed to, especially e.g. in loop optimizations. > > There is an intrinsic tradeoff with accepting more inlined methods in CTW: the compilation time gets significantly worse. With just accepting the cold methods we have reasonable CTW times, eating the improvements we have committed in mainline recently. And it still finds bugs. See the RFE for sample data. > > After this lands and CTW starts to compile cold methods, one can greatly expand the scope of the CTW testing by overriding the static inlining limits. Doing e.g. `TEST_VM_OPTS="-XX:MaxInlineSize=70 -XX:C1MaxInlineSize=70"` finds even more bugs. Unfortunately, the compilation times suffer so much, they are impractical to run in standard configurations, see data in RFE. We will enable some of that testing in special testing pipelines. > > Pre-empting the question: "Well, why not use -Xcomp then, and make sure it inlines well?" The answer is in RFE as well: Xcomp causes _a lot_ of stray compilations for JDK and CTW infra itself. For small JARs in large corpus this eats precious testing time that we would instead like to spend on deeper inlining in the actual JAR code. This also does not force us to look into how CTW works in Xcomp at all; I expect some surprises there. Feather-touching the inlining heuristic paths to just accept methods without looking at profiles looks better. > > Tobias had an idea to implement the stress randomized inlining that would expand the scope of inlining. This improvement stacks well with it. This improvement provides the base case of inlining most reasonable methods, and then allows stress infra to inline some more on top of that.
> > Additional testing: > - [x] GHA > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` > - [x] Linux x86_64 server fastdebug, large CTW corpus (now failing in interesting ways) Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: - Enable more testing - Merge branch 'master' into JDK-8360557-ctw-inlining - Merge branch 'master' into JDK-8360557-ctw-inlining - Merge branch 'master' into JDK-8360557-ctw-inlining - Merge branch 'master' into JDK-8360557-ctw-inlining - Update src/hotspot/share/compiler/compiler_globals.hpp Co-authored-by: Tobias Hartmann - Revert separate patch - Final - Proper option name and bump the limits - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26068/files - new: https://git.openjdk.org/jdk/pull/26068/files/f381a337..97975dd0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26068&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26068&range=03-04 Stats: 82234 lines in 1298 files changed: 53590 ins; 20514 del; 8130 mod Patch: https://git.openjdk.org/jdk/pull/26068.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26068/head:pull/26068 PR: https://git.openjdk.org/jdk/pull/26068 From duke at openjdk.org Tue Dec 2 15:40:49 2025 From: duke at openjdk.org (duke) Date: Tue, 2 Dec 2025 15:40:49 GMT Subject: Withdrawn: 8303762: Optimize vector slice operation with constant index using VPALIGNR instruction In-Reply-To: References: Message-ID: On Tue, 18 Mar 2025 20:51:46 GMT, Jatin Bhateja wrote: > Patch optimizes Vector. slice operation with constant index using x86 ALIGNR instruction. 
> It also adds a new hybrid call generator to facilitate lazy intrinsification or else perform procedural inlining to prevent call overhead and boxing penalties in case the fallback implementation expects to operate over vectors. The existing vector API-based slice implementation is now the fallback code that gets inlined in case intrinsification fails. > > Idea here is to add infrastructure support to enable intrinsification of fast path for selected vector APIs, else enable inlining of fall-back implementation if it's based on vector APIs. Existing call generators like PredictedCallGenerator, used to handle bi-morphic inlining, already make use of multiple call generators to handle hit/miss scenarios for a particular receiver type. The newly added hybrid call generator is lazy and called during incremental inlining optimization. It also relieves the inline expander to handle slow paths, which can easily be implemented library side (Java). > > Vector API jtreg tests pass at AVX level 2, remaining validation in progress. 
> > Performance numbers: > > > System : 13th Gen Intel(R) Core(TM) i3-1315U > > Baseline: > Benchmark (size) Mode Cnt Score Error Units > VectorSliceBenchmark.byteVectorSliceWithConstantIndex1 1024 thrpt 2 9444.444 ops/ms > VectorSliceBenchmark.byteVectorSliceWithConstantIndex2 1024 thrpt 2 10009.319 ops/ms > VectorSliceBenchmark.byteVectorSliceWithVariableIndex 1024 thrpt 2 9081.926 ops/ms > VectorSliceBenchmark.intVectorSliceWithConstantIndex1 1024 thrpt 2 6085.825 ops/ms > VectorSliceBenchmark.intVectorSliceWithConstantIndex2 1024 thrpt 2 6505.378 ops/ms > VectorSliceBenchmark.intVectorSliceWithVariableIndex 1024 thrpt 2 6204.489 ops/ms > VectorSliceBenchmark.longVectorSliceWithConstantIndex1 1024 thrpt 2 1651.334 ops/ms > VectorSliceBenchmark.longVectorSliceWithConstantIndex2 1024 thrpt 2 1642.784 ops/ms > VectorSliceBenchmark.longVectorSliceWithVariableIndex 1024 thrpt 2 1474.808 ops/ms > VectorSliceBenchmark.shortVectorSliceWithConstantIndex1 1024 thrpt 2 10399.394 ops/ms > VectorSliceBenchmark.shortVectorSliceWithConstantIndex2 1024 thrpt 2 10502.894 ops/ms > VectorSliceBenchmark.shortVectorSliceWithVariableIndex 1024 ... This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/24104 From vpaprotski at openjdk.org Tue Dec 2 15:40:51 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Tue, 2 Dec 2025 15:40:51 GMT Subject: RFR: 8372703: Test compiler/arguments/TestCodeEntryAlignment.java failed: assert(allocates2(pc)) failed: not in CodeBuffer memory [v2] In-Reply-To: References: Message-ID: > Requires a Broadwell machine, but was able to reproduce with an emulator: > > > ~/sde-external-9.58.0-2025-06-16-lin/sde64 -follow-subprocess -bdw -- ./build/linux-x86_64-server-fastdebug/images/jdk/bin/java -XX:-UseMulAddIntrinsic -XX:+UseDilithiumIntrinsics -XX:+UnlockExperimentalVMOptions -XX:CodeCacheSegmentSize=1024 -XX:CodeEntryAlignment=1024 -cp build/linux-x86_64-server-fastdebug/support/test/lib/test-lib.jar test/hotspot/jtreg/compiler/arguments/TestCodeEntryAlignment.java run Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: comment from Manuel ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28588/files - new: https://git.openjdk.org/jdk/pull/28588/files/7870115c..c53924e9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28588&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28588&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28588.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28588/head:pull/28588 PR: https://git.openjdk.org/jdk/pull/28588 From vpaprotski at openjdk.org Tue Dec 2 15:40:53 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Tue, 2 Dec 2025 15:40:53 GMT Subject: RFR: 8372703: Test compiler/arguments/TestCodeEntryAlignment.java failed: assert(allocates2(pc)) failed: not in CodeBuffer memory [v2] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 08:55:40 GMT, Manuel Hässig wrote: >
https://github.com/openjdk/jdk/blob/84ffe87260753973835ea6b88443e28bcaf0122f/test/hotspot/jtreg/ProblemList.txt#L82 > > Meanwhile, I will run testing on our side and report back with the results. Done. Thanks for the tests @mhaessig let me know how it goes ------------- PR Comment: https://git.openjdk.org/jdk/pull/28588#issuecomment-3602651621 From duke at openjdk.org Tue Dec 2 15:41:07 2025 From: duke at openjdk.org (Zihao Lin) Date: Tue, 2 Dec 2025 15:41:07 GMT Subject: RFR: 8370196: C2: Improve (U)MulHiLNode::MulHiValue [v10] In-Reply-To: References: Message-ID: > If nodes both are constant, support constant folding. Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: - Merge branch 'openjdk:master' into JDK-8370196 - fix test failed - fix make unsigned - Merge branch 'master' into JDK-8370196 - Fix - Fix - Apply suggestion from @eme64 Co-authored-by: Emanuel Peter - Add Math to Operations.java - Add tests - Merge branch 'master' into JDK-8370196 - ... 
and 3 more: https://git.openjdk.org/jdk/compare/a62296d8...30fa1f03 ------------- Changes: https://git.openjdk.org/jdk/pull/28097/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28097&range=09 Stats: 373 lines in 8 files changed: 336 ins; 14 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/28097.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28097/head:pull/28097 PR: https://git.openjdk.org/jdk/pull/28097 From qamai at openjdk.org Tue Dec 2 15:46:54 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 2 Dec 2025 15:46:54 GMT Subject: RFR: 8371964: C2 compilation asserts with "Unexpected load/store size" [v4] In-Reply-To: <-SdGoKVu9WpxzbLyqrLt7duH-qK_Bbm6ErrWdDfxJUg=.95c14f7f-b940-4dc6-a63d-055419625a36@github.com> References: <-SdGoKVu9WpxzbLyqrLt7duH-qK_Bbm6ErrWdDfxJUg=.95c14f7f-b940-4dc6-a63d-055419625a36@github.com> Message-ID: On Tue, 2 Dec 2025 08:09:30 GMT, Quan Anh Mai wrote: >> Hi, >> >> This fixes the crash in `Load/StoreVectorMaskedNode::Ideal`. The issue here is that the graph is not canonical during idealization, which leads to us processing a dead node. The fix I propose is to bail-out when that happens. >> >> To be more specific, for this issue, we have the graph that looks like: >> >> ConI -> ConvI2L -> CastLL(0..32) -> VectorMaskGen >> >> with `ConI` being 45 and `MaxVectorSize` being 32. In this instance, `CastLL` is processed before `ConvI2L`, and when it is processed, it sees that the type of `ConvI2L` being its bottom type. As a result, it does not know that it is top, and since we are after macro expansion, which is after loop opts, the `CastLL` goes away, leaving us with: >> >> ConI -> ConvI2L -> VectorMaskGen >> >> After `ConvI2L` is processed, we know that the input of `VectorMaskGen` is a constant 45, which is larger than `MaxVectorSize`, leading to the assert failure. >> >> Please take a look and leave your thoughts, thanks a lot. 
> > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > bug number in test, comment Thanks for the approval! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28410#issuecomment-3602683327 From qamai at openjdk.org Tue Dec 2 15:46:56 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 2 Dec 2025 15:46:56 GMT Subject: Integrated: 8371964: C2 compilation asserts with "Unexpected load/store size" In-Reply-To: References: Message-ID: On Thu, 20 Nov 2025 08:42:46 GMT, Quan Anh Mai wrote: > Hi, > > This fixes the crash in `Load/StoreVectorMaskedNode::Ideal`. The issue here is that the graph is not canonical during idealization, which leads to us processing a dead node. The fix I propose is to bail-out when that happens. > > To be more specific, for this issue, we have the graph that looks like: > > ConI -> ConvI2L -> CastLL(0..32) -> VectorMaskGen > > with `ConI` being 45 and `MaxVectorSize` being 32. In this instance, `CastLL` is processed before `ConvI2L`, and when it is processed, it sees that the type of `ConvI2L` being its bottom type. As a result, it does not know that it is top, and since we are after macro expansion, which is after loop opts, the `CastLL` goes away, leaving us with: > > ConI -> ConvI2L -> VectorMaskGen > > After `ConvI2L` is processed, we know that the input of `VectorMaskGen` is a constant 45, which is larger than `MaxVectorSize`, leading to the assert failure. > > Please take a look and leave your thoughts, thanks a lot. This pull request has now been integrated. 
Changeset: ca4ae806 Author: Quan Anh Mai URL: https://git.openjdk.org/jdk/commit/ca4ae8063edddda36fafafd06b9b1a88ffbf9d2e Stats: 23 lines in 2 files changed: 19 ins; 0 del; 4 mod 8371964: C2 compilation asserts with "Unexpected load/store size" Reviewed-by: chagedorn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/28410 From mhaessig at openjdk.org Tue Dec 2 15:48:07 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Tue, 2 Dec 2025 15:48:07 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v5] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 13:32:22 GMT, Kerem Kat wrote: >> Do not try to replace `fallthrough_memproj` when it is null, fixes crash. >> >> Test case is simplified from the ticket. Verified that the case crashes without the fix. > > Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Merge branch 'master' into fix-c2-segfault-unlocknode > - address comments > - fix rename > - rename test file > - Merge branch 'master' into fix-c2-segfault-unlocknode > - fix test spacing > - Update src/hotspot/share/opto/macro.cpp > > Co-authored-by: Manuel Hässig > - Update src/hotspot/share/opto/macro.cpp > > Co-authored-by: Manuel Hässig > - copyright format fix? > - 8370502: C2: segfault while adding node to IGVN worklist Thank you for addressing my comments. This looks good to me now. I will also run some testing on my side and report back with the results as soon as they are available.
------------- PR Review: https://git.openjdk.org/jdk/pull/28432#pullrequestreview-3530839782 From mhaessig at openjdk.org Tue Dec 2 15:54:00 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Tue, 2 Dec 2025 15:54:00 GMT Subject: RFR: 8372703: Test compiler/arguments/TestCodeEntryAlignment.java failed: assert(allocates2(pc)) failed: not in CodeBuffer memory [v2] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 15:40:51 GMT, Volodymyr Paprotski wrote: >> Requires a Broadwell machine, but was able to reproduce with an emulator: >> >> >> ~/sde-external-9.58.0-2025-06-16-lin/sde64 -follow-subprocess -bdw -- ./build/linux-x86_64-server-fastdebug/images/jdk/bin/java -XX:-UseMulAddIntrinsic -XX:+UseDilithiumIntrinsics -XX:+UnlockExperimentalVMOptions -XX:CodeCacheSegmentSize=1024 -XX:CodeEntryAlignment=1024 -cp build/linux-x86_64-server-fastdebug/support/test/lib/test-lib.jar test/hotspot/jtreg/compiler/arguments/TestCodeEntryAlignment.java run > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > comment from Manuel Thank you for addressing my comments. Testing passed up to tier6. ------------- Marked as reviewed by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/28588#pullrequestreview-3530874911 From shade at openjdk.org Tue Dec 2 16:04:12 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 2 Dec 2025 16:04:12 GMT Subject: RFR: 8360557: CTW: Inline cold methods to reach more code [v6] In-Reply-To: References: Message-ID: <1o0uydFw-zaax7mVXkqJL15Cto0okbjQtkoqs6ADUyU=.001fda73-be80-41bd-9d2f-9258889117e3@github.com> > We use CTW testing for making sure compilers behave well. But we compile the code that is not executed at all, and since our inlining heuristics often look back at profiles, we end up not actually inlining all too much! This means CTW testing likely misses lots of bugs that normal code is exposed to, especially e.g.
in loop optimizations. > > There is an intrinsic tradeoff with accepting more inlined methods in CTW: the compilation time gets significantly worse. With just accepting the cold methods we have reasonable CTW times, eating the improvements we have committed in mainline recently. And it still finds bugs. See the RFE for sample data. > > After this lands and CTW starts to compile cold methods, one can greatly expand the scope of the CTW testing by overriding the static inlining limits. Doing e.g. `TEST_VM_OPTS="-XX:MaxInlineSize=70 -XX:C1MaxInlineSize=70"` finds even more bugs. Unfortunately, the compilation times suffer so much, they are impractical to run in standard configurations, see data in RFE. We will enable some of that testing in special testing pipelines. > > Pre-empting the question: "Well, why not use -Xcomp then, and make sure it inlines well?" The answer is in RFE as well: Xcomp causes _a lot_ of stray compilations for JDK and CTW infra itself. For small JARs in large corpus this eats precious testing time that we would instead like to spend on deeper inlining in the actual JAR code. This also does not force us to look into how CTW works in Xcomp at all; I expect some surprises there. Feather-touching the inlining heuristic paths to just accept methods without looking at profiles looks better. > > Tobias had an idea to implement the stress randomized inlining that would expand the scope of inlining. This improvement stacks well with it. This improvement provides the base case of inlining most reasonable methods, and then allows stress infra to inline some more on top of that. > > Additional testing: > - [x] GHA > - [x] Linux x86_64 server fastdebug, `applications/ctw/modules` > - [x] Linux x86_64 server fastdebug, large CTW corpus (now failing in interesting ways) Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 11 additional commits since the last revision: - Merge branch 'master' into JDK-8360557-ctw-inlining - Enable more testing - Merge branch 'master' into JDK-8360557-ctw-inlining - Merge branch 'master' into JDK-8360557-ctw-inlining - Merge branch 'master' into JDK-8360557-ctw-inlining - Merge branch 'master' into JDK-8360557-ctw-inlining - Update src/hotspot/share/compiler/compiler_globals.hpp Co-authored-by: Tobias Hartmann - Revert separate patch - Final - Proper option name and bump the limits - ... and 1 more: https://git.openjdk.org/jdk/compare/511a8fe5...2d02b713 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26068/files - new: https://git.openjdk.org/jdk/pull/26068/files/97975dd0..2d02b713 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26068&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26068&range=04-05 Stats: 1501 lines in 42 files changed: 888 ins; 231 del; 382 mod Patch: https://git.openjdk.org/jdk/pull/26068.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26068/head:pull/26068 PR: https://git.openjdk.org/jdk/pull/26068 From dfenacci at openjdk.org Tue Dec 2 16:26:56 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 2 Dec 2025 16:26:56 GMT Subject: RFR: 8372703: Test compiler/arguments/TestCodeEntryAlignment.java failed: assert(allocates2(pc)) failed: not in CodeBuffer memory [v2] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 15:40:51 GMT, Volodymyr Paprotski wrote: >> Requires a Broadwell machine, but was able to reproduce with an emulator: >> >> >> ~/sde-external-9.58.0-2025-06-16-lin/sde64 -follow-subprocess -bdw -- ./build/linux-x86_64-server-fastdebug/images/jdk/bin/java -XX:-UseMulAddIntrinsic -XX:+UseDilithiumIntrinsics -XX:+UnlockExperimentalVMOptions -XX:CodeCacheSegmentSize=1024 -XX:CodeEntryAlignment=1024 -cp build/linux-x86_64-server-fastdebug/support/test/lib/test-lib.jar test/hotspot/jtreg/compiler/arguments/TestCodeEntryAlignment.java run >
> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > comment from Manuel Marked as reviewed by dfenacci (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28588#pullrequestreview-3531053225 From epeter at openjdk.org Tue Dec 2 16:52:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 16:52:30 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: Message-ID: <9zey9SqquL1zLlFLuyKV_18OiZs2UQSokhREx9ln0l0=.edc15ede-e798-4d88-b61a-d2ed086d99da@github.com> On Tue, 2 Dec 2025 09:20:41 GMT, Roland Westrelin wrote: >> This is a variant of 8332827. In 8332827, an array access becomes >> dependent on a range check `CastII` for another array access. When, >> after loop opts are over, that RC `CastII` was removed, the array >> access could float and an out of bound access happened. With the fix >> for 8332827, RC `CastII`s are no longer removed. >> >> With this one what happens is that some transformations applied after >> loop opts are over widen the type of the RC `CastII`. As a result, the >> type of the RC `CastII` is no longer narrower than that of its input, >> the `CastII` is removed and the dependency is lost. >> >> There are 2 transformations that cause this to happen: >> >> - after loop opts are over, the type of the `CastII` nodes is widened >> so nodes that have the same inputs but a slightly different type can >> common. >> >> - When pushing a `CastII` through an `Add`, if the types of both inputs >> of the `Add` are non-constant, then we end up widening the type >> (the resulting `Add` has a type that's wider than that of the >> initial `CastII`). >> >> There are already 3 types of `Cast` nodes depending on the >> optimizations that are allowed. Either the `Cast` is floating >> (`depends_only_on_test()` returns `true`) or pinned.
Either the `Cast` >> can be removed if it no longer narrows the type of its input or >> not. We already have variants of the `CastII`: >> >> - if the Cast can float and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can't be removed when it doesn't narrow >> the type of its input. >> >> What we need here, I think, is the 4th combination: >> >> - if the Cast can float and can't be removed when it doesn't narrow >> the type of its input. >> >> Anyway, things are becoming confusing with all these different >> variants named in ways that don't always help figure out what >> constraints one of them operates under. So I refactored this and that's >> the biggest part of this change. The fix consists in marking `Cast` >> nodes when their type is widened in a way that prevents them from being >> optimized out. >> >> Tobias ran performance testing with a slightly different version of >> this change and there was no regression. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: > > - Merge branch 'master' into JDK-8354282 > - whitespace > - review > - review > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java > > Co-authored-by: Christian Hagedorn > - review > - review > - ... and 7 more: https://git.openjdk.org/jdk/compare/ef5e744a...93b8b0c5 @rwestrel Nice work! We not only fixed the bug but also made the concepts much clearer. This makes me very happy. ------------- Marked as reviewed by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24575#pullrequestreview-3531172652 From epeter at openjdk.org Tue Dec 2 16:52:32 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 16:52:32 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: <4qc5jJ1KA09yko5rWioBGstpuuRNxOiNWXRdRdh9h_E=.17c8ace8-c672-4451-bd15-247d66d92cef@github.com> References: <0An6wz0QZZxtVg-lP4IyqWTekcYkSmvosrVWkI7cH70=.86c07374-2127-4892-a369-ceefa82dd0b7@github.com> <_rBmTvf064PXyVEAX4zqk43DNgVr0gQDPzPcdQ4XI1A=.660e7e89-0a49-47e0-9639-972cbfbac5f0@github.com> <4qc5jJ1KA09yko5rWioBGstpuuRNxOiNWXRdRdh9h_E=.17c8ace8-c672-4451-bd15-247d66d92cef@github.com> Message-ID: On Tue, 2 Dec 2025 15:29:42 GMT, Emanuel Peter wrote: >> At least we could say that it is allowed to hoist the RangeCheck, and the CastII could float up to where the RC is hoisted. > > Suggestion: > > // Use case example: Range Check CastII > // Floating: The Cast is only dependent on the single range check. If the range check was ever to be hoisted > // it would be safe to let the Cast float to where the range check is hoisted up to. > // Narrowing: The Cast narrows the type to a positive index. If the input to the Cast is narrower, we can safely > // remove the cast because the array access will be safe. Ok, I now read the PR from the top, and not just recent changes. If one were to start reading from the top, it would be clear without my suggestions here. But I think it could still be good to apply something about letting the Cast float to where we would hoist the RC.
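For readers less familiar with the terminology, here is a minimal Java-level sketch (not taken from the PR) of what a range check protects. The class and method names are hypothetical. The only point is that the load `a[i]` is legal only under the condition `0 <= i < a.length`, which is the dependency that the range check `CastII` is described above as carrying inside C2; the sketch does not show any C2 internals.

```java
final class RangeCheckSketch {
    // Each a[i] below is guarded by an implicit range check
    // (0 <= i < a.length). Per the discussion above, the compiled array
    // access depends on that check, so it must not be scheduled ahead of it.
    static int sumFirst(int[] a, int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Returns true if the range check fires, i.e. n exceeds a.length.
    static boolean throwsOutOfBounds(int[] a, int n) {
        try {
            sumFirst(a, n);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(sumFirst(data, 3));
        System.out.println(throwsOutOfBounds(data, 4));
    }
}
```

If the access were allowed to float above its check, the second call would read past the array instead of throwing, which is the class of crash this PR is about.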
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2582034834 From liach at openjdk.org Tue Dec 2 17:27:43 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 2 Dec 2025 17:27:43 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 09:28:03 GMT, Per Minborg wrote: >> Chen Liang has updated the pull request incrementally with one additional commit since the last revision: >> >> Tweak VH usage in some classes > > src/java.base/share/classes/java/lang/invoke/VarHandle.java line 2036: > >> 2034: var constant = MethodHandleImpl.isCompileConstant(vh); >> 2035: var cache = adaptedMh; >> 2036: if (constant == MethodHandleImpl.CONSTANT_YES && cache != null) { > > Rookie question: Are there multi-threading considerations here? How about visibility across threads? MethodHandle is immutable and can be safely published. So this is ok. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2582166754 From epeter at openjdk.org Tue Dec 2 17:38:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Dec 2025 17:38:59 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v3] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 09:46:05 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/phaseX.cpp line 2085: >> >>> 2083: } >>> 2084: return false; >>> 2085: } >> >> Why not call it `verify_node_invariants_for`? >> >> You should also assert immediately. @benoitmaillard is about to make that change for everything: https://github.com/openjdk/jdk/pull/28295 > > That one is not integrated. Shouldn't I do that change only if/when it integrates?
Right, keep it, just be informed, it may get integrated soon :) Renaming would still be good ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25386#discussion_r2582199486 From qamai at openjdk.org Tue Dec 2 17:48:43 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 2 Dec 2025 17:48:43 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 09:20:41 GMT, Roland Westrelin wrote: >> This is a variant of 8332827. In 8332827, an array access becomes >> dependent on a range check `CastII` for another array access. When, >> after loop opts are over, that RC `CastII` was removed, the array >> access could float and an out of bound access happened. With the fix >> for 8332827, RC `CastII`s are no longer removed. >> >> With this one what happens is that some transformations applied after >> loop opts are over widen the type of the RC `CastII`. As a result, the >> type of the RC `CastII` is no longer narrower than that of its input, >> the `CastII` is removed and the dependency is lost. >> >> There are 2 transformations that cause this to happen: >> >> - after loop opts are over, the type of the `CastII` nodes is widened >> so nodes that have the same inputs but a slightly different type can >> common. >> >> - When pushing a `CastII` through an `Add`, if the types of both inputs >> of the `Add` are non-constant, then we end up widening the type >> (the resulting `Add` has a type that's wider than that of the >> initial `CastII`). >> >> There are already 3 types of `Cast` nodes depending on the >> optimizations that are allowed. Either the `Cast` is floating >> (`depends_only_on_test()` returns `true`) or pinned. Either the `Cast` >> can be removed if it no longer narrows the type of its input or >> not.
We already have variants of the `CastII`: >> >> - if the Cast can float and be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can be removed when it doesn't narrow the type >> of its input. >> >> - if the Cast is pinned and can't be removed when it doesn't narrow >> the type of its input. >> >> What we need here, I think, is the 4th combination: >> >> - if the Cast can float and can't be removed when it doesn't narrow >> the type of its input. >> >> Anyway, things are becoming confusing with all these different >> variants named in ways that don't always help figure out what >> constraints one of them operates under. So I refactored this and that's >> the biggest part of this change. The fix consists in marking `Cast` >> nodes when their type is widened in a way that prevents them from being >> optimized out. >> >> Tobias ran performance testing with a slightly different version of >> this change and there was no regression. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: > > - Merge branch 'master' into JDK-8354282 > - whitespace > - review > - review > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java > > Co-authored-by: Christian Hagedorn > - review > - review > - ... and 7 more: https://git.openjdk.org/jdk/compare/ef5e744a...93b8b0c5 src/hotspot/share/opto/castnode.hpp line 105: > 103: // All the possible combinations of floating/narrowing with example use cases: > 104: > 105: // Use case example: Range Check CastII I believe this is incorrect, a range check should be floating non-narrowing. It is only narrowing if the length of the array is a constant.
It is because this cast encodes the dependency on the condition `index u< length`. This condition cannot be expressed in terms of `Type` unless `length` is a constant. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2582188782 From qamai at openjdk.org Tue Dec 2 17:48:44 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 2 Dec 2025 17:48:44 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: <0An6wz0QZZxtVg-lP4IyqWTekcYkSmvosrVWkI7cH70=.86c07374-2127-4892-a369-ceefa82dd0b7@github.com> <_rBmTvf064PXyVEAX4zqk43DNgVr0gQDPzPcdQ4XI1A=.660e7e89-0a49-47e0-9639-972cbfbac5f0@github.com> <4qc5jJ1KA09yko5rWioBGstpuuRNxOiNWXRdRdh9h_E=.17c8ace8-c672-4451-bd15-247d66d92cef@github.com> Message-ID: On Tue, 2 Dec 2025 16:48:55 GMT, Emanuel Peter wrote: >> Suggestion: >> >> // Use case example: Range Check CastII >> // Floating: The Cast is only dependent on the single range check. If the range check was ever to be hoisted >> // it would be safe to let the Cast float to where the range check is hoisted up to. >> // Narrowing: The Cast narrows the type to a positive index. If the input to the Cast is narrower, we can safely >> // remove the cast because the array access will be safe. > > Ok, I now read the PR from the top, and not just recent changes. If one were to start reading from the top, it would be clear without my suggestions here. But I think it could still be good to apply something about letting the Cast float to where we would hoist the RC. Naming is hard, but it is worth pointing out in the comment that floating here refers to `depends_only_on_test`. In other words, a cast is considered floating if it is legal to change the control input of a cast from an `IfTrue` or `IfFalse` to an `IfTrue` or `IfFalse` that dominates the current control input, and the corresponding conditions of the `If`s are the same.
In contrast, we cannot do that for a pinned cast, and if the control is folded away, the control input of the pinned cast is changed to the control predecessor of the folded node. It is also worth noting that we have `Node::pinned` which means the node is pinned AT the control input while pinned here means that it is pinned UNDER the control input. Very confusing! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2582215477 From dlong at openjdk.org Tue Dec 2 18:22:33 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 2 Dec 2025 18:22:33 GMT Subject: Integrated: 8370766: JVM crashes when running compiler/exceptions/TestAccessErrorInCatch.java fails with -XX:+VerifyStack In-Reply-To: <5JAu6StX5-r2itXPGiDBgGHjGo0S2mOfGxOpPoMSkIQ=.000500da-a003-403b-9d3b-6df3a53c2b22@github.com> References: <5JAu6StX5-r2itXPGiDBgGHjGo0S2mOfGxOpPoMSkIQ=.000500da-a003-403b-9d3b-6df3a53c2b22@github.com> Message-ID: On Tue, 25 Nov 2025 03:25:05 GMT, Dean Long wrote: > The problem is C2 is throwing an exception and then deoptimizing, and the -XX:+VerifyStack logic expects the stack to be empty, match the "before" state if the reexecute flag is set, or match the "after" state. C2 is using the "before" state, so for correctness it also needs to set the reexecute flag. > > I played around with other approaches, like: > 1. setting the stack to empty > 2. adding all the bytecodes that can throw to the list in AbstractInterpreter::bytecode_should_reexecute() > 3. always setting the reexecute flag in add_safepoint_edges() if must_throw is set > but in the end I decided to go with the minimal localized low-risk change. This pull request has now been integrated. 
Changeset: 5627ff2d Author: Dean Long URL: https://git.openjdk.org/jdk/commit/5627ff2d9165ee1f7354c1ff1626f4949ef7fa3f Stats: 15 lines in 2 files changed: 8 ins; 1 del; 6 mod 8370766: JVM crashes when running compiler/exceptions/TestAccessErrorInCatch.java fails with -XX:+VerifyStack Co-authored-by: Manuel Hässig Reviewed-by: mhaessig, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/28486 From dlong at openjdk.org Tue Dec 2 18:41:44 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 2 Dec 2025 18:41:44 GMT Subject: RFR: 8372862: AArch64: Fix GetAndSet-acquire costs after JDK-8372188 In-Reply-To: References: Message-ID: <1CdpBw0mdYmQtGrr73r8FYkWk3BDdcRs8lTbixw3Sd0=.c0aee48c-91a0-422f-8c00-46d5f9b705d6@github.com> On Tue, 2 Dec 2025 10:44:24 GMT, Aleksey Shipilev wrote: > I just noticed (while looking at [JDK-8372800](https://bugs.openjdk.org/browse/JDK-8372800)) that I made a little error in [JDK-8372188](https://bugs.openjdk.org/browse/JDK-8372188) refactor, which made GetAndSet-acquire instruction cost twice as high. The usual cost for acquire versions is twice as low, likely to be selected instead of non-acquire versions. > > This bug happened as I "simplified" stencils at some point by dropping some arguments and renumbering the remaining ones. This is one place where I apparently forgot to renumber one usage. See other checks for `ifelse($3,Acq,...` in that stencil, all of them are `$3` (correct), not `$4` (incorrect). Seen no real bugs because of this mishap, but it would be good to fix it in case we see issues later. I also looked at stencils again, and I think there are no other argument-index problems like this anywhere else. > > The real change is in `aarch64_atomic_ad.m4`, `.ad` is re-generated from that stencil. > > Additional testing: > - [x] Linux AArch64 server fastdebug, `all` > - [ ] Linux AArch64 server fastdebug, quick jcstress run Marked as reviewed by dlong (Reviewer). I kicked off Oracle testing.
I'm tempted to say this is trivial, reverting the costs to what they were before, but a 2nd review wouldn't hurt. I think the reason it didn't cause a regression is because in case of ties, the later acquire rule is still the first candidate. ------------- PR Review: https://git.openjdk.org/jdk/pull/28598#pullrequestreview-3531635921 PR Comment: https://git.openjdk.org/jdk/pull/28598#issuecomment-3603461868 From valeriep at openjdk.org Tue Dec 2 18:56:52 2025 From: valeriep at openjdk.org (Valerie Peng) Date: Tue, 2 Dec 2025 18:56:52 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v7] In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 10:27:36 GMT, Martin Doerr wrote: >> This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. We can save the `genInvRoundKeys` computation when we only do encryption. >> >> The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64. > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: > > - Minor simplification. > - Merge remote-tracking branch 'origin' into 8371820_AES_Crypt > - Fix missing whitespace. > - Address review comments. > - Merge remote-tracking branch 'origin' into 8371820_AES_Crypt > - Remove K from AES_Crypt > - More minor cleanup. > - Improve comment and minor cleanup. > - 8371820: Further AES performance improvements for key schedule generation Marked as reviewed by valeriep (Reviewer). 
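For context, a minimal sketch of the encrypt-only usage pattern that the description above refers to: a `Cipher` initialized with `ENCRYPT_MODE` and never used for decryption. The class and helper names are hypothetical, and ECB with a single block is used only to keep the example short (it is not a recommendation). Whether a given JDK build actually skips the inverse key schedule on this path depends on the change under review; the sketch only shows the caller side, using the FIPS-197 Appendix C.1 key and plaintext.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

final class AesEncryptOnlySketch {
    // Encrypts one 16-byte block. The cipher is only ever initialized for
    // encryption, which is the case where the description above says the
    // decryption (inverse) round keys are not needed.
    static byte[] encryptBlock(byte[] key, byte[] block) {
        try {
            Cipher cipher = Cipher.getInstance("AES/ECB/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"));
            return cipher.doFinal(block);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // FIPS-197 Appendix C.1 inputs: key 00 01 .. 0f, plaintext 00 11 .. ff.
    static String fips197Example() {
        byte[] key = new byte[16];
        byte[] plaintext = new byte[16];
        for (int i = 0; i < 16; i++) {
            key[i] = (byte) i;
            plaintext[i] = (byte) (i * 0x11);
        }
        return toHex(encryptBlock(key, plaintext));
    }

    public static void main(String[] args) {
        System.out.println(fips197Example());
    }
}
```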
------------- PR Review: https://git.openjdk.org/jdk/pull/28299#pullrequestreview-3531689005 From mdoerr at openjdk.org Tue Dec 2 19:36:58 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 2 Dec 2025 19:36:58 GMT Subject: RFR: 8371820: Further AES performance improvements for key schedule generation [v7] In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 10:27:36 GMT, Martin Doerr wrote: >> This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. We can save the `genInvRoundKeys` computation when we only do encryption. >> >> The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64. > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: > > - Minor simplification. > - Merge remote-tracking branch 'origin' into 8371820_AES_Crypt > - Fix missing whitespace. > - Address review comments. > - Merge remote-tracking branch 'origin' into 8371820_AES_Crypt > - Remove K from AES_Crypt > - More minor cleanup. > - Improve comment and minor cleanup. > - 8371820: Further AES performance improvements for key schedule generation Thanks for all reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28299#issuecomment-3603660124 From mdoerr at openjdk.org Tue Dec 2 19:40:17 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 2 Dec 2025 19:40:17 GMT Subject: Integrated: 8371820: Further AES performance improvements for key schedule generation In-Reply-To: References: Message-ID: On Thu, 13 Nov 2025 16:48:28 GMT, Martin Doerr wrote: > This fix simplifies the hotspot intrinsics for some platforms and optimizes the key computation for encryption. We can save the `genInvRoundKeys` computation when we only do encryption. 
> > The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 17% for ppc64 and 26% for x86_64. This pull request has now been integrated. Changeset: 618732ff Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/618732ffc04ef393c9b8a3265c12ba66f31784d9 Stats: 61 lines in 7 files changed: 13 ins; 8 del; 40 mod 8371820: Further AES performance improvements for key schedule generation Reviewed-by: rrich, valeriep ------------- PR: https://git.openjdk.org/jdk/pull/28299 From vlivanov at openjdk.org Tue Dec 2 20:21:29 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 2 Dec 2025 20:21:29 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> <3OE37qXGHhLAhnRQM188hhygrLYBtI3FLBMK0tGVH30=.5d1b4406-3bb3-4788-8059-e78260b79ec1@github.com> <7WF8DlorrU_B2__G2wr43w1PZwJh8mEhD5dY10YDIOo=.ec416c38-1aff-4dd6-8792-d6a0e01f91ce@github.com> Message-ID: <_Z6KpxCYH2n3sHuT6-kRP4cSTAN3-s5UA0rbfrJSIgA=.e9d4089c-8329-406b-9a0a-167a24311c13@github.com> On Tue, 2 Dec 2025 02:51:57 GMT, Chen Liang wrote: >> So, it seems like what you are trying to achieve is a 1-1 mapping from `AccessDescriptor` to `vh` through `adaptedMh`. So, once `cache != null` you can trust that it corresponds to the `vh` instance passed as a constant. But cache pollution can easily break the invariant, so you try to eliminate the pollution by avoiding cache updates when vh is not constant. Do I get it right? > > No. The avoidance of cache update simply trims down the generated code by throwing away the meaningless cache update. > > The access to cache is already safeguarded by `constant == MethodHandleImpl.CONSTANT_YES`. I should have moved `var cache = adaptedMh;` into the if block of `constant == CONSTANT_YES`. 
I still find it confusing, especially the tri-state logic part. For background, `isCompileConstant` was introduced as part of LF sharing effort to get rid of Java-level profiling in optimized code. The pattern it was designed for was: if (isCompileConstant(...)) { return ...; } else { ... // do some extra work (either in interpreter, C1, or not-fully-optimized version in C2) } In this patch, you don't follow that pattern and add a new state (`CONSTANT_PENDING`) to distinguish interpreter/C1 from C2. What's the motivation? Why do you want to avoid cache updates coming from C2-generated code? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2582647097 From vlivanov at openjdk.org Tue Dec 2 20:21:33 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 2 Dec 2025 20:21:33 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 23:41:04 GMT, Chen Liang wrote: >> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Tweak VH usage in some classes src/java.base/share/classes/java/lang/invoke/VarHandle.java line 2042: > 2040: // This is still a hot path if vh is not constant - in this case, > 2041: // asType is the bottleneck for constant folding, unfortunately > 2042: var result = vh.getMethodHandle(mode).asType(symbolicMethodTypeInvoker); `mode` and `symbolicMethodTypeInvoker` are part of `AccessDescriptor` while `vh` comes as an argument. What guarantees that a cached adapter is compatible with `vh` observed during subsequent calls?
It means that the `vh` shape stays exactly the same. Is it correct? Would be good to have it validated with asserts. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2582661917 From shade at openjdk.org Tue Dec 2 20:33:19 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 2 Dec 2025 20:33:19 GMT Subject: RFR: 8372862: AArch64: Fix GetAndSet-acquire costs after JDK-8372188 In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 10:44:24 GMT, Aleksey Shipilev wrote: > I just noticed (while looking at [JDK-8372800](https://bugs.openjdk.org/browse/JDK-8372800)) that I made a little error in [JDK-8372188](https://bugs.openjdk.org/browse/JDK-8372188) refactor, which made GetAndSet-acquire instruction cost twice as high. The usual cost for acquire versions is twice as low, likely to be selected instead of non-acquire versions. > > This bug happened as I "simplified" stencils at some point by dropping some arguments and renumbering the remaining ones. This is one place where I apparently forgot to renumber one usage. See other checks for `ifelse($3,Acq,...` in that stencil, all of them are `$3` (correct), not `$4` (incorrect). Seen no real bugs because of this mishap, but it would be good to fix it in case we see issues later. I also looked at stencils again, and I think there are no other argument-index problems like this anywhere else. > > The real change is in `aarch64_atomic_ad.m4`, `.ad` is re-generated from that stencil. > > Additional testing: > - [x] Linux AArch64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, quick jcstress run Thanks Dean! jcstress run comes back clean as well.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28598#issuecomment-3603848986 From vlivanov at openjdk.org Tue Dec 2 20:39:55 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 2 Dec 2025 20:39:55 GMT Subject: RFR: 8372845: Fold identity hash code if object is constant In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 02:49:52 GMT, Chen Liang wrote: > would it be safer for us to move the constant detection after generate_virtual_guard in the is_virtual if block? Good catch. I missed that the intrinsic is shared between `System::identityHashCode()` and `Object::hashCode`. I'm not sure it makes sense to support `Object::hashCode` unless C2 can eliminate `generate_virtual_guard` for a constant receiver. I'd just limit constant folding to `!is_virtual` case for now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28589#issuecomment-3603869560 From vlivanov at openjdk.org Tue Dec 2 20:45:38 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 2 Dec 2025 20:45:38 GMT Subject: RFR: 8372845: Fold identity hash code if object is constant In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 06:42:32 GMT, Tobias Hartmann wrote: > Just an observation: This patch will only allow folding during parsing. I would expect that often, opportunities only arise after other optimizations already took place. I deliberately omitted post-parse optimization opportunities for now. It would require a gradual lowering of the representation from a high-level macro node to low-level poking at object header. Moreover, final representation has complex control, so either the macro node should be a CFG node or a way to determine a location in CFG for a data-only macro node and expanding it there needs to be supported. (There are other use cases for such functionality, like lowering data nodes into pure calls, but no readily available implementation is there yet.) IMO something to work on in a follow-up enhancement. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28589#issuecomment-3603886860 From vlivanov at openjdk.org Tue Dec 2 20:46:36 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 2 Dec 2025 20:46:36 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v2] In-Reply-To: <5eysoU9a44W7_cWds1pgbO9cpxQpBbtd54cUglfEW8c=.d0307e92-d9b3-405c-b488-872243af83b1@github.com> References: <5eysoU9a44W7_cWds1pgbO9cpxQpBbtd54cUglfEW8c=.d0307e92-d9b3-405c-b488-872243af83b1@github.com> Message-ID: On Mon, 1 Dec 2025 19:30:22 GMT, Vladimir Ivanov wrote: >> Even though `instanceof` check (and reflective `Class.isInstance` call) narrows operand's type, sharpened type information is not explicitly materialized in the IR. >> >> There's a `SubTypeCheck` node present, but it is not a substitute for a `CheckCastPP` node with a proper type. >> >> The difference can be illustrated with the following simple cases: >> >> class A { void m() {} } >> class B extends A { void m() {} } >> >> void testInstanceOf(A obj) { >> if (obj instanceof B) { >> obj.m(); >> } >> } >> >> InstanceOf::testInstanceOf (12 bytes) >> @ 8 InstanceOf$A::m (0 bytes) failed to inline: virtual call >> >> vs >> >> void testInstanceOfCast(A obj) { >> if (obj instanceof B) { >> B b = (B)obj; >> b.m(); >> } >> } >> >> InstanceOf::testInstanceOfCast (17 bytes) >> @ 13 InstanceOf$B::m (1 bytes) inline (hot) >> >> >> Proposed fix annotates operands of subtype checks with proper type information which reflects the effects of subtype check. Not-yet-canonicalized IR shape poses some challenges, but I decided to match it early so information is available right away, rather than waiting for IGVN pass and delay inlining to post-parse phase. >> >> FTR it is not a complete fix. It works for trivial cases, but for more complex conditions the IR shape becomes too complex during parsing (as illustrated by some test cases). 
I experimented with annotating subtype checks after initial parsing pass is over, but the crucial simplification step happens as part of split-if transformation which happens when no more inlining is possible. So, the only possible benefit (without forcing split-if optimization earlier) is virtual-to-direct call strength reduction. I plan to explore it separately. >> >> Testing: hs-tier1 - hs-tier5 > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Test fix Any reviews, please? ------------- PR Comment: https://git.openjdk.org/jdk/pull/28517#issuecomment-3603891338 From vlivanov at openjdk.org Tue Dec 2 20:56:18 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 2 Dec 2025 20:56:18 GMT Subject: RFR: 8372845: Fold identity hash code if object is constant In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 20:36:50 GMT, Vladimir Ivanov wrote: > I'm not sure it makes sense to support Object::hashCode unless C2 can eliminate generate_virtual_guard for a constant receiver. I'd just limit constant folding to !is_virtual case for now. Or, alternatively, inspect constant object's v-table during compilation and ensure that corresponding slot points at `Object::hashCode`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28589#issuecomment-3603919570 From kvn at openjdk.org Tue Dec 2 21:00:58 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 2 Dec 2025 21:00:58 GMT Subject: RFR: 8372845: Fold identity hash code if object is constant In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 23:01:08 GMT, Chen Liang wrote: > Folding identity hash as constant if the incoming argument is constant would be useful for quick map lookups, such as for the [Classifier proposal](https://openjdk.org/jeps/8357674). Currently, identity hash is not constant because it loads the object header/mark word. We can add an explicit bypass to load an existing hash eagerly instead. Good. 
Yes, we can work on constant folding in IGVN later. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28589#pullrequestreview-3532076661 From kvn at openjdk.org Tue Dec 2 21:01:00 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 2 Dec 2025 21:01:00 GMT Subject: RFR: 8372845: Fold identity hash code if object is constant In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 20:53:56 GMT, Vladimir Ivanov wrote: >>> would it be safer for us to move the constant detection after generate_virtual_guard in the is_virtual if block? >> >> Good catch. I missed that the intrinsic is shared between `System::identityHashCode()` and `Object::hashCode`. >> >> I'm not sure it makes sense to support `Object::hashCode` unless C2 can eliminate `generate_virtual_guard` for a constant receiver. I'd just limit constant folding to `!is_virtual` case for now. > >> I'm not sure it makes sense to support Object::hashCode unless C2 can eliminate generate_virtual_guard for a constant receiver. I'd just limit constant folding to !is_virtual case for now. > > Or, alternatively, inspect constant object's v-table during compilation and ensure that corresponding slot points at `Object::hashCode`. @iwanowww please fix title to match JBS. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28589#issuecomment-3603933760 From liach at openjdk.org Tue Dec 2 22:06:49 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 2 Dec 2025 22:06:49 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 20:18:19 GMT, Vladimir Ivanov wrote: >> Chen Liang has updated the pull request incrementally with one additional commit since the last revision: >> >> Tweak VH usage in some classes > > src/java.base/share/classes/java/lang/invoke/VarHandle.java line 2042: > >> 2040: // This is still a hot path if vh is not constant - in this case, >> 2041: // asType is the bottleneck for constant folding, unfortunately >> 2042: var result = vh.getMethodHandle(mode).asType(symbolicMethodTypeInvoker); > > `mode` and `symbolicMethodTypeInvoker` are part of `AccessDescriptor` while `vh` comes as an argument. What guarantees that a cached adapter is compatible with `vh` observed during subsequent calls? It means that `vh` shape stays exactly the same shape. Is it correct? Would be good to have it validated with asserts. I am assuming that the previous `vh` observed is compatible with future ones if compiler can fold the `vh` into a constant. If it is not, we can drop the updates to the cache field in the C2 compiled slow code. 
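
The compatibility question raised above (is an adapter cached at a site still valid for the `vh` seen on a later call?) can be illustrated with a small sketch. It is not the code from the PR: `SiteCacheSketch`, `Site`, and `Entry` are hypothetical names, and the public `VarHandle.toMethodHandle` stands in for the internal `getMethodHandle`. The sketch remembers which `VarHandle` the adapter was built from and reuses it only for that same instance, which is roughly the kind of check the review asks to have validated with asserts.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.VarHandle;

public class SiteCacheSketch {
    static int counter; // field targeted by the VarHandle below

    static final VarHandle COUNTER;
    static {
        try {
            COUNTER = MethodHandles.lookup()
                    .findStaticVarHandle(SiteCacheSketch.class, "counter", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Hypothetical stand-in for a per-call-site descriptor (assumption, not the JDK class).
    static final class Site {
        // Immutable pair, so a single racy field write publishes both values together.
        private record Entry(VarHandle vh, MethodHandle adapted) {}

        private final VarHandle.AccessMode mode;
        private final MethodType symbolicType;
        private Entry entry;

        Site(VarHandle.AccessMode mode, MethodType symbolicType) {
            this.mode = mode;
            this.symbolicType = symbolicType;
        }

        MethodHandle adapt(VarHandle vh) {
            Entry e = entry;
            // Reuse the adapter only if it was built from this very VarHandle.
            if (e != null && e.vh() == vh) {
                return e.adapted();
            }
            // Slow path: bind the access mode and convert to the site's symbolic type.
            MethodHandle mh = vh.toMethodHandle(mode).asType(symbolicType);
            entry = new Entry(vh, mh);
            return mh;
        }
    }

    // Runs a return-dropping getAndAdd twice through the same site.
    static int demo() {
        counter = 0;
        Site site = new Site(VarHandle.AccessMode.GET_AND_ADD,
                MethodType.methodType(void.class, int.class)); // (int)void drops the result
        try {
            MethodHandle first = site.adapt(COUNTER);
            first.invokeExact(42);
            MethodHandle second = site.adapt(COUNTER);
            second.invokeExact(42);
            if (first != second) {
                throw new IllegalStateException("adapter was not reused");
            }
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Dropping the identity check would make the cache correct only under the assumption that every call at the site sees the same `vh`, which is the assumption being questioned in the review.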
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2582922472

From liach at openjdk.org Tue Dec 2 22:10:31 2025
From: liach at openjdk.org (Chen Liang)
Date: Tue, 2 Dec 2025 22:10:31 GMT
Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3]
In-Reply-To: <_Z6KpxCYH2n3sHuT6-kRP4cSTAN3-s5UA0rbfrJSIgA=.e9d4089c-8329-406b-9a0a-167a24311c13@github.com>
References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com>
 <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com>
 <3OE37qXGHhLAhnRQM188hhygrLYBtI3FLBMK0tGVH30=.5d1b4406-3bb3-4788-8059-e78260b79ec1@github.com>
 <7WF8DlorrU_B2__G2wr43w1PZwJh8mEhD5dY10YDIOo=.ec416c38-1aff-4dd6-8792-d6a0e01f91ce@github.com>
 <_Z6KpxCYH2n3sHuT6-kRP4cSTAN3-s5UA0rbfrJSIgA=.e9d4089c-8329-406b-9a0a-167a24311c13@github.com>
Message-ID:

On Tue, 2 Dec 2025 20:12:12 GMT, Vladimir Ivanov wrote:

>> No. The avoidance of cache update simply trims down the generated code by throwing away the meaningless cache update.
>>
>> The access to cache is already safeguarded by `constant == MethodHandleImpl.CONSTANT_YES`. I should have moved `var cache = adaptedMh;` into the if block of `constant == CONSTANT_YES`.
>
> I still find it confusing, especially tri-state logic part.
>
> For background, `isCompileConstant` was introduced as part of LF sharing effort to get rid of Java-level profiling in optimized code. The pattern it was designed for was:
>
>     if (isCompileConstant(...)) {
>       return ...;
>     } else {
>       ... // do some extra work (either in interpreter, C1, or not-fully-optimized version in C2)
>     }
>
> In this patch, you don't follow that pattern and add new state (`CONSTANT_PENDING`) to distinguish interpreter/C1 from C2. What's the motivation? Why do you want to avoid cache updates coming from C2-generated code?

I am assuming that if C2 determines this `vh` is not a constant, we can drop it.
Is that a right way to move along, or could C2 transition from "not a constant" to "is a constant" during the phases? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2582931449 From vlivanov at openjdk.org Tue Dec 2 22:28:01 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 2 Dec 2025 22:28:01 GMT Subject: RFR: 8372845: Fold identity hash code if object is constant In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 20:58:23 GMT, Vladimir Kozlov wrote: >>> I'm not sure it makes sense to support Object::hashCode unless C2 can eliminate generate_virtual_guard for a constant receiver. I'd just limit constant folding to !is_virtual case for now. >> >> Or, alternatively, inspect constant object's v-table during compilation and ensure that corresponding slot points at `Object::hashCode`. > > @iwanowww please fix title to match JBS. @vnkozlov I can't since I'm not the author of the PR :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28589#issuecomment-3604211003 From liach at openjdk.org Tue Dec 2 23:18:26 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 2 Dec 2025 23:18:26 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v2] In-Reply-To: References: Message-ID: > Folding identity hash as constant if the incoming argument is constant would be useful for quick map lookups, such as for the [Classifier proposal](https://openjdk.org/jeps/8357674). Currently, identity hash is not constant because it loads the object header/mark word. We can add an explicit bypass to load an existing hash eagerly instead. Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - Merge branch 'master' of https://github.com/openjdk/jdk into fix/identity-hash-const - Cleanup - identity hash support in C2 - Move around - Constant fold identity hash ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28589/files - new: https://git.openjdk.org/jdk/pull/28589/files/4a82f79d..69225241 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28589&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28589&range=00-01 Stats: 4382 lines in 99 files changed: 2947 ins; 684 del; 751 mod Patch: https://git.openjdk.org/jdk/pull/28589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28589/head:pull/28589 PR: https://git.openjdk.org/jdk/pull/28589 From liach at openjdk.org Tue Dec 2 23:18:28 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 2 Dec 2025 23:18:28 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant In-Reply-To: References: Message-ID: On Mon, 1 Dec 2025 23:01:08 GMT, Chen Liang wrote: > Folding identity hash as constant if the incoming argument is constant would be useful for quick map lookups, such as for the [Classifier proposal](https://openjdk.org/jeps/8357674). Currently, identity hash is not constant because it loads the object header/mark word. We can add an explicit bypass to load an existing hash eagerly instead. I tried to come up with an example where the buggy code from Vladimir would inline to identityHashCode when the right call would be virtual - couldn't construct such a case unfortunately :( I think we can deal with IGVN later, as this involves creating new macro node and other infrastructure support. 
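
The `is_virtual` concern in this thread can be shown at the Java level. The sketch below is not part of the patch and all class names in it are hypothetical; it only shows that `System.identityHashCode(o)` and `o.hashCode()` agree when the receiver's class does not override `hashCode()`, and diverge when it does. That is why folding the virtual form for a constant receiver is only sound once the receiver is known to dispatch to `Object::hashCode`.

```java
public class IdentityHashSketch {
    // Constant receiver whose class does not override hashCode().
    static final Object PLAIN = new Object();

    // Hypothetical class that overrides hashCode(); the name is illustrative only.
    static final class Overrider {
        @Override
        public int hashCode() {
            return 7;
        }
    }

    // Constant receiver whose class does override hashCode().
    static final Object OVERRIDING = new Overrider();

    // Non-virtual entry point: always the identity hash.
    static int viaSystem(Object o) {
        return System.identityHashCode(o);
    }

    // Virtual entry point: dispatches to an override if one exists.
    static int viaVirtual(Object o) {
        return o.hashCode();
    }

    public static void main(String[] args) {
        // Same value: Object::hashCode is the identity hash.
        System.out.println(viaSystem(PLAIN) == viaVirtual(PLAIN));
        // The override wins for the virtual call.
        System.out.println(viaVirtual(OVERRIDING));
    }
}
```

Only `viaSystem` corresponds to the `!is_virtual` case that the thread proposes to constant fold for now.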
------------- PR Comment: https://git.openjdk.org/jdk/pull/28589#issuecomment-3604321567 From liach at openjdk.org Tue Dec 2 23:20:12 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 2 Dec 2025 23:20:12 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v4] In-Reply-To: References: Message-ID: > Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: - Rollback getAndAdd for now - Redundant change - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache - Stage - Review tweaks - Tweak VH usage in some classes - Logical fallacy - 8160821 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28585/files - new: https://git.openjdk.org/jdk/pull/28585/files/7bcdcbf3..d49ad129 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=02-03 Stats: 4382 lines in 98 files changed: 2923 ins; 688 del; 771 mod Patch: https://git.openjdk.org/jdk/pull/28585.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28585/head:pull/28585 PR: https://git.openjdk.org/jdk/pull/28585 From liach at openjdk.org Tue Dec 2 23:25:29 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 2 Dec 2025 23:25:29 GMT Subject: RFR: 8372845: C2: Fold identity hash code if object is constant [v3] In-Reply-To: References: Message-ID: > Folding identity hash as constant if the incoming argument is constant would be useful for quick 
map lookups, such as for the [Classifier proposal](https://openjdk.org/jeps/8357674). Currently, identity hash is not constant because it loads the object header/mark word. We can add an explicit bypass to load an existing hash eagerly instead. Chen Liang has updated the pull request incrementally with one additional commit since the last revision: Typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28589/files - new: https://git.openjdk.org/jdk/pull/28589/files/69225241..b1d8be39 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28589&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28589&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28589/head:pull/28589 PR: https://git.openjdk.org/jdk/pull/28589 From liach at openjdk.org Tue Dec 2 23:30:13 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 2 Dec 2025 23:30:13 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v4] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 23:20:12 GMT, Chen Liang wrote: >> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision:
>
> - Rollback getAndAdd for now
> - Redundant change
> - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache
> - Stage
> - Review tweaks
> - Tweak VH usage in some classes
> - Logical fallacy
> - 8160821

After consulting with @iwanowww, I realized the non-constant status cannot be determined, that the C2 compiled method can even transition from 0 to 1, so I am simplifying this code to only handle the constant case. It seems the getAndAdd IR test no longer fails with this change, and I removed a lot of other redundant changes.

I updated the VarHandleExact benchmark added by @JornVernee, and added a case of dropping return values by changing access mode to `getAndAdd` consistently. Now they have the following performance numbers:

    Benchmark                                        Mode  Cnt  Score   Error  Units
    VarHandleExact.exact_exactInvocation             avgt   30  3.843 ± 0.062  ns/op
    VarHandleExact.generic_exactInvocation           avgt   30  3.797 ± 0.049  ns/op
    VarHandleExact.generic_genericInvocation         avgt   30  3.757 ± 0.034  ns/op
    VarHandleExact.generic_returnDroppingInvocation  avgt   30  3.754 ± 0.026  ns/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28585#issuecomment-3604377750

From dlong at openjdk.org Wed Dec 3 00:07:08 2025
From: dlong at openjdk.org (Dean Long)
Date: Wed, 3 Dec 2025 00:07:08 GMT
Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v5]
In-Reply-To:
References:
Message-ID:

On Tue, 2 Dec 2025 13:32:22 GMT, Kerem Kat wrote:

>> Do not try to replace `fallthrough_memproj` when it is null, fixes crash.
>>
>> Test case is simplified from the ticket. Verified that the case crashes without the fix.
>
> Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 10 additional commits since the last revision:
>
> - Merge branch 'master' into fix-c2-segfault-unlocknode
> - address comments
> - fix rename
> - rename test file
> - Merge branch 'master' into fix-c2-segfault-unlocknode
> - fix test spacing
> - Update src/hotspot/share/opto/macro.cpp
>
>   Co-authored-by: Manuel Hässig
> - Update src/hotspot/share/opto/macro.cpp
>
>   Co-authored-by: Manuel Hässig
> - copyright format fix?
> - 8370502: C2: segfault while adding node to IGVN worklist

Yes, it would be good to know if expand_lock_node() also needs a null check. I was assuming the lock and unlock node shapes were basically the same, but now I see that the shapes are different for some reason. The LockNode gets a FastLockNode edge early, while the UnlockNode creates its FastUnlockNode late. I failed to get expand_lock_node() to crash with -XX:+StressMacroExpansion but that doesn't mean there isn't the same problem there.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28432#issuecomment-3604455582

From vlivanov at openjdk.org Wed Dec 3 01:42:45 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 3 Dec 2025 01:42:45 GMT
Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3]
In-Reply-To:
References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com>
 <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com>
 <3OE37qXGHhLAhnRQM188hhygrLYBtI3FLBMK0tGVH30=.5d1b4406-3bb3-4788-8059-e78260b79ec1@github.com>
 <7WF8DlorrU_B2__G2wr43w1PZwJh8mEhD5dY10YDIOo=.ec416c38-1aff-4dd6-8792-d6a0e01f91ce@github.com>
 <_Z6KpxCYH2n3sHuT6-kRP4cSTAN3-s5UA0rbfrJSIgA=.e9d4089c-8329-406b-9a0a-167a24311c13@github.com>
Message-ID: <5CADH75ZjadKttOKwsykRFUPlQKLiwCW8E5WkM_75a4=.fd992c8f-e8bc-4775-9ea3-d5212664e3df@github.com>

On Tue, 2 Dec 2025 22:08:20 GMT, Chen Liang wrote:

>> I still find it confusing, especially tri-state logic part.
>>
>> For background, `isCompileConstant` was introduced as part of LF sharing effort to get rid of Java-level profiling in optimized code. The pattern it was designed for was:
>>
>>     if (isCompileConstant(...)) {
>>       return ...;
>>     } else {
>>       ... // do some extra work (either in interpreter, C1, or not-fully-optimized version in C2)
>>     }
>>
>> In this patch, you don't follow that pattern and add new state (`CONSTANT_PENDING`) to distinguish interpreter/C1 from C2. What's the motivation? Why do you want to avoid cache updates coming from C2-generated code?
>
> I am assuming that if C2 determines this `vh` is not a constant, we can drop it. Is that a right way to move along, or could C2 transition from "not a constant" to "is a constant" during the phases?

Sorry, I still don't understand how it is intended to work. Why does `MethodHandleImpl.isCompileConstant(vh) == true` imply that the cached value is compatible with the constant `vh`?

    // Keep capturing - vh may suddenly get promoted to a constant by C2

Capturing happens outside compiler thread. It is not affected by C2 (except when it completely prunes the whole block). So, either any captured adaptation is valid/compatible or there's a concurrency issue when C2 kicks in and there's a concurrent cache update happening with incompatible version.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2583346750

From vlivanov at openjdk.org Wed Dec 3 01:56:27 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 3 Dec 2025 01:56:27 GMT
Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v4]
In-Reply-To:
References:
Message-ID:

On Tue, 2 Dec 2025 23:20:12 GMT, Chen Liang wrote:

>> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH.
Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Rollback getAndAdd for now > - Redundant change > - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache > - Stage > - Review tweaks > - Tweak VH usage in some classes > - Logical fallacy > - 8160821 test/micro/org/openjdk/bench/java/lang/invoke/VarHandleExact.java line 81: > 79: > 80: @Benchmark > 81: public void generic_returnDroppingInvocation() { What about "all-generic" case (` { generic.getAndAdd(data, 42); }`)? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2583363907 From dlong at openjdk.org Wed Dec 3 02:22:40 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 3 Dec 2025 02:22:40 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v2] In-Reply-To: <5eysoU9a44W7_cWds1pgbO9cpxQpBbtd54cUglfEW8c=.d0307e92-d9b3-405c-b488-872243af83b1@github.com> References: <5eysoU9a44W7_cWds1pgbO9cpxQpBbtd54cUglfEW8c=.d0307e92-d9b3-405c-b488-872243af83b1@github.com> Message-ID: On Mon, 1 Dec 2025 19:30:22 GMT, Vladimir Ivanov wrote: >> Even though `instanceof` check (and reflective `Class.isInstance` call) narrows operand's type, sharpened type information is not explicitly materialized in the IR. >> >> There's a `SubTypeCheck` node present, but it is not a substitute for a `CheckCastPP` node with a proper type. 
>> >> The difference can be illustrated with the following simple cases: >> >> class A { void m() {} } >> class B extends A { void m() {} } >> >> void testInstanceOf(A obj) { >> if (obj instanceof B) { >> obj.m(); >> } >> } >> >> InstanceOf::testInstanceOf (12 bytes) >> @ 8 InstanceOf$A::m (0 bytes) failed to inline: virtual call >> >> vs >> >> void testInstanceOfCast(A obj) { >> if (obj instanceof B) { >> B b = (B)obj; >> b.m(); >> } >> } >> >> InstanceOf::testInstanceOfCast (17 bytes) >> @ 13 InstanceOf$B::m (1 bytes) inline (hot) >> >> >> Proposed fix annotates operands of subtype checks with proper type information which reflects the effects of subtype check. Not-yet-canonicalized IR shape poses some challenges, but I decided to match it early so information is available right away, rather than waiting for IGVN pass and delay inlining to post-parse phase. >> >> FTR it is not a complete fix. It works for trivial cases, but for more complex conditions the IR shape becomes too complex during parsing (as illustrated by some test cases). I experimented with annotating subtype checks after initial parsing pass is over, but the crucial simplification step happens as part of split-if transformation which happens when no more inlining is possible. So, the only possible benefit (without forcing split-if optimization earlier) is virtual-to-direct call strength reduction. I plan to explore it separately. >> >> Testing: hs-tier1 - hs-tier5 > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Test fix Looks reasonable, but I'm not an expert in this area. src/hotspot/share/opto/parse2.cpp line 1737: > 1735: (*cast_type) = tcon->isa_klassptr()->as_instance_type(); > 1736: return true; // found > 1737: } The old code checked klass_is_exact() for this case, but the new code does not, so was it redundant, given we have a constant? ------------- Marked as reviewed by dlong (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28517#pullrequestreview-3532891901 PR Review Comment: https://git.openjdk.org/jdk/pull/28517#discussion_r2583402219 From dlong at openjdk.org Wed Dec 3 02:37:55 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 3 Dec 2025 02:37:55 GMT Subject: RFR: 8372862: AArch64: Fix GetAndSet-acquire costs after JDK-8372188 In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 10:44:24 GMT, Aleksey Shipilev wrote: > I just noticed (while looking at [JDK-8372800](https://bugs.openjdk.org/browse/JDK-8372800)) that I made a little error in [JDK-8372188](https://bugs.openjdk.org/browse/JDK-8372188) refactor, which made GetAndSet-acquire instruction cost twice as high. The usual cost for acquire versions are twice as low, likely to be selected instead of non-acquire versions. > > This bug happened as I "simplified" stencils at some point by dropping some arguments and renumbering the remaining ones. This is one place where I apparently forgot to renumber one usage. See other checks for `ifelse($3,Acq,...` in that stencil, all of them are `$3` (correct), not `$4` (incorrect). Seen no real bugs because of this mishap, but it would be good to fix it in case we see issues later. I also looked at stencils again, and I think there are no other argument-index problems like this anywhere else. > > The real change is in `aarch64_atomic_ad.m4`, `.ad` is re-generated from that stencil. > > Additional testing: > - [x] Linux AArch64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, quick jcstress run Oracle testing results look clean. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28598#issuecomment-3604789309 From wenanjian at openjdk.org Wed Dec 3 03:23:01 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Wed, 3 Dec 2025 03:23:01 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v29] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 15:14:09 GMT, Hamlin Li wrote: >> Anjian Wen has updated the pull request incrementally with one additional commit since the last revision: >> >> modify label L_EXIT to L_exit > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2636: > >> 2634: void counterMode_AESCrypt(int round, Register in, Register out, Register key, Register counter, >> 2635: Register input_len, Register saved_encrypted_ctr, Register used_ptr) { >> 2636: // Algorithm: > > This should be my last comment :) > Where is this "Algorithm" from? Can you put a link here? Oh sure, when implementing the Algorithm, I mainly referred to the Java code implementation (https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/com/sun/crypto/provider/CounterMode.java#L200-L212). besides, I referred to the aarch64 implementation (https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L3190), and made some modifications for RISC-V instructions ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2583489697 From dlong at openjdk.org Wed Dec 3 03:38:59 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 3 Dec 2025 03:38:59 GMT Subject: RFR: 8350208: CTW: GraphKit::add_safepoint_edges asserts "not enough operands for reexecution" In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 10:30:46 GMT, Quan Anh Mai wrote: > Hi, > > This PR fixes the issue of the compiler crashing with "not enough operands for reexecution". The issue here is that during `Parse::catch_inline_exceptions`, the old stack is gone, and we cannot reexecute the current bytecode anymore. 
However, there are some places where we try to insert safepoints into the graph, such as if the handler is a backward jump, or if one of the exceptions in the handlers is not loaded. Since the `_reexecute` state of the current jvms is "undefined", it is inferred automatically that it should reexecute for some bytecodes such as `putfield`. The solution then is to explicitly set `_reexecute` to false.
>
> I can manage to write a unit test for the case of a backward handler, for the other cases, since the exceptions that can be thrown for a bytecode that is inferred to reexecute are `NullPointerException`, `ArrayIndexOutOfBoundsException`, and `ArrayStoreException`. I find it hard to construct such a test in which one of them is not loaded.
>
> Please kindly review, thanks a lot.

src/hotspot/share/opto/doCall.cpp line 958:

> 956: ex_node = use_exception_state(ex_map);
> 957: // The stack from before the throwing bytecode is gone, cannot reexecute here
> 958: jvms()->set_should_reexecute(false);

I agree there are situations where we need to set the reexecute flag explicitly and not base it on the bytecode. I recently fixed JDK-8370766 and filed JDK-8372846 as a followup for similar issues. I need to try out your test to understand this better. Does it cause a backwards-branch safepoint? I suspect that it may not be safe to set reexecute to false here. If reexecute is false and -XX:+VerifyStack is set, deoptimization may fail if the operands are not on the stack.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28597#discussion_r2583510384

From wenanjian at openjdk.org Wed Dec 3 03:44:27 2025
From: wenanjian at openjdk.org (Anjian Wen)
Date: Wed, 3 Dec 2025 03:44:27 GMT
Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v30]
In-Reply-To:
References:
Message-ID:

> Hi everyone, please help review this patch which implements _counterMode_AESCrypt with Zvkned.
On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. Anjian Wen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 31 commits: - Merge branch 'openjdk:master' into aes_ctr - modify label L_EXIT to L_exit - add more comments for key value 52 - update some comments, names and Pseudocode - modify stub_id name - Merge branch 'openjdk:master' into aes_ctr - modify format - add more comments - modify parm to unsigned as aarch64 and x86 - clean comments and format - ... and 21 more: https://git.openjdk.org/jdk/compare/530493fe...98d802d5 ------------- Changes: https://git.openjdk.org/jdk/pull/25281/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25281&range=29 Stats: 239 lines in 2 files changed: 230 ins; 1 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/25281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25281/head:pull/25281 PR: https://git.openjdk.org/jdk/pull/25281 From liach at openjdk.org Wed Dec 3 04:13:55 2025 From: liach at openjdk.org (Chen Liang) Date: Wed, 3 Dec 2025 04:13:55 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: <5CADH75ZjadKttOKwsykRFUPlQKLiwCW8E5WkM_75a4=.fd992c8f-e8bc-4775-9ea3-d5212664e3df@github.com> References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> <3OE37qXGHhLAhnRQM188hhygrLYBtI3FLBMK0tGVH30=.5d1b4406-3bb3-4788-8059-e78260b79ec1@github.com> <7WF8DlorrU_B2__G2wr43w1PZwJh8mEhD5dY10YDIOo=.ec416c38-1aff-4dd6-8792-d6a0e01f91ce@github.com> <_Z6KpxCYH2n3sHuT6-kRP4cSTAN3-s5UA0r bfrJSIgA=.e9d4089c-8329-406b-9a0a-167a24311c13@github.com> <5CADH75ZjadKttOKwsykRFUPlQKLiwCW8E5WkM_75a4=.fd992c8f-e8bc-4775-9ea3-d5212664e3df@github.com> Message-ID: On Wed, 3 Dec 2025 01:40:29 GMT, Vladimir Ivanov wrote: > any 
captured adaptation is valid/compatible Yes, if `vh` is a constant, any captured adaptation from `vh.getMethodHandle(mode).asType(symbolicMethodTypeInvoker)` is valid/compatible. For thread safety, MethodHandle supports safe publication, so I think we are fine publishing this way. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2583556067 From liach at openjdk.org Wed Dec 3 04:13:59 2025 From: liach at openjdk.org (Chen Liang) Date: Wed, 3 Dec 2025 04:13:59 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v4] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 01:53:13 GMT, Vladimir Ivanov wrote: >> Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: >> >> - Rollback getAndAdd for now >> - Redundant change >> - Merge branch 'master' of https://github.com/openjdk/jdk into fix/vh-adapt-cache >> - Stage >> - Review tweaks >> - Tweak VH usage in some classes >> - Logical fallacy >> - 8160821 > > test/micro/org/openjdk/bench/java/lang/invoke/VarHandleExact.java line 81: > >> 79: >> 80: @Benchmark >> 81: public void generic_returnDroppingInvocation() { > > What about "all-generic" case (` { generic.getAndAdd(data, 42); }`)? I can change the `generic_genericInvocation` to an all-generic case. 
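
For readers unfamiliar with the invocation shapes the benchmark names refer to, the following sketch shows them with public API only. It is an illustration under assumptions, not the benchmark source: the `Data` class, its `long` field, and the method names are hypothetical, and reading "all-generic" as an `int` argument plus a dropped return value is an interpretation of the review comment. An exact call matches the access mode type `(Data,long)long`; the other two need the kind of `asType` adaptation this PR tries to cache.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.lang.invoke.WrongMethodTypeException;

public class InexactVarHandleSketch {
    // Hypothetical holder class; the real benchmark's state class may differ.
    static final class Data {
        long field;
    }

    static final VarHandle GENERIC; // default behavior: inexact calls are adapted
    static final VarHandle EXACT;   // invoke-exact behavior: inexact calls are rejected
    static {
        try {
            GENERIC = MethodHandles.lookup().findVarHandle(Data.class, "field", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
        EXACT = GENERIC.withInvokeExactBehavior();
    }

    // Symbolic type (Data,long)long: matches the access mode type exactly.
    static long exactInvocation(Data d) {
        return (long) EXACT.getAndAdd(d, 42L);
    }

    // Symbolic type (Data,long)void: only the return value is dropped.
    static void returnDropping(Data d) {
        GENERIC.getAndAdd(d, 42L);
    }

    // Symbolic type (Data,int)void: argument widened and return value dropped.
    static void allGeneric(Data d) {
        GENERIC.getAndAdd(d, 42);
    }

    // The same inexact call is refused by a VarHandle with invoke-exact behavior.
    static boolean exactRejectsInexact(Data d) {
        try {
            EXACT.getAndAdd(d, 42);
            return false;
        } catch (WrongMethodTypeException e) {
            return true;
        }
    }

    // Runs each shape once against a fresh holder and returns the final field value.
    static long demo() {
        Data d = new Data();
        exactInvocation(d);
        returnDropping(d);
        allGeneric(d);
        return d.field;
    }

    public static void main(String[] args) {
        System.out.println(demo());
        System.out.println(exactRejectsInexact(new Data()));
    }
}
```

All three accepted shapes perform the same atomic add; they differ only in how much conversion sits between the call site's symbolic type and the `VarHandle`'s access mode type.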
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2583556794 From epeter at openjdk.org Wed Dec 3 05:46:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 3 Dec 2025 05:46:05 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v4] In-Reply-To: References: Message-ID: <9mnRXpB16Y6Mw0TSGFJz-69m24lzCNPMC_B1_YseD4M=.be94bbba-88ce-4958-a8bd-89862d7ec2e7@github.com> On Tue, 2 Dec 2025 09:49:29 GMT, Roland Westrelin wrote: >> The test case has an out of loop `Store` with an `AddP` address >> expression that has other uses and is in the loop body. Schematically, >> only showing the address subgraph and the bases for the `AddP`s: >> >> >> Store#195 -> AddP#133 -> AddP#134 -> CastPP#110 >> -> CastPP#110 >> >> >> Both `AddP`s have the same base, a `CastPP` that's also in the loop >> body. >> >> That loop is a counted loop and only has 3 iterations so is fully >> unrolled. First, one iteration is peeled: >> >> >> /-> CastPP#110 >> Store#195 -> Phi#360 -> AddP#133 -> AddP#134 -> CastPP#110 >> -> AddP#277 -> AddP#278 -> CastPP#283 >> -> CastPP#283 >> >> >> >> The `AddP`s and `CastPP` are cloned (because in the loop body). As >> part of peeling, `PhaseIdealLoop::peeled_dom_test_elim()` is >> called. It finds the test that guards `CastPP#283` in the peeled >> iteration dominates and replaces the test that guards `CastPP#110` >> (the test in the peeled iteration is the clone of the test in the >> loop). That causes `CastPP#110`'s control to be updated to that of the >> test in the peeled iteration and to be yanked from the loop. So now >> `CastPP#283` and `CastPP#110` have the same inputs. 
>> >> Next unrolling happens: >> >> >> /-> CastPP#110 >> /-> AddP#400 -> AddP#401 -> CastPP#110 >> Store#195 -> Phi#360 -> Phi#477 -> AddP#133 -> AddP#134 -> CastPP#110 >> \ -> CastPP#110 >> -> AddP#277 -> AddP#278 -> CastPP#283 >> -> CastPP#283 >> >> >> >> `AddP`s are cloned once more but not the `CastPP`s because they are >> both in the peeled iteration now. A new `Phi` is added. >> >> Next igvn runs. It's going to push the `AddP`s through the `Phi`s. >> >> Through `Phi#477`: >> >> >> >> /-> CastPP#110 >> Store#195 -> Phi#360 -> AddP#510 -> Phi#509 -> AddP#401 -> CastPP#110 >> \ -> AddP#134 -> CastPP#110 >> -> AddP#277 -> AddP#278 -> CastPP#283 >> -> CastPP#283 >> >> >> >> Through `Phi#360`: >> >> >> /-> AddP#134 -> CastPP#110 >> /-> Phi#509 -> AddP#401 -> CastPP#110 >> Store#195 -> AddP#516 -> Phi#515 -> AddP#278 -> CastPP#283 >> -> Phi#514 -> CastPP#283 >> ... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: > > - more > - review > - Merge branch 'master' into JDK-8351889 > - exp > - Merge branch 'master' into JDK-8351889 > - verif > - Merge branch 'master' into JDK-8351889 > - test seed > - more > - Merge branch 'master' into JDK-8351889 > - ... and 4 more: https://git.openjdk.org/jdk/compare/d6d17aab...15c17bb1 I think I'm on board with the solution now. It is probably best to do it during IGVN. I have a few more suggestions below :) src/hotspot/share/opto/cfgnode.cpp line 2171: > 2169: !wait_for_region_igvn(phase)) { > 2170: // If one of the inputs is a cast that has yet to be processed by igvn, delay processing of this node to give the > 2171: // inputs a chance to optimize and possibly end up with identical inputs. I think we should have more detail here. Why is this a good idea? Is this an optimization? Or is it for correctness? 
I think you should say something about possibly having multiple cast nodes that could be commoned, and then they would keep their ctrl. But if we uncast, then we lose the info about the ctrl, and below we insert a new cast with a different (later) ctrl. This has two downsides: - The ctrl is later than necessary: suboptimal - If we have 3 or more copies of casts with the same ctrl, and now we remove two and create a new one with a different ctrl, then the remaining old and the new cast cannot be commoned because they have different ctrl. - this is suboptimal - this also creates issues along AddP paths: it can be that at some AddP we get one cast and at another AddP a different cast. They all come from the same original base address, just cast differently. But it makes it difficult to check consistency, and asserts fail. This is not very concise yet, you can probably formulate it in a better way ;) src/hotspot/share/opto/phaseX.cpp line 2076: > 2074: if (addp->in(AddPNode::Base) == n->in(AddPNode::Base)) { > 2075: return false; > 2076: } Suggestion: if (!addp->is_AddP() || addp->in(AddPNode::Base)->is_top() || addp->in(AddPNode::Base) == n->in(AddPNode::Base)) { return false; } test/hotspot/jtreg/compiler/c2/TestMismatchedAddPAfterMaxUnroll.java line 35: > 33: * -XX:+StressIGVN TestMismatchedAddPAfterMaxUnroll > 34: * @run main/othervm TestMismatchedAddPAfterMaxUnroll > 35: */ What about a run with our new fancy flag `-XX:VerifyIterativeGVN=10000`?
------------- PR Review: https://git.openjdk.org/jdk/pull/25386#pullrequestreview-3533246147 PR Review Comment: https://git.openjdk.org/jdk/pull/25386#discussion_r2583686701 PR Review Comment: https://git.openjdk.org/jdk/pull/25386#discussion_r2583690118 PR Review Comment: https://git.openjdk.org/jdk/pull/25386#discussion_r2583697429 From epeter at openjdk.org Wed Dec 3 05:46:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 3 Dec 2025 05:46:06 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v4] In-Reply-To: <9mnRXpB16Y6Mw0TSGFJz-69m24lzCNPMC_B1_YseD4M=.be94bbba-88ce-4958-a8bd-89862d7ec2e7@github.com> References: <9mnRXpB16Y6Mw0TSGFJz-69m24lzCNPMC_B1_YseD4M=.be94bbba-88ce-4958-a8bd-89862d7ec2e7@github.com> Message-ID: <9HDQAMPQo9VBlnXt7WpTjK51AcNHwOfHxc4t9YyBCxc=.818e962a-8898-41a1-920d-58444d70961b@github.com> On Wed, 3 Dec 2025 05:36:52 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: >> >> - more >> - review >> - Merge branch 'master' into JDK-8351889 >> - exp >> - Merge branch 'master' into JDK-8351889 >> - verif >> - Merge branch 'master' into JDK-8351889 >> - test seed >> - more >> - Merge branch 'master' into JDK-8351889 >> - ... and 4 more: https://git.openjdk.org/jdk/compare/d6d17aab...15c17bb1 > > src/hotspot/share/opto/phaseX.cpp line 2076: > >> 2074: if (addp->in(AddPNode::Base) == n->in(AddPNode::Base)) { >> 2075: return false; >> 2076: } > > Suggestion: > > if (!addp->is_AddP() || > addp->in(AddPNode::Base)->is_top() || > addp->in(AddPNode::Base) == n->in(AddPNode::Base)) { > return false; > } It could be a bit compacted. How do you imagine `verify_node_invariants_for` will grow over time? 
I suspect it will become a laundry list of invariants: we continue going down through it as long as no invariant is violated. For that, it may make more sense to invert your condition, and assert/print inside the if-block. It would make the code more extendable. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25386#discussion_r2583696057 From epeter at openjdk.org Wed Dec 3 05:52:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 3 Dec 2025 05:52:02 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v8] In-Reply-To: References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: On Mon, 1 Dec 2025 13:39:09 GMT, Jatin Bhateja wrote: >> This bug patch fixes a crash seen while querying the bottom type of MachTempNode corresponding to [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of blend pattern during late scheduling. Here, MaxVectorSize is constrained to 8 bytes, thus during C2 type system initialization, [TypeVect::VECTX](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719) guarded by target supported vector size remains uninitialized. >> >> It's better to reject matching of VectorBlend in such a scenario. >> >> All existing VectorAPI jtreg tests are passing with -XX:UseAVX=0 and -XX:MaxVectorSize=8 >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Review comments resolutions @jatin-bhateja Testing passed, fix looks good to me. Thanks for working on this. ------------- Marked as reviewed by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/28533#pullrequestreview-3533284691 From jbhateja at openjdk.org Wed Dec 3 06:26:58 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 3 Dec 2025 06:26:58 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v8] In-Reply-To: References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: <2jl8sF9HfU5rPWTDi_jcl4vz6PjIxcdAU4bBwR1sb6c=.1f40d642-5319-4b0c-9505-9bed1a17aecd@github.com> On Tue, 2 Dec 2025 13:32:18 GMT, Emanuel Peter wrote: >> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> Review comments resolutions > > Testing submitted! Code looks good to me :) Thanks @eme64, integrating now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28533#issuecomment-3605288681 From epeter at openjdk.org Wed Dec 3 06:32:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 3 Dec 2025 06:32:57 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v8] In-Reply-To: References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: On Mon, 1 Dec 2025 13:39:09 GMT, Jatin Bhateja wrote: >> This bug patch fixes a crash seen while querying the bottom type of MachTempNode corresponding to [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of blend pattern during late scheduling. Here, MaxVectorSize is constrained to 8 bytes, thus during C2 type system initialization, [TypeVect::VECTX](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719) guarded by target supported vector size remains uninitialized.
>> >> It's better to reject matching of VectorBlend in such a scenario. >> >> All existing VectorAPI jtreg tests are passing with -XX:UseAVX=0 and -XX:MaxVectorSize=8 >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Review comments resolutions Oh, a second review would be required though! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28533#issuecomment-3605301206 From fyang at openjdk.org Wed Dec 3 07:06:05 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 3 Dec 2025 07:06:05 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v7] In-Reply-To: <4-PqNRUxM-80k4mQdYNzc0HrirtkTCjfVAzgRewW08M=.d2fe4512-16cd-4abf-8a7f-e91341c37110@github.com> References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> <4-PqNRUxM-80k4mQdYNzc0HrirtkTCjfVAzgRewW08M=.d2fe4512-16cd-4abf-8a7f-e91341c37110@github.com> Message-ID: On Mon, 1 Dec 2025 15:13:13 GMT, Hamlin Li wrote: >> Hi, >> >> This PR adds CMoveF/D on riscv, which enables vectorization of statements like `op_1 bop op_2 ? res_f_d_1 : res_f_d_2` in a loop. >> >> This PR is also a preparation for further vectorization in https://github.com/openjdk/jdk/pull/28231. >> >> Previously it was https://github.com/openjdk/jdk/pull/25341, but at that time, C2 SLP had some issues with unsigned comparison, which are now fixed, so it's good to continue the work. >> >> # Test >> ## Jtreg >> >> in progress...
>> >> ## Performance >> >> Column name meanings: >> * p: with patch >> * p+v: with patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on >> * m: without patch >> * m+v: without patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on >> >> #### Average improvement >> >> NOTE: With only this PR, it brings performance benefit in case of `CMoveF+CmpF`, `CMoveD+CmpD`, `CMoveF+CmpI`, `CMoveD+CmpL`. The data below is based on fully implementing the vectorization of `CMoveI/L/F/D+CmpI/L/F/D`, which will be achieved by https://github.com/openjdk/jdk/pull/28231. >> >> For details, check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv. >> >> Opt (m/p) | Opt (m+v/p+v) | Opt (p/p+v) | Opt (m/p+v) >> -- | -- | -- | -- >> 1.022782609 | 2.198717391 | 2.162673913 | 2.199 >> >> > > Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: > > - remove log_warning > - add test cases: BoolTest::ge/gt in enc_cmove_fp_cmp_fp Latest version seems fine to me. Thanks for the update. As we are very close to JDK 26 rampdown (2025/12/04), I suggest we postpone this to JDK 27. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28309#pullrequestreview-3533498917 From mhaessig at openjdk.org Wed Dec 3 07:17:56 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 3 Dec 2025 07:17:56 GMT Subject: RFR: 8372862: AArch64: Fix GetAndSet-acquire costs after JDK-8372188 In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 10:44:24 GMT, Aleksey Shipilev wrote: > I just noticed (while looking at [JDK-8372800](https://bugs.openjdk.org/browse/JDK-8372800)) that I made a little error in [JDK-8372188](https://bugs.openjdk.org/browse/JDK-8372188) refactor, which made GetAndSet-acquire instruction cost twice as high. The usual cost for acquire versions is half that of the non-acquire versions, so that they are likely to be selected instead of non-acquire versions.
> > This bug happened as I "simplified" stencils at some point by dropping some arguments and renumbering the remaining ones. This is one place where I apparently forgot to renumber one usage. See other checks for `ifelse($3,Acq,...` in that stencil, all of them are `$3` (correct), not `$4` (incorrect). Seen no real bugs because of this mishap, but it would be good to fix it in case we see issues later. I also looked at stencils again, and I think there are no other argument-index problems like this anywhere else. > > The real change is in `aarch64_atomic_ad.m4`, `.ad` is re-generated from that stencil. > > Additional testing: > - [x] Linux AArch64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, quick jcstress run Thank you for fixing this, @shipilev. This looks good to me. ------------- Marked as reviewed by mhaessig (Committer). PR Review: https://git.openjdk.org/jdk/pull/28598#pullrequestreview-3533532583 From thartmann at openjdk.org Wed Dec 3 07:24:06 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 3 Dec 2025 07:24:06 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v7] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 09:13:38 GMT, Roland Westrelin wrote: >> In test cases, `mh` is initially not constant so the method handle >> invoke can't be inlined. It is later found to be constant, so it can >> be turned into a direct call by >> `Compile::process_late_inline_calls_no_inline()`. In the meantime, the >> `CallNode` for the mh invoke is cloned (by loop switching). In the >> process, only a shallow copy of the `JVMState` for the call is >> made. The initial `CallNode` is the first to be processed by >> `Compile::process_late_inline_calls_no_inline()` and that causes that >> `CallNode` to become dead. The cloned `CallNode` is then >> processed. 
The `JVMState` for that one references the initial >> `CallNode` in its caller's `JVMState`. Because that node is dead, that >> causes a crash. The fix I propose is to make a deep copy of the >> `JVMState` when a `CallNode` is cloned, if a `CallGenerator` is >> assigned to the node. >> >> The other failure I see with these tests is: >> >> >> # Internal Error (/home/roland/jdk-jdk/src/hotspot/share/opto/compile.hpp:1091), pid=3319164, tid=3319186 >> # assert(_number_of_mh_late_inlines > 0) failed: _number_of_mh_late_inlines < 0 ! >> >> >> because even though the `CallNode` is cloned, there's still only one >> late inline recorded. The fix here is to increment >> `_number_of_mh_late_inlines` when the node is cloned. >> >> This was reported by the netty developers. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Merge branch 'master' into JDK-8370939 > - Merge branch 'master' into JDK-8370939 > - review > - Merge branch 'master' into JDK-8370939 > - review > - more > - more > - more > - more > - test > - ... and 1 more: https://git.openjdk.org/jdk/compare/27065cb8...64b11e6e All testing passed. 
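
As a reading aid for the shallow-copy versus deep-copy distinction in the quoted description, here is a toy Java model. It is only an analogy: the class names `CallNode` and `State` and all methods are hypothetical and far simpler than C2's `JVMState`, and nothing here is HotSpot code. It assumes that a state chain is a linked list of per-frame states, each pointing at the node it belongs to.

```java
// Toy model only: names and structure are assumptions made for illustration.
class JvmStateCopySketch {
    static final class CallNode {
        boolean dead;
    }

    static final class State {
        final State caller;   // state of the calling frame, or null for the outermost frame
        final CallNode map;   // node this state is attached to

        State(State caller, CallNode map) {
            this.caller = caller;
            this.map = map;
        }

        // Copies only the youngest state; the caller chain stays shared with the original.
        State cloneShallow(CallNode newMap) {
            return new State(caller, newMap);
        }

        // Copies the whole chain, so no state of the original is shared.
        State cloneDeep(CallNode newMap) {
            State callerCopy = (caller == null) ? null : caller.cloneDeep(newMap);
            return new State(callerCopy, newMap);
        }

        // True if any state in the chain still points at a dead node.
        boolean referencesDeadNode() {
            for (State s = this; s != null; s = s.caller) {
                if (s.map.dead) {
                    return true;
                }
            }
            return false;
        }
    }
}
```

In this model, marking the original node dead leaves a shallow clone with a caller state that still points at it, while a deep clone is unaffected.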
------------- PR Comment: https://git.openjdk.org/jdk/pull/28088#issuecomment-3605438298 From epeter at openjdk.org Wed Dec 3 08:01:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 3 Dec 2025 08:01:06 GMT Subject: RFR: 8357551: RISC-V: support CMoveF/D vectorization [v7] In-Reply-To: <4-PqNRUxM-80k4mQdYNzc0HrirtkTCjfVAzgRewW08M=.d2fe4512-16cd-4abf-8a7f-e91341c37110@github.com> References: <0errm4F59Sa9JdJZKdAGBnt9cF1DKkUUv1XmUtMmHI8=.ab9c0d54-799c-4385-b96c-d7c698ffe965@github.com> <4-PqNRUxM-80k4mQdYNzc0HrirtkTCjfVAzgRewW08M=.d2fe4512-16cd-4abf-8a7f-e91341c37110@github.com> Message-ID: <4nP7XPYsi87jqsutMCdufFx4Jz6aa-X_pPpjd_uGoG0=.8ce4119b-c3bb-4cc3-b714-ccbeb9ac7f42@github.com> On Mon, 1 Dec 2025 15:13:13 GMT, Hamlin Li wrote: >> Hi, >> >> This PR adds CMoveF/D on riscv, which enables vectorization of statements like `op_1 bop op_2 ? res_f_d_1 : res_f_d_2` in a loop. >> >> This PR is also a preparation for further vectorization in https://github.com/openjdk/jdk/pull/28231. >> >> Previously it was https://github.com/openjdk/jdk/pull/25341, but at that time, C2 SLP had some issues with unsigned comparison, which are now fixed, so it's good to continue the work. >> >> # Test >> ## Jtreg >> >> in progress... >> >> ## Performance >> >> Column name meanings: >> * p: with patch >> * p+v: with patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on >> * m: without patch >> * m+v: without patch, `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on >> >> #### Average improvement >> >> NOTE: With only this PR, it brings performance benefit in case of `CMoveF+CmpF`, `CMoveD+CmpD`, `CMoveF+CmpI`, `CMoveD+CmpL`. The data below is based on fully implementing the vectorization of `CMoveI/L/F/D+CmpI/L/F/D`, which will be achieved by https://github.com/openjdk/jdk/pull/28231. >> >> For details, check the performance data in https://github.com/openjdk/jdk/pull/25341 on riscv.
>> >> Opt (m/p) | Opt (m+v/p+v) | Opt (p/p+v) | Opt (m/p+v) >> -- | -- | -- | -- >> 1.022782609 | 2.198717391 | 2.162673913 | 2.199 >> >> > > Hamlin Li has updated the pull request incrementally with two additional commits since the last revision: > > - remove log_warning > - add test cases: BoolTest::ge/gt in enc_cmove_fp_cmp_fp I can help review this as well, but currently there is a lot going on with JDK26 bugs. Hopefully things settle down in a few weeks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28309#issuecomment-3605543510 From mhaessig at openjdk.org Wed Dec 3 08:09:05 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 3 Dec 2025 08:09:05 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v5] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 13:32:22 GMT, Kerem Kat wrote: >> Do not try to replace `fallthrough_memproj` when it is null, fixes crash. >> >> Test case is simplified from the ticket. Verified that the case crashes without the fix. > > Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Merge branch 'master' into fix-c2-segfault-unlocknode > - address comments > - fix rename > - rename test file > - Merge branch 'master' into fix-c2-segfault-unlocknode > - fix test spacing > - Update src/hotspot/share/opto/macro.cpp > > Co-authored-by: Manuel Hässig > - Update src/hotspot/share/opto/macro.cpp > > Co-authored-by: Manuel Hässig > - copyright format fix? > - 8370502: C2: segfault while adding node to IGVN worklist Testing passed up to tier3 on linux-x64-debug, linux-aarch64-debug, macosx-x64-debug, macosx-aarch64-debug, and windows-x64-debug.
------------- PR Comment: https://git.openjdk.org/jdk/pull/28432#issuecomment-3605568889 From thartmann at openjdk.org Wed Dec 3 08:53:33 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 3 Dec 2025 08:53:33 GMT Subject: RFR: 8372703: Test compiler/arguments/TestCodeEntryAlignment.java failed: assert(allocates2(pc)) failed: not in CodeBuffer memory [v2] In-Reply-To: References: <3SFe0aKR8DW5SKjr375S78OWgJS7g2pLZfepb43yISI=.958eda85-ca1a-4f85-a9a2-c7ad60dcc025@github.com> Message-ID: On Tue, 2 Dec 2025 15:28:05 GMT, Volodymyr Paprotski wrote: >> src/hotspot/cpu/x86/stubDeclarations_x86.hpp line 76: >> >>> 74: do_arch_entry, \ >>> 75: do_arch_entry_init) \ >>> 76: do_arch_blob(compiler, 120000 WINDOWS_ONLY(+2000)) \ >> >> I was wondering if there are any reason for this value (apart that it is enough for the test to pass. I just noticed that it has been increased already in the past). > > The assert was suggesting 119k (and change..) so I rounded slightly up. I was going to ask (i.e. @TobiHartmann ?) if thats enough.. > > (Similarly, I am concerned that I am contributing to a larger JVM footprint, with my changes.. but I suppose 11k is comparatively insignificant in the grand scheme of things...) > > Thanks for the review! I think that's a reasonable increase. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28588#discussion_r2584174950 From mli at openjdk.org Wed Dec 3 09:23:01 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 3 Dec 2025 09:23:01 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v29] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 03:20:34 GMT, Anjian Wen wrote: >> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2636: >> >>> 2634: void counterMode_AESCrypt(int round, Register in, Register out, Register key, Register counter, >>> 2635: Register input_len, Register saved_encrypted_ctr, Register used_ptr) { >>> 2636: // Algorithm: >> >> This should be my last comment :) >> Where is this "Algorithm" from? Can you put a link here? > > Oh sure, when implementing the Algorithm, I mainly referred to the Java code implementation (https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/com/sun/crypto/provider/CounterMode.java#L200-L212). besides, I referred to the aarch64 implementation (https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L3190), and made some modifications for RISC-V instructions Thanks! If this C style code is based on the java one, can you add a reference here to the java code? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2584282181 From wenanjian at openjdk.org Wed Dec 3 09:51:07 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Wed, 3 Dec 2025 09:51:07 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v29] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 09:20:20 GMT, Hamlin Li wrote: >> Oh sure, when implementing the Algorithm, I mainly referred to the Java code implementation (https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/com/sun/crypto/provider/CounterMode.java#L200-L212). 
Besides, I referred to the aarch64 implementation (https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L3190), and made some modifications for RISC-V instructions > > Thanks! > If this C style code is based on the java one, can you add a reference here to the java code? It's for future reference. Do you mean adding a comment like "mainly according to com.sun.crypto.provider.CounterMode::implCrypt" here? I may not have described it clearly: the Java one (https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/com/sun/crypto/provider/CounterMode.java#L200-L212) I referred to is the function whose intrinsic we are trying to implement. Do we still need to add a reference? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2584381894 From mli at openjdk.org Wed Dec 3 09:56:18 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 3 Dec 2025 09:56:18 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v29] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 09:48:47 GMT, Anjian Wen wrote: >> Thanks! >> If this C style code is based on the java one, can you add a reference here to the java code? It's for future reference. > > Do you mean adding a comment like "mainly according to com.sun.crypto.provider.CounterMode::implCrypt" here? > > I may not have described it clearly: the Java one (https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/com/sun/crypto/provider/CounterMode.java#L200-L212) I referred to is the function whose intrinsic we are trying to implement. Do we still need to add a reference? I mean, where is the C code from? We'd better put a reference here to point to it.
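
For readers following this thread without the stub source at hand, the counter-mode loop under discussion can be sketched in plain Java roughly as below. This is neither `CounterMode.implCrypt` nor the RISC-V stub, and it is not the pseudocode from the PR; the class `CtrSketch` and its method names are hypothetical. It assumes a 16-byte counter block that is incremented as one big-endian number, and it models the `saved_encrypted_ctr` and `used_ptr` stub parameters as the local variables `encryptedCounter` and `used`. The helper `jdkCtr` calls the standard `AES/CTR/NoPadding` transformation so the two results can be compared.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch of AES-CTR built on single-block AES encryption.
class CtrSketch {
    static final int BLOCK = 16;

    static byte[] ctrCrypt(byte[] key, byte[] iv, byte[] in) {
        try {
            Cipher ecb = Cipher.getInstance("AES/ECB/NoPadding");
            ecb.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"));
            byte[] counter = iv.clone();
            byte[] encryptedCounter = new byte[BLOCK];
            int used = BLOCK; // no keystream bytes available yet
            byte[] out = new byte[in.length];
            for (int i = 0; i < in.length; i++) {
                if (used == BLOCK) {
                    // Keystream exhausted: encrypt the counter, then advance it.
                    ecb.doFinal(counter, 0, BLOCK, encryptedCounter, 0);
                    increment(counter);
                    used = 0;
                }
                out[i] = (byte) (in[i] ^ encryptedCounter[used++]);
            }
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Big-endian increment with carry across the whole block.
    static void increment(byte[] counter) {
        for (int i = counter.length - 1; i >= 0; i--) {
            if (++counter[i] != 0) {
                break;
            }
        }
    }

    // Reference result from the standard JCA transformation, for comparison.
    static byte[] jdkCtr(byte[] key, byte[] iv, byte[] in) {
        try {
            Cipher ctr = Cipher.getInstance("AES/CTR/NoPadding");
            ctr.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return ctr.doFinal(in);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because encryption and decryption are the same XOR with the keystream, calling `ctrCrypt` twice with the same key and initial counter returns the original input.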
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2584400100 From mli at openjdk.org Wed Dec 3 09:59:13 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 3 Dec 2025 09:59:13 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v29] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 09:53:20 GMT, Hamlin Li wrote: >> Do you mean adding a comment like "mainly according to com.sun.crypto.provider.CounterMode::implCrypt" here? >> >> I may not have described it clearly: the Java one (https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/com/sun/crypto/provider/CounterMode.java#L200-L212) I referred to is the function whose intrinsic we are trying to implement. Do we still need to add a reference? > > I mean, where is the C code from? We'd better put a reference here to point to it. I assume your assembly code is a kind of translation from high-level language code, or maybe I misunderstood it? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2584412804 From wenanjian at openjdk.org Wed Dec 3 10:07:33 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Wed, 3 Dec 2025 10:07:33 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v29] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 09:56:45 GMT, Hamlin Li wrote: >> I mean, where is the C code from? We'd better put a reference here to point to it. > > I assume your assembly code is a kind of translation from high-level language code, or maybe I misunderstood it?
Oh, it is pseudocode I created to make the algorithm easier to understand ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2584443250 From mli at openjdk.org Wed Dec 3 10:13:43 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 3 Dec 2025 10:13:43 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v29] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 10:04:15 GMT, Anjian Wen wrote: >> I assume your assembly code is a kind of translation from high-level language code, or maybe I misunderstood it? > > Oh, it is pseudocode I created to make the algorithm easier to understand Is the RISC-V assembly a migration from aarch64? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2584466181 From qamai at openjdk.org Wed Dec 3 10:24:50 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 3 Dec 2025 10:24:50 GMT Subject: RFR: 8350208: CTW: GraphKit::add_safepoint_edges asserts "not enough operands for reexecution" In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 03:36:13 GMT, Dean Long wrote: >> Hi, >> >> This PR fixes the issue of the compiler crashing with "not enough operands for reexecution". The issue here is that during `Parse::catch_inline_exceptions`, the old stack is gone, and we cannot reexecute the current bytecode anymore. However, there are some places where we try to insert safepoints into the graph, such as if the handler is a backward jump, or if one of the exceptions in the handlers is not loaded. Since the `_reexecute` state of the current jvms is "undefined", it is inferred automatically that it should reexecute for some bytecodes such as `putfield`. The solution then is to explicitly set `_reexecute` to false. >> >> I can manage to write a unit test for the case of a backward handler, for the other cases, since the exceptions that can be thrown for a bytecode that is inferred to reexecute are `NullPointerException`, `ArrayIndexOutOfBoundsException`, and `ArrayStoreException`.
I find it hard to construct such a test in which one of them is not loaded. >> >> Please kindly review, thanks a lot. > > src/hotspot/share/opto/doCall.cpp line 958: > >> 956: ex_node = use_exception_state(ex_map); >> 957: // The stack from before the throwing bytecode is gone, cannot reexecute here >> 958: jvms()->set_should_reexecute(false); > > I agree there are situations where we need to set the reexecute flag explicitly and not base it on the bytecode. I recently fixed JDK-8370766 and filed JDK-8372846 as a followup for similar issues. I need to try out your test to understand this better. Does it cause a backwards-branch safepoint? I suspect that it may not be safe to set reexecute to false here. If reexecute is false and -XX:+VerifyStack is set, deoptimization may fail if the operands are not on the stack. Yes, it is a backwards-branch safepoint. Tbh, after looking deeper, I don't really understand what is happening here. I modified the test a little bit so the final compiled code does not elide the safepoint in the loop, and ran with `-XX:+VerifyStack -XX:+DeoptimizeALot -XX:+SafepointALot`, but the test still passed after 100 repeats. I think that the state is correct, but I don't see how the compiled code notifies the deoptimizer and the interpreter that it is in an exception state, and that the interpreter needs to find an exception handler instead of continuing with the next bytecode. My guess is that the compiled code should store the exception into `Thread::_pending_exception`, or the deoptimizer needs to do so, and the interpreter needs to check that when it is handed control. But I have not yet found that.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28597#discussion_r2584505420 From rrich at openjdk.org Wed Dec 3 10:29:24 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 3 Dec 2025 10:29:24 GMT Subject: RFR: 8370473: C2: Better Aligment of Vector Spill Slots [v4] In-Reply-To: <2dAfr3bnYwrmrMwlhDNniaYVQYOrR2ARztDEB4qqBzY=.aaa1b90d-0aa7-4d42-a3eb-c52a6b04cbaf@github.com> References: <2dAfr3bnYwrmrMwlhDNniaYVQYOrR2ARztDEB4qqBzY=.aaa1b90d-0aa7-4d42-a3eb-c52a6b04cbaf@github.com> Message-ID: On Thu, 20 Nov 2025 10:21:34 GMT, Richard Reingruber wrote: >> With this change c2 will allocate spill slots for vectors with sp offsets aligned to the size of the vectors. Maximum alignment is StackAlignmentInBytes. >> >> It also updates comments that have never been changed to describe how register allocation works for sizes larger than 64 bit. >> >> The change helps to produce better spill code on AARCH64 and PPC64 where an additional add instruction is emitted if the offset of a vector un-/spill is not aligned. >> >> The change is rather a cleanup than an optimization. In most cases the sp offsets will already be properly aligned. >> Only with incoming stack arguments unaligned offsets can be generated. But also then alignment padding is only added if vector registers larger than 64 bit are used. >> >> So the costs are effectively zero. Especially because extra padding won't enlarge the frame since only virtual registers are allocated which are mapped to the caller frame (see `pad0` in the [diagram](https://github.com/openjdk/jdk/blob/92e380c59c2498b1bc94e26658b07b383deae59a/src/hotspot/cpu/aarch64/aarch64.ad#L3829)) >> >> There's a risk though that with the extra virtual registers allocated for `pad0` the limit of registers a `RegMask` can represent is reached (occurs with excessive spilling). If this happens the compilation would fail. It could be retried with smaller alignment for vector spilling though. 
I haven't implemented it as I thought the risk is negligible. >> >> Note that the sp offset of the accesses should be aligned rather than the effective address. So it could even be argued that the maximum alignment could be higher than StackAlignmentInBytes. >> >> ##### Testing with fastdebug builds on AARCH64 and PPC64: >> >> hotspot_vector_1 >> hotspot_vector_2 >> jdk_vector >> jdk_vector_sanity >> >> ##### The change passed our CI testing: >> Tier 1-4 of hotspot and jdk. All of langtools and jaxp. Renaissance Suite and SAP specific tests. >> Testing was done on the main platforms and also on Linux/PPC64le and AIX. >> >> C2 compilation of `jdk.internal.vm.vector.VectorSupport::rearrangeOp` has unaligned spill offsets. It is covered by the following tests: >> >> compiler/vectorapi/VectorRearrangeTest.java >> jdk/incubator/vector/Byte128VectorLoadStoreTests.java >> jdk/incubator/vector/Double256VectorLoadStoreTests.java >> jdk/incubator/vector/Float128VectorTests.java >> jdk/incubator/vector/Long256VectorLoadStoreTests.java >> jdk/incubator/vector/Short128VectorLoadStoreTests.java >> jdk/incubator/vector/Vector64ConversionTests.java > > Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Merge branch 'master' > - Exclude IR check on riscv with rvv > - Enhance comment > - Fix OptoAssembly for Power 8 > - PPC: OptoAssembly for vector spilling > - Assert aligned sp offsets in vector spilling > - Delete TMP and !UseNewCode > - Align Matcher::_new_SP for better vector spilling > - TMP: trace unaligned vector spilling > - Add test Thanks again for the feedback.
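For readers of the archive, the alignment rule from the quoted description (sp offsets aligned to the vector size, capped at StackAlignmentInBytes) can be sketched roughly as follows. This is only an illustration in Java and not the C2 code: the class name, the method names, and the value 16 used for the stack alignment are assumptions made for the example, and power-of-two sizes are assumed throughout.

```java
// Illustrative sketch only; not taken from HotSpot.
final class SpillSlotAlignment {
    // Assumed stand-in for HotSpot's StackAlignmentInBytes; the real value is platform dependent.
    static final int STACK_ALIGNMENT_IN_BYTES = 16;

    // Rounds offset up to the next multiple of alignment (alignment must be a power of two).
    static int alignUp(int offset, int alignment) {
        return (offset + alignment - 1) & -alignment;
    }

    // Aligns an sp offset to the vector size, but never beyond the stack alignment.
    static int alignedSpillOffset(int spOffset, int vectorSizeInBytes) {
        int alignment = Math.min(vectorSizeInBytes, STACK_ALIGNMENT_IN_BYTES);
        return alignUp(spOffset, alignment);
    }

    public static void main(String[] args) {
        System.out.println(alignedSpillOffset(24, 8));  // 24: a 64-bit slot is already aligned
        System.out.println(alignedSpillOffset(24, 16)); // 32: padded up to a 16-byte boundary
        System.out.println(alignedSpillOffset(24, 32)); // 32: alignment capped at 16 bytes
    }
}
```

The gap between the original and the aligned offset corresponds to the padding the description refers to; whether it costs anything depends on how the frame is laid out, as discussed in the thread.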
------------- PR Comment: https://git.openjdk.org/jdk/pull/27969#issuecomment-3606133505 From wenanjian at openjdk.org Wed Dec 3 10:29:25 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Wed, 3 Dec 2025 10:29:25 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v29] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 10:10:50 GMT, Hamlin Li wrote: >> Oh, it is pseudocode I created to make it easier to understand > > is riscv assembly a migration from aarch64? I mainly follow the AES standard algorithm, but I did refer to the implementation of AArch64. I'm not sure if it can be described as a migration. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2584518540 From rrich at openjdk.org Wed Dec 3 10:32:25 2025 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 3 Dec 2025 10:32:25 GMT Subject: Integrated: 8370473: C2: Better Aligment of Vector Spill Slots In-Reply-To: References: Message-ID: <7kWVlFj8b6kCAGo2YRKoW39R36PYm3pb88zCPVFJM9o=.682a51a9-f924-45d0-b0a8-ba4f8df16b92@github.com> On Fri, 24 Oct 2025 07:36:57 GMT, Richard Reingruber wrote: > With this change c2 will allocate spill slots for vectors with sp offsets aligned to the size of the vectors. Maximum alignment is StackAlignmentInBytes. > > It also updates comments that have never been changed to describe how register allocation works for sizes larger than 64 bit. > > The change helps to produce better spill code on AARCH64 and PPC64 where an additional add instruction is emitted if the offset of a vector un-/spill is not aligned. > > The change is rather a cleanup than an optimization. In most cases the sp offsets will already be properly aligned. > Only with incoming stack arguments unaligned offsets can be generated. But also then alignment padding is only added if vector registers larger than 64 bit are used. > > So the costs are effectively zero.
Especially because extra padding won't enlarge the frame since only virtual registers are allocated which are mapped to the caller frame (see `pad0` in the [diagram](https://github.com/openjdk/jdk/blob/92e380c59c2498b1bc94e26658b07b383deae59a/src/hotspot/cpu/aarch64/aarch64.ad#L3829)) > > There's a risk though that with the extra virtual registers allocated for `pad0` the limit of registers a `RegMask` can represent is reached (occurs with excessive spilling). If this happens the compilation would fail. It could be retried with smaller alignment for vector spilling though. I haven't implemented it as I thought the risk is negligible. > > Note that the sp offset of the accesses should be aligned rather than the effective address. So it could even be argued that the maximum alignment could be higher than StackAlignmentInBytes. > > ##### Testing with fastdebug builds on AARCH64 and PPC64: > > hotspot_vector_1 > hotspot_vector_2 > jdk_vector > jdk_vector_sanity > > ##### The change passed our CI testing: > Tier 1-4 of hotspot and jdk. All of langtools and jaxp. Renaissance Suite and SAP specific tests. > Testing was done on the main platforms and also on Linux/PPC64le and AIX. > > C2 compilation of `jdk.internal.vm.vector.VectorSupport::rearrangeOp` has unaligned spill offsets. It is covered by the following tests: > > compiler/vectorapi/VectorRearrangeTest.java > jdk/incubator/vector/Byte128VectorLoadStoreTests.java > jdk/incubator/vector/Double256VectorLoadStoreTests.java > jdk/incubator/vector/Float128VectorTests.java > jdk/incubator/vector/Long256VectorLoadStoreTests.java > jdk/incubator/vector/Short128VectorLoadStoreTests.java > jdk/incubator/vector/Vector64ConversionTests.java This pull request has now been integrated.
Changeset: 804ce0a2 Author: Richard Reingruber URL: https://git.openjdk.org/jdk/commit/804ce0a2394cb3f837441976e5ef6eb4b9cab257 Stats: 203 lines in 7 files changed: 157 ins; 29 del; 17 mod 8370473: C2: Better Aligment of Vector Spill Slots Reviewed-by: goetz, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/27969 From thartmann at openjdk.org Wed Dec 3 10:33:11 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 3 Dec 2025 10:33:11 GMT Subject: RFR: 8372703: Test compiler/arguments/TestCodeEntryAlignment.java failed: assert(allocates2(pc)) failed: not in CodeBuffer memory [v2] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 15:40:51 GMT, Volodymyr Paprotski wrote: >> Requires a Broadwell machine, but was able to reproduce with an emulator: >> >> >> ~/sde-external-9.58.0-2025-06-16-lin/sde64 -follow-subprocess -bdw -- ./build/linux-x86_64-server-fastdebug/images/jdk/bin/java -XX:-UseMulAddIntrinsic -XX:+UseDilithiumIntrinsics -XX:+UnlockExperimentalVMOptions -XX:CodeCacheSegmentSi >> ze=1024 -XX:CodeEntryAlignment=1024 -cp build/linux-x86_64-server-fastdebug/support/test/lib/test-lib.jar test/hotspot/jtreg/compiler/arguments/TestCodeEntryAlignment.java run > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > comment from Manuel Marked as reviewed by thartmann (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/28588#pullrequestreview-3534338808 From pminborg at openjdk.org Wed Dec 3 10:34:41 2025 From: pminborg at openjdk.org (Per Minborg) Date: Wed, 3 Dec 2025 10:34:41 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: Message-ID: <4gKSL8hFAE2qSuTmhJa6JMfoB6JfUnK9fzwHAnH2Zzg=.9fc69461-bbe7-4242-b3b1-b4b004f35ce0@github.com> On Tue, 2 Dec 2025 17:24:41 GMT, Chen Liang wrote: >> src/java.base/share/classes/java/lang/invoke/VarHandle.java line 2036: >> >>> 2034: var constant = MethodHandleImpl.isCompileConstant(vh); >>> 2035: var cache = adaptedMh; >>> 2036: if (constant == MethodHandleImpl.CONSTANT_YES && cache != null) { >> >> Rookie question: Is there multi-thread considerations here? How about visibility across threads? > > MethodHandle is immutable and can be safely published. So this is ok. I meant that even though objects are immutable, plain semantics might not always do. Reference: https://shipilev.net/blog/2014/safe-public-construction/ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2584535309 From qamai at openjdk.org Wed Dec 3 10:45:39 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 3 Dec 2025 10:45:39 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v2] In-Reply-To: <5eysoU9a44W7_cWds1pgbO9cpxQpBbtd54cUglfEW8c=.d0307e92-d9b3-405c-b488-872243af83b1@github.com> References: <5eysoU9a44W7_cWds1pgbO9cpxQpBbtd54cUglfEW8c=.d0307e92-d9b3-405c-b488-872243af83b1@github.com> Message-ID: On Mon, 1 Dec 2025 19:30:22 GMT, Vladimir Ivanov wrote: >> Even though `instanceof` check (and reflective `Class.isInstance` call) narrows operand's type, sharpened type information is not explicitly materialized in the IR. >> >> There's a `SubTypeCheck` node present, but it is not a substitute for a `CheckCastPP` node with a proper type. 
>> >> The difference can be illustrated with the following simple cases: >> >> class A { void m() {} } >> class B extends A { void m() {} } >> >> void testInstanceOf(A obj) { >> if (obj instanceof B) { >> obj.m(); >> } >> } >> >> InstanceOf::testInstanceOf (12 bytes) >> @ 8 InstanceOf$A::m (0 bytes) failed to inline: virtual call >> >> vs >> >> void testInstanceOfCast(A obj) { >> if (obj instanceof B) { >> B b = (B)obj; >> b.m(); >> } >> } >> >> InstanceOf::testInstanceOfCast (17 bytes) >> @ 13 InstanceOf$B::m (1 bytes) inline (hot) >> >> >> Proposed fix annotates operands of subtype checks with proper type information which reflects the effects of subtype check. Not-yet-canonicalized IR shape poses some challenges, but I decided to match it early so information is available right away, rather than waiting for IGVN pass and delay inlining to post-parse phase. >> >> FTR it is not a complete fix. It works for trivial cases, but for more complex conditions the IR shape becomes too complex during parsing (as illustrated by some test cases). I experimented with annotating subtype checks after initial parsing pass is over, but the crucial simplification step happens as part of split-if transformation which happens when no more inlining is possible. So, the only possible benefit (without forcing split-if optimization earlier) is virtual-to-direct call strength reduction. I plan to explore it separately. >> >> Testing: hs-tier1 - hs-tier5 > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Test fix Marked as reviewed by qamai (Committer). 
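For anyone who wants to try the two shapes from the quoted description, here is a self-contained variant. It is only a sketch: the outer class name and the `String` return values are additions made so that the behaviour is observable, and they are not part of the original example. Both methods are semantically identical; the difference discussed in the PR is only in what C2 can inline, which this program does not show by itself.

```java
// Runnable variant of the quoted example; names other than A, B, m,
// testInstanceOf and testInstanceOfCast are made up for illustration.
final class InstanceOfDemo {
    static class A { String m() { return "A"; } }
    static class B extends A { @Override String m() { return "B"; } }

    // The call is made through the static type A, even though the check proved obj is a B.
    static String testInstanceOf(A obj) {
        if (obj instanceof B) {
            return obj.m();
        }
        return "none";
    }

    // The explicit cast gives the receiver the static type B.
    static String testInstanceOfCast(A obj) {
        if (obj instanceof B) {
            B b = (B) obj;
            return b.m();
        }
        return "none";
    }

    public static void main(String[] args) {
        System.out.println(testInstanceOf(new B()));     // B
        System.out.println(testInstanceOfCast(new B())); // B
        System.out.println(testInstanceOf(new A()));     // none
    }
}
```

To look at the inlining decisions, the methods would have to be called often enough to be compiled by C2 and the VM run with inlining diagnostics enabled (for example `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining`); the exact output depends on the JDK build.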
------------- PR Review: https://git.openjdk.org/jdk/pull/28517#pullrequestreview-3534403006 From qamai at openjdk.org Wed Dec 3 10:45:41 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 3 Dec 2025 10:45:41 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v2] In-Reply-To: References: Message-ID: <4abJMXdHzqKGqU58EXHaXO7849B0a64NoShEvU110I4=.87a93e5c-b73e-4c6e-b85b-8797eea8814d@github.com> On Mon, 1 Dec 2025 19:49:29 GMT, Vladimir Ivanov wrote: >> src/hotspot/share/opto/parse2.cpp line 1739: >> >>> 1737: } >>> 1738: >>> 1739: // Match an instanceof check. >> >> We seem to require that the input of `SubTypeCheck` is not `null`. What do you think about allowing `SubTypeCheck` to accept `null` and return `false`? > > Yes, it's a good idea and the right direction to move. While experimenting with a different enhancement, I noticed that a subtype check leaves a null check behind irrespective of whether the check goes away or not. > > Unfortunately, there are some engineering considerations which complicates the change. `SubTypeCheck` is shared across all the places where subtype checks are performed, but `checkcast` and `instanceof` differ in the way `null` is handled. So, the proper way to fix it is to introduce a higher-level representation which implicitly handles nulls and then eventually lower it to `SubTypeCheck` and materialize null check if needed. There are multiple ways without having to have yet another higher-level representation. The first one is that since `SubTypeCheck` does not accept `null` now, we can just choose one result for `null`. Choosing the `instanceof` approach may be a little more desirable, as it removes the need to perform this complicated match, and for `checkcast` we can manually insert a `CheckCastPP` anyway. Another solution is to have another input to `SubTypeCheck` which gives the result when the `obj` is `null`. 
On a whim, I kind of like this, as we can match both the `checkcast` and the `instanceof` pattern here, it also simplifies `GraphKit::gen_checkcast`, as we do not have to worry about "the cast that always succeeds will leave behind a null check". Just a suggestion, though. This PR is fine as it is to me. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28517#discussion_r2584574037 From shade at openjdk.org Wed Dec 3 10:58:01 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 Dec 2025 10:58:01 GMT Subject: Integrated: 8372862: AArch64: Fix GetAndSet-acquire costs after JDK-8372188 In-Reply-To: References: Message-ID: <82avOvCMGtKCw3qqonnTCa3r3G0wT_qEv7f-nuYQn38=.0602395f-f22f-4b07-8bf7-e1cec5b0cd1a@github.com> On Tue, 2 Dec 2025 10:44:24 GMT, Aleksey Shipilev wrote: > I just noticed (while looking at [JDK-8372800](https://bugs.openjdk.org/browse/JDK-8372800)) that I made a little error in [JDK-8372188](https://bugs.openjdk.org/browse/JDK-8372188) refactor, which made GetAndSet-acquire instruction cost twice as high. The usual cost for acquire versions are twice as low, likely to be selected instead of non-acquire versions. > > This bug happened as I "simplified" stencils at some point by dropping some arguments and renumbering the remaining ones. This is one place where I apparently forgot to renumber one usage. See other checks for `ifelse($3,Acq,...` in that stencil, all of them are `$3` (correct), not `$4` (incorrect). Seen no real bugs because of this mishap, but it would be good to fix it in case we see issues later. I also looked at stencils again, and I think there are no other argument-index problems like this anywhere else. > > The real change is in `aarch64_atomic_ad.m4`, `.ad` is re-generated from that stencil. > > Additional testing: > - [x] Linux AArch64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, quick jcstress run This pull request has now been integrated. 
Changeset: 3f447edf Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/3f447edf0e22431628ebb74212f760209ea29d37 Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod 8372862: AArch64: Fix GetAndSet-acquire costs after JDK-8372188 Reviewed-by: dlong, mhaessig ------------- PR: https://git.openjdk.org/jdk/pull/28598 From shade at openjdk.org Wed Dec 3 10:58:00 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 Dec 2025 10:58:00 GMT Subject: RFR: 8372862: AArch64: Fix GetAndSet-acquire costs after JDK-8372188 In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 10:44:24 GMT, Aleksey Shipilev wrote: > I just noticed (while looking at [JDK-8372800](https://bugs.openjdk.org/browse/JDK-8372800)) that I made a little error in [JDK-8372188](https://bugs.openjdk.org/browse/JDK-8372188) refactor, which made GetAndSet-acquire instruction cost twice as high. The usual cost for acquire versions is twice as low, so that they are likely to be selected instead of non-acquire versions. > > This bug happened as I "simplified" stencils at some point by dropping some arguments and renumbering the remaining ones. This is one place where I apparently forgot to renumber one usage. See other checks for `ifelse($3,Acq,...` in that stencil, all of them are `$3` (correct), not `$4` (incorrect). Seen no real bugs because of this mishap, but it would be good to fix it in case we see issues later. I also looked at stencils again, and I think there are no other argument-index problems like this anywhere else. > > The real change is in `aarch64_atomic_ad.m4`, `.ad` is re-generated from that stencil. > > Additional testing: > - [x] Linux AArch64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, quick jcstress run Thanks for reviews!
I am integrating this now to get it in cleanly before RDP1 cutoff :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/28598#issuecomment-3606263998 From wenanjian at openjdk.org Wed Dec 3 11:01:58 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Wed, 3 Dec 2025 11:01:58 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v29] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 10:26:16 GMT, Anjian Wen wrote: >> is riscv assembly a migration from aarch64? > > I mainly follow the AES standard algorithm, but I did refer to the implementation of AArch64. I'm not sure if it can be described as a migration. In addition, there is a difference compared with aarch64: the algorithm in aarch64 has an extra large-block optimization branch. It calculates 4 blocks in one loop, which seems to make the code more cache friendly, but adds more control flow and uses more vector registers. I think maybe we can do this kind of optimization later, when we can test on a real machine? I just support the standard algorithm currently. > is riscv assembly a migration from aarch64? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2584647886 From shade at openjdk.org Wed Dec 3 11:15:44 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 Dec 2025 11:15:44 GMT Subject: RFR: 8351844: C2 x64 AVX2 vpminmax assertion failure with equivalent inputs In-Reply-To: References: Message-ID: <4jqjXLkV2LwlS1HRlb2fFIJhO-jU6C2_yVWyiB9z2ZI=.208e3745-23b8-466e-9ea9-df42e49119a4@github.com> On Tue, 2 Dec 2025 11:10:08 GMT, Jatin Bhateja wrote: > Bug fix PR fixes an incorrect register equivalence in macro assembler. MaxV/MinV IR with equivalent inputs should ideally be removed from ideal graph before reaching to macro assembler. [JDK-8372797](https://bugs.openjdk.org/browse/JDK-8372797) is filed to add relevant identity transformations. > > Best Regards, > Jatin The product fix looks reasonable.
The asserts need to verify that write to `dst` does not yet destroy either of `src`-s. We don't need to check that `src`-s are actually distinct. I have test comments/questions: test/hotspot/jtreg/compiler/vectorapi/TestVectorMinMaxSameInputs.java line 44: > 42: > 43: public static void main(String[] args) { > 44: TestFramework.runWithFlags("--add-modules=jdk.incubator.vector", "-ea", "-XX:+IgnoreUnrecognizedVMOptions", "-XX:UseAVX=2"); I understand `-XX:UseAVX=2` is here to hit the path where the assert is on. But for a generic test like this, it would seem unwise to limit the test configuration only to AVX=2. I would expect we instead run the tests with `TEST_VM_OPTS=-XX:UseAVX=2` to confirm they work with AVX=2 even on AVX-512 machines. test/hotspot/jtreg/compiler/vectorapi/TestVectorMinMaxSameInputs.java line 58: > 56: > 57: @Test > 58: @IR(counts={IRNode.MAX_VL, "1"}) In other tests, I see we are actually checking for CPU feature flags before assuming these nodes are present: @Test @IR(applyIfCPUFeatureOr = { "sse4.1", "true" , "asimd" , "true", "rvv", "true"}, counts = { IRNode.MAX_VL, "> 0" }) So this test would probably fail on some older hardware and/or with some configuration options? 
------------- PR Review: https://git.openjdk.org/jdk/pull/28600#pullrequestreview-3534540911 PR Review Comment: https://git.openjdk.org/jdk/pull/28600#discussion_r2584692296 PR Review Comment: https://git.openjdk.org/jdk/pull/28600#discussion_r2584688358 From duke at openjdk.org Wed Dec 3 11:15:52 2025 From: duke at openjdk.org (duke) Date: Wed, 3 Dec 2025 11:15:52 GMT Subject: RFR: 8371792: Refactor barrier loop tests out of TestIfMinMax [v4] In-Reply-To: <5P58y7mFExd-rdT_nGu_Ky0UG-vDGPRG2IycLX6xwIY=.403c2f90-1ab3-4096-80a7-b80d819d3ca9@github.com> References: <5P58y7mFExd-rdT_nGu_Ky0UG-vDGPRG2IycLX6xwIY=.403c2f90-1ab3-4096-80a7-b80d819d3ca9@github.com> Message-ID: On Fri, 28 Nov 2025 09:40:25 GMT, Galder Zamarreño wrote: >> Trivial cleanup to move tests out of a test class whose description does not match these tests > > Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/gcbarriers/TestMinMaxLongLoopBarrier.java > > Co-authored-by: Emanuel Peter @galderz Your change (at version d023353faf7220920ea1434756d822361ebe4032) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28385#issuecomment-3606348207 From krk at openjdk.org Wed Dec 3 12:25:15 2025 From: krk at openjdk.org (Kerem Kat) Date: Wed, 3 Dec 2025 12:25:15 GMT Subject: RFR: 8370502: C2: segfault while adding node to IGVN worklist [v5] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 13:32:22 GMT, Kerem Kat wrote: >> Do not try to replace `fallthrough_memproj` when it is null, fixes crash. >> >> Test case is simplified from the ticket. Verified that the case crashes without the fix. > > Kerem Kat has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 10 additional commits since the last revision: > > - Merge branch 'master' into fix-c2-segfault-unlocknode > - address comments > - fix rename > - rename test file > - Merge branch 'master' into fix-c2-segfault-unlocknode > - fix test spacing > - Update src/hotspot/share/opto/macro.cpp > > Co-authored-by: Manuel Hässig > - Update src/hotspot/share/opto/macro.cpp > > Co-authored-by: Manuel Hässig > - copyright format fix? > - 8370502: C2: segfault while adding node to IGVN worklist Thanks! Cut these issues for tracking: [JDK-8373011](https://bugs.openjdk.org/browse/JDK-8373011) and [JDK-8373012](https://bugs.openjdk.org/browse/JDK-8373012). The latter also covers "it would be good to know if expand_lock_node() also needs a null check". ------------- PR Comment: https://git.openjdk.org/jdk/pull/28432#issuecomment-3606609512 From galder at openjdk.org Wed Dec 3 12:34:22 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 3 Dec 2025 12:34:22 GMT Subject: Integrated: 8371792: Refactor barrier loop tests out of TestIfMinMax In-Reply-To: References: Message-ID: On Wed, 19 Nov 2025 08:30:56 GMT, Galder Zamarreño wrote: > Trivial cleanup to move tests out of a test class whose description does not match these tests This pull request has now been integrated.
Changeset: a655ea48 Author: Galder Zamarreño Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/a655ea48453a321fb7cadc6ffb6111276497a929 Stats: 123 lines in 2 files changed: 86 ins; 36 del; 1 mod 8371792: Refactor barrier loop tests out of TestIfMinMax Reviewed-by: chagedorn, epeter, bmaillard ------------- PR: https://git.openjdk.org/jdk/pull/28385 From epeter at openjdk.org Wed Dec 3 13:03:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 3 Dec 2025 13:03:02 GMT Subject: RFR: 8367158: C2: create better fill and copy benchmarks, taking alignment into account [v2] In-Reply-To: <3PEmRtpnMH0sRwWGK0uWkItDuytAS-ErVfqYK5X7rDQ=.2d484c9a-c25a-4a60-a856-fcbd4e614914@github.com> References: <3PEmRtpnMH0sRwWGK0uWkItDuytAS-ErVfqYK5X7rDQ=.2d484c9a-c25a-4a60-a856-fcbd4e614914@github.com> Message-ID: > **Summary** > > I created some `fill` and `copy` style benchmarks, covering both `arrays` and `MemorySegment`s. > Reasons for this benchmark: > - I want to compare auto-vectorization with intrinsics (array assembly style intrinsics, and MemorySegment java level special implementations). This allows us to see if some are slower than others, and if we can manage to improve the slower versions somehow in the future. > - There are some known issues we can demonstrate well with this benchmark: > - Super-Unrolling: unrolling the vectorized loop gets us extra performance, but the exact factor may not be optimal yet for auto-vectorization. > - Small iteration count loops: auto-vectorization can lead to slowdowns. > - Many benchmarks do not control for alignment. But that creates noise. I just go over all possible alignments, that should smooth out the noise. > - Most benchmarks do not control for 4k aliasing (x86 effect in store buffer). I make sure that load/stores are not a multiple of 4k bytes apart, so we can avoid the noise of that effect.
> > ---------------------------------------------------------------------- > > **Analysis based on this Benchmark** > > Analysis done in this PR: > - Arrays: auto vectorization vs scalar loops performance > - Arrays: auto vectorization loops vs intrinsics > - MemorySegments: auto vectorization loops vs scalar loops vs `MemorySegment.fill/copy` > > Future work: > - Investigate deeper, inspect assembly, etc. > - Impact of `-XX:SuperWordAutomaticAlignment=0` on small iteration count loops. > - Investigate effect of `-XX:-OptimizeFill`. It seems that the loops in this benchmark are not detected automatically, and so the array intrinsics are not used. Why? > - Investigate impact of `CompactObjectHeaders`. Does enabling/disabling change any performance? > - Investigate if adjusting the super-unrolling factor could improve performance for auto-vectorization: [JDK-8368061](https://bugs.openjdk.org/browse/JDK-8368061) > - Performance comparison with Graal. > > ---------------------------------------------------------------------- > > **Array Benchmark: auto vectorization vs scalar** > > We can see that for arrays, auto vectorization leads to minor regressions for sizes 1-32, and then generally auto vectorization is faster for larger sizes. And this is true for both `fill` and `copy`. > > Strange: `macosx_aarch64` with `copy_int`. The auto vectorized performance has a sudden drop around 150 iterations. Also for `fill_long` we have a "phase-transition" around 64, that goes steeper rather... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 16 additional commits since the last revision: - small modulo fix from review suggestion - Merge branch 'master' into JDK-8367158-fill-and-copy-benchmarks - more MS types - fix MS fill - more backing types - object array benchmarks - fix bm - ms bm update - clean up benchmark - more types - ... and 6 more: https://git.openjdk.org/jdk/compare/098a7d6e...80378aea ------------- Changes: - all: https://git.openjdk.org/jdk/pull/27315/files - new: https://git.openjdk.org/jdk/pull/27315/files/40a80d79..80378aea Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=27315&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27315&range=00-01 Stats: 346063 lines in 3520 files changed: 221363 ins; 78352 del; 46348 mod Patch: https://git.openjdk.org/jdk/pull/27315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27315/head:pull/27315 PR: https://git.openjdk.org/jdk/pull/27315 From rsunderbabu at openjdk.org Wed Dec 3 13:19:50 2025 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Wed, 3 Dec 2025 13:19:50 GMT Subject: RFR: 8372941: Rework compiler/intrinsics/sha tests to use intrinsic availability Message-ID: Predicate probes of the following algos are changed to rely on intrinsics availability in the platform as opposed to hardware support availability. 
MD5 SHA1 SHA256 SHA3 Testing: All flag combinations from CI hotspot tiers 1 to 5 PS: only for tier testings, mac-aarch was skipped due to resource constraints ------------- Commit messages: - initial commit Changes: https://git.openjdk.org/jdk/pull/28634/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28634&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372941 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28634.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28634/head:pull/28634 PR: https://git.openjdk.org/jdk/pull/28634 From jvernee at openjdk.org Wed Dec 3 13:26:28 2025 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 3 Dec 2025 13:26:28 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> <3OE37qXGHhLAhnRQM188hhygrLYBtI3FLBMK0tGVH30=.5d1b4406-3bb3-4788-8059-e78260b79ec1@github.com> <7WF8DlorrU_B2__G2wr43w1PZwJh8mEhD5dY10YDIOo=.ec416c38-1aff-4dd6-8792-d6a0e01f91ce@github.com> <_Z6KpxCYH2n3sHuT6-kRP4cSTAN3-s5UA0r bfrJSIgA=.e9d4089c-8329-406b-9a0a-167a24311c13@github.com> <5CADH75ZjadKttOKwsykRFUPlQKLiwCW8E5WkM_75a4=.fd992c8f-e8bc-4775-9ea3-d5212664e3df@github.com> Message-ID: <5QPAetQEkrBgFKtMt0i9Ku_4s2GCirMl2uqLH3j8x7g=.e5fc8964-0080-45f7-9005-31922ec06ba1@github.com> On Wed, 3 Dec 2025 04:10:05 GMT, Chen Liang wrote: >> Sorry, I still don't understand how it is intended to work. Why does `MethodHandleImpl.isCompileConstant(vh) == true` imply that the cached value is compatible with the constant `vh`? >> >> >> // Keep capturing - vh may suddenly get promoted to a constant by C2 >> >> >> Capturing happens outside compiler thread. It is not affected by C2 (except when it completely prunes the whole block). 
>> >> So, either any captured adaptation is valid/compatible or there's a concurrency issue when C2 kicks in and there's a concurrent cache update happening with incompatible version. > >> any captured adaptation is valid/compatible > > Yes, if `vh` is a constant, any captured adaptation from `vh.getMethodHandle(mode).asType(symbolicMethodTypeInvoker)` is valid/compatible. > > For thread safety, MethodHandle supports safe publication, so I think we are fine publishing this way. Looking at this, I'm not sure we can assume that we only see one mode and type when the VH is constant. There seems to be a lot of non-local reasoning involved. For example, you could have a var handle invoker created with `MethodHandles::varHandleInvoker`, which is cached, so the `AccessDescriptor` can be shared among many different use sites. For an individual use-site, the receiver VH may well be a constant, but that doesn't mean that the cache isn't polluted by the var handle from another use site, as far as I can tell. The thread safety issue comes from a C2 thread racing to read the `lastAdaption` cache vs another Java thread writing to the cache. AFAICS, this race is still possible even when `vh` is a compile time constant. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2585100537 From mli at openjdk.org Wed Dec 3 13:27:42 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 3 Dec 2025 13:27:42 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v30] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 03:44:27 GMT, Anjian Wen wrote: >> Hi everyone, please help review this patch which implements the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ passed. > > Anjian Wen has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 31 commits: > > - Merge branch 'openjdk:master' into aes_ctr > - modify label L_EXIT to L_exit > - add more comments for key value 52 > - update some comments, names and Pseudocode > - modify stub_id name > - Merge branch 'openjdk:master' into aes_ctr > - modify format > - add more comments > - modify parm to unsigned as aarch64 and x86 > - clean comments and format > - ... and 21 more: https://git.openjdk.org/jdk/compare/530493fe...98d802d5 Looks good, Thanks! ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25281#pullrequestreview-3535032696 From mli at openjdk.org Wed Dec 3 13:27:43 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 3 Dec 2025 13:27:43 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v29] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 10:59:13 GMT, Anjian Wen wrote: > I think maybe we can do this kind of optimization when we can test on a real machine later ? Great! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25281#discussion_r2585105649 From jvernee at openjdk.org Wed Dec 3 13:37:54 2025 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 3 Dec 2025 13:37:54 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: <5QPAetQEkrBgFKtMt0i9Ku_4s2GCirMl2uqLH3j8x7g=.e5fc8964-0080-45f7-9005-31922ec06ba1@github.com> References: <7vA3xcZlxI6Z7C50Uopc-L4zaPa1opq-c-fy4ln34rQ=.7e4f1fd7-530b-41c1-8a04-9d024db31978@github.com> <5k9_zS-hTubx9WMd8lq30Ajq8xRDAjIEhmKaqnyrsCw=.09a5b646-6115-45f1-be39-f5a54b9dbdd4@github.com> <3OE37qXGHhLAhnRQM188hhygrLYBtI3FLBMK0tGVH30=.5d1b4406-3bb3-4788-8059-e78260b79ec1@github.com> <7WF8DlorrU_B2__G2wr43w1PZwJh8mEhD5dY10YDIOo=.ec416c38-1aff-4dd6-8792-d6a0e01f91ce@github.com> <_Z6KpxCYH2n3sHuT6-kRP4cSTAN3-s5UA0r bfrJSIgA=.e9d4089c-8329-406b-9a0a-167a24311c13@github.com> <5CADH75ZjadKttOKwsykRFUPlQKLiwCW8E5WkM_75a4=.fd992c8f-e8bc-4775-9ea3-d5212664e3df@github.com> <5QPAetQEkrBgFKtMt0i9Ku_4s2GCirMl2uqLH3j8x7g=.e5fc8964-0080-45f7-9005-31922ec06ba1@github.com> Message-ID: On Wed, 3 Dec 2025 13:23:18 GMT, Jorn Vernee wrote: >>> any captured adaptation is valid/compatible >> >> Yes, if `vh` is a constant, any captured adaptation from `vh.getMethodHandle(mode).asType(symbolicMethodTypeInvoker)` is valid/compatible. >> >> For thread safety, MethodHandle supports safe publication, so I think we are fine publishing this way. > > Looking at this, I'm not sure we can assume that we only see one mode and type when the VH is constant. There seems to be a lot of non-local reasoning involved. > > For example, you could have a var handle invoker created with `MethodHandless::varHandleInvoker`, which is cached, so the `AccessDescriptor` can be shared among many different use sites. 
For an individual use-site, the receiver VH may well be a constant, but that doesn't mean that the cache isn't polluted by the var handle from another use site, as far as I can tell.
>
> The thread safety issue comes from a C2 thread racing to read the `lastAdaption` cache vs another Java thread writing to the cache. AFAICS, this race is still possible even when `vh` is a compile time constant.

I think even without using an invoker, you could end up in a similar situation if you have something like:

    static Object m(VarHandle vh) {
        return vh.get();
    }

which is called by several different threads. At some point this method may be inlined into one of its callers, where `vh` then becomes a constant. But at the same time, other threads are still writing to the cache.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2585142665

From liach at openjdk.org Wed Dec 3 14:11:22 2025
From: liach at openjdk.org (Chen Liang)
Date: Wed, 3 Dec 2025 14:11:22 GMT
Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3]
In-Reply-To: <4gKSL8hFAE2qSuTmhJa6JMfoB6JfUnK9fzwHAnH2Zzg=.9fc69461-bbe7-4242-b3b1-b4b004f35ce0@github.com>
References: <4gKSL8hFAE2qSuTmhJa6JMfoB6JfUnK9fzwHAnH2Zzg=.9fc69461-bbe7-4242-b3b1-b4b004f35ce0@github.com>
Message-ID:

On Wed, 3 Dec 2025 10:31:07 GMT, Per Minborg wrote:

>> MethodHandle is immutable and can be safely published. So this is ok.
>
> I meant that even though the objects are immutable, plain semantics might not always suffice.
>
> Reference: https://shipilev.net/blog/2014/safe-public-construction/

MethodHandle is safe. All fields in the MethodHandle hierarchy are either lazy/stable or final. You can refer to the `invokers` field in `MethodType`, and the `MethodHandle` array in `Invokers` for precedents.
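
For readers following the thread, the `m(VarHandle vh)` shape mentioned above is an inexact invocation: the call site asks for `Object` while the variable is an `int`, so the access needs an `asType`-style adaptation. The following is a minimal sketch of that distinction using only public `java.lang.invoke` API; the class name `InexactVarHandleDemo` and the field `counter` are made up for illustration and are not part of the PR.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.lang.invoke.WrongMethodTypeException;

// Hypothetical demo class; not taken from the JDK sources or the PR.
class InexactVarHandleDemo {
    static int counter = 42;

    static final VarHandle COUNTER;
    static {
        try {
            COUNTER = MethodHandles.lookup()
                    .findStaticVarHandle(InexactVarHandleDemo.class, "counter", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Inexact call: the call site type is ()Object, the access mode type is ()int,
    // so the result has to be converted (boxed) on the way out.
    static Object m(VarHandle vh) {
        return vh.get();
    }

    // Exact call: the cast makes the call site type ()int, no conversion needed.
    static int exact(VarHandle vh) {
        return (int) vh.get();
    }

    // With invoke-exact behavior the inexact call shape is rejected instead of adapted.
    static boolean rejectsInexact(VarHandle vh) {
        try {
            Object ignored = vh.withInvokeExactBehavior().get();
            return false;
        } catch (WrongMethodTypeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(m(COUNTER));              // boxed value of counter
        System.out.println(exact(COUNTER));          // primitive value of counter
        System.out.println(rejectsInexact(COUNTER)); // whether the exact handle refused
    }
}
```

`withInvokeExactBehavior()` is available since Java 16 and is a convenient way to find call sites that silently rely on argument or return conversion.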
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2585263336 From liach at openjdk.org Wed Dec 3 14:11:23 2025 From: liach at openjdk.org (Chen Liang) Date: Wed, 3 Dec 2025 14:11:23 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v3] In-Reply-To: References: <4gKSL8hFAE2qSuTmhJa6JMfoB6JfUnK9fzwHAnH2Zzg=.9fc69461-bbe7-4242-b3b1-b4b004f35ce0@github.com> Message-ID: <4F_HqL-oY7z2ENI9yIAS7VS3NDjEljsqx4E2zK5HxJ0=.8f5ab39a-66e1-4ea3-ace2-226d6bd39d77@github.com> On Wed, 3 Dec 2025 14:06:00 GMT, Chen Liang wrote: >> I meant that even though objects are immutable, plain semantics might not always do. >> >> Reference: https://shipilev.net/blog/2014/safe-public-construction/ > > MethodHandle is safe. All fields in Method Handle hierarchies are either lazy/stable or final. You can refer to the `invokers` field in `MethodType`, and the `MethodHandle` array in `Invokers` for precedents. In extreme cases where a barrier is needed, java.lang.invoke already issue necessary barriers, most notably the storeStoreFence, such as https://github.com/openjdk/jdk/blob/135661b4389663b8c2e348d9e61e72cc628636bb/src/java.base/share/classes/java/lang/invoke/CallSite.java#L138 or https://github.com/openjdk/jdk/blob/135661b4389663b8c2e348d9e61e72cc628636bb/src/java.base/share/classes/java/lang/ClassValue.java#L411-L417 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2585270552 From vpaprotski at openjdk.org Wed Dec 3 14:57:24 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Wed, 3 Dec 2025 14:57:24 GMT Subject: RFR: 8372703: Test compiler/arguments/TestCodeEntryAlignment.java failed: assert(allocates2(pc)) failed: not in CodeBuffer memory [v2] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 15:40:51 GMT, Volodymyr Paprotski wrote: >> Requires a Broadwell machine, but was able to reproduce with an emulator: >> >> >> 
~/sde-external-9.58.0-2025-06-16-lin/sde64 -follow-subprocess -bdw -- ./build/linux-x86_64-server-fastdebug/images/jdk/bin/java -XX:-UseMulAddIntrinsic -XX:+UseDilithiumIntrinsics -XX:+UnlockExperimentalVMOptions -XX:CodeCacheSegmentSize=1024 -XX:CodeEntryAlignment=1024 -cp build/linux-x86_64-server-fastdebug/support/test/lib/test-lib.jar test/hotspot/jtreg/compiler/arguments/TestCodeEntryAlignment.java run
>
> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
>
>   comment from Manuel

Thanks for the approvals!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28588#issuecomment-3607240643

From vpaprotski at openjdk.org Wed Dec 3 14:57:25 2025
From: vpaprotski at openjdk.org (Volodymyr Paprotski)
Date: Wed, 3 Dec 2025 14:57:25 GMT
Subject: Integrated: 8372703: Test compiler/arguments/TestCodeEntryAlignment.java failed: assert(allocates2(pc)) failed: not in CodeBuffer memory
In-Reply-To:
References:
Message-ID:

On Mon, 1 Dec 2025 21:16:18 GMT, Volodymyr Paprotski wrote:

> Requires a Broadwell machine, but was able to reproduce with an emulator:
>
>
> ~/sde-external-9.58.0-2025-06-16-lin/sde64 -follow-subprocess -bdw -- ./build/linux-x86_64-server-fastdebug/images/jdk/bin/java -XX:-UseMulAddIntrinsic -XX:+UseDilithiumIntrinsics -XX:+UnlockExperimentalVMOptions -XX:CodeCacheSegmentSize=1024 -XX:CodeEntryAlignment=1024 -cp build/linux-x86_64-server-fastdebug/support/test/lib/test-lib.jar test/hotspot/jtreg/compiler/arguments/TestCodeEntryAlignment.java run

This pull request has now been integrated.
Changeset: 829b8581
Author:    Volodymyr Paprotski
URL:       https://git.openjdk.org/jdk/commit/829b85813a3810eeecf6ce4b30b5c3d1fc34ad23
Stats:     3 lines in 2 files changed: 0 ins; 2 del; 1 mod

8372703: Test compiler/arguments/TestCodeEntryAlignment.java failed: assert(allocates2(pc)) failed: not in CodeBuffer memory

Reviewed-by: mhaessig, dfenacci, thartmann

-------------

PR: https://git.openjdk.org/jdk/pull/28588

From liach at openjdk.org Wed Dec 3 15:54:30 2025
From: liach at openjdk.org (Chen Liang)
Date: Wed, 3 Dec 2025 15:54:30 GMT
Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v5]
In-Reply-To:
References:
Message-ID:

> Since an access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify that such a setup, through an inexact VarHandle invocation, can be constant folded (previously, it was blocked by `asType`).

Chen Liang has updated the pull request incrementally with one additional commit since the last revision:

  Fix problem identified by Jorn

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28585/files
  - new: https://git.openjdk.org/jdk/pull/28585/files/d49ad129..89e21b4b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=03-04

  Stats: 40 lines in 3 files changed: 25 ins; 2 del; 13 mod
  Patch: https://git.openjdk.org/jdk/pull/28585.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28585/head:pull/28585

PR: https://git.openjdk.org/jdk/pull/28585

From dlunden at openjdk.org Wed Dec 3 16:11:47 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Wed, 3 Dec 2025 16:11:47 GMT
Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v8]
In-Reply-To:
References:
<8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com>
Message-ID: <2z3W0Nhjk7JjhJSw8nTHZJhY6xd8j62vBHp0HkMNQJQ=.9a1f6cf7-ad8d-4599-a873-9b7729c7109b@github.com>

On Mon, 1 Dec 2025 13:39:09 GMT, Jatin Bhateja wrote:

>> This bug patch fixes a crash seen while querying the bottom type of MachTempNode corresponding to [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of blend pattern during late scheduling. Here, MaxVectorSize is constrained to 8 bytes, thus during C2 type system initialization, [TypeVect::VECTX](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719), guarded by the target supported vector size, remains uninitialized.
>>
>> It's better to reject matching of VectorBlend in such a scenario.
>>
>> All existing VectorAPI jtreg tests are passing with -XX:UseAVX=0 and -XX:MaxVectorSize=8
>>
>> Kindly review and share your feedback.
>>
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>
>   Review comments resolutions

Looks good @jatin-bhateja! One minor style consistency suggestion.
test/hotspot/jtreg/compiler/vectorapi/TestABSMaskedMaxByteVector.java line 48: > 46: @Test > 47: @IR(failOn = {IRNode.ABS_VB}, applyIfAnd={"MaxVectorSize", " <= 8 ", "UseAVX", "0"}, applyIfPlatform={"x64", "true"}, applyIfCPUFeature={"sse4.1", "true"}) > 48: @IR(counts = {IRNode.ABS_VB, "1"}, applyIf={"MaxVectorSize", " > 8 "}, applyIfPlatform={"x64", "true"}, applyIfCPUFeature={"sse4.1", "true"}) Suggestion: @IR(failOn = {IRNode.ABS_VB}, applyIfAnd = {"MaxVectorSize", " <= 8 ", "UseAVX", "0"}, applyIfPlatform = {"x64", "true"}, applyIfCPUFeature = {"sse4.1", "true"}) @IR(counts = {IRNode.ABS_VB, "1"}, applyIf = {"MaxVectorSize", " > 8 "}, applyIfPlatform = {"x64", "true"}, applyIfCPUFeature = {"sse4.1", "true"}) ------------- Marked as reviewed by dlunden (Committer). PR Review: https://git.openjdk.org/jdk/pull/28533#pullrequestreview-3535790377 PR Review Comment: https://git.openjdk.org/jdk/pull/28533#discussion_r2585713416 From liach at openjdk.org Wed Dec 3 16:43:24 2025 From: liach at openjdk.org (Chen Liang) Date: Wed, 3 Dec 2025 16:43:24 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v6] In-Reply-To: References: Message-ID: > Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. 
Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) Chen Liang has updated the pull request incrementally with two additional commits since the last revision: - Test from Jorn - Copyright years ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28585/files - new: https://git.openjdk.org/jdk/pull/28585/files/89e21b4b..ff7b3629 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=04-05 Stats: 107 lines in 3 files changed: 105 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28585.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28585/head:pull/28585 PR: https://git.openjdk.org/jdk/pull/28585 From jbhateja at openjdk.org Wed Dec 3 18:30:50 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 3 Dec 2025 18:30:50 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v9] In-Reply-To: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: > This bug patch fixes a crash seen while querying the bottom type of MachTempNode corresponding to [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of blend pattern during late scheduling. Here, MaxVectorSize is contrainted to 8 bytes thus during C2 type system initialization, [TypeVect::VECTX ](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719) guarded by target supprted vector size remains uninitialized. > > Its better to reject matching of VectorBlend in such a scenario. > > All exisitng VectorAPI jtreg tests are passing with -XX:UseAVX=0 and -XX:MaxVectorSize=8 > > Kindly review and share your feedback. 
>
> Best Regards,
> Jatin

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  Update test/hotspot/jtreg/compiler/vectorapi/TestABSMaskedMaxByteVector.java

  Co-authored-by: Daniel Lundén

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28533/files
  - new: https://git.openjdk.org/jdk/pull/28533/files/ef84ffa7..e92cd467

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28533&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28533&range=07-08

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/28533.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28533/head:pull/28533

PR: https://git.openjdk.org/jdk/pull/28533

From vlivanov at openjdk.org Wed Dec 3 19:39:56 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 3 Dec 2025 19:39:56 GMT
Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v6]
In-Reply-To:
References:
Message-ID:

On Wed, 3 Dec 2025 16:43:24 GMT, Chen Liang wrote:

>> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`)
>
> Chen Liang has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Test from Jorn
>  - Copyright years

src/java.base/share/classes/java/lang/invoke/VarHandle.java line 2036:

> 2034:     // from two writes (they must not be tearable)
> 2035:     private record Adaption(VarHandle vh, MethodHandle mh) {}
> 2036:     private @Stable Adaption adaption;

Is a soft reference needed here? The situation looks similar to `MH.asTypeSoftCache`. It can keep some classes referred to by `vh` alive unnecessarily long.
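
The quoted `record Adaption(VarHandle vh, MethodHandle mh)` pairs the key with the cached value so that a racing reader never observes a key from one write and a value from another. Below is a minimal, generic sketch of that idea, assuming nothing about the actual JDK implementation: `SingleEntryCache`, `Entry`, and `compute` are hypothetical names, the field is a plain field rather than `@Stable` (that annotation is JDK-internal), and the strong reference also reproduces the retention concern raised above.

```java
import java.util.function.Function;

// Hypothetical single-entry cache; illustrates the "one immutable holder" idea only.
final class SingleEntryCache<K, V> {
    // Key and value travel together in one immutable object. Record fields are final,
    // so a reader that sees a non-null Entry also sees both components as written.
    private record Entry<K, V>(K key, V value) {}

    // Plain (non-volatile) field: racing readers see either null or some complete Entry.
    private Entry<K, V> last;

    private final Function<K, V> compute;

    SingleEntryCache(Function<K, V> compute) {
        this.compute = compute;
    }

    V get(K key) {
        Entry<K, V> e = last;               // read the field exactly once
        if (e != null && e.key() == key) {  // identity check: cached value belongs to this key
            return e.value();
        }
        V value = compute.apply(key);       // miss, or entry polluted by another key
        last = new Entry<>(key, value);     // single reference write, cannot tear
        return value;
    }
}
```

Storing the key and the value in two separate fields instead would allow a reader to pair a fresh key with a stale value, which is the tearing the quoted source comment warns about.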
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2586374014 From kvn at openjdk.org Wed Dec 3 19:53:25 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 3 Dec 2025 19:53:25 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v8] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 10:31:22 GMT, Aleksey Shipilev wrote: >> See the bug for discussion what issues current machinery has. >> >> This PR executes the plan outlined in the bug: >> 1. Common the receiver type profiling code in interpreter and C1 >> 2. Rewrite receiver type profiling code to only do atomic receiver slot installations >> 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed >> >> This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: > > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - More comments > - Tighten up the comments > - Simplify third case: no need to loop, just restart the search > - Actually have a second "fast" case: receiver is not found in the table, and the table is full > - Pushing/popping for rare CAS path is counter-productive > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - Tighten up some more > - Offset is always rscratch1, no need to save it > - Grossly simplify register shuffling > - ... and 11 more: https://git.openjdk.org/jdk/compare/7278d2e8...3c5019d9 This looks good. Thank you for cleaning up code and detailed comments. I submitted our testing. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3608573720 From vlivanov at openjdk.org Wed Dec 3 21:52:29 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 3 Dec 2025 21:52:29 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v3] In-Reply-To: References: Message-ID: > Even though `instanceof` check (and reflective `Class.isInstance` call) narrows operand's type, sharpened type information is not explicitly materialized in the IR. > > There's a `SubTypeCheck` node present, but it is not a substitute for a `CheckCastPP` node with a proper type. > > The difference can be illustrated with the following simple cases: > > class A { void m() {} } > class B extends A { void m() {} } > > void testInstanceOf(A obj) { > if (obj instanceof B) { > obj.m(); > } > } > > InstanceOf::testInstanceOf (12 bytes) > @ 8 InstanceOf$A::m (0 bytes) failed to inline: virtual call > > vs > > void testInstanceOfCast(A obj) { > if (obj instanceof B) { > B b = (B)obj; > b.m(); > } > } > > InstanceOf::testInstanceOfCast (17 bytes) > @ 13 InstanceOf$B::m (1 bytes) inline (hot) > > > Proposed fix annotates operands of subtype checks with proper type information which reflects the effects of subtype check. Not-yet-canonicalized IR shape poses some challenges, but I decided to match it early so information is available right away, rather than waiting for IGVN pass and delay inlining to post-parse phase. > > FTR it is not a complete fix. It works for trivial cases, but for more complex conditions the IR shape becomes too complex during parsing (as illustrated by some test cases). I experimented with annotating subtype checks after initial parsing pass is over, but the crucial simplification step happens as part of split-if transformation which happens when no more inlining is possible. So, the only possible benefit (without forcing split-if optimization earlier) is virtual-to-direct call strength reduction. 
I plan to explore it separately. > > Testing: hs-tier1 - hs-tier5 Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: Unify Compile::should_delay_inlining ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28517/files - new: https://git.openjdk.org/jdk/pull/28517/files/0a5e78c6..c58c63cc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28517&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28517&range=01-02 Stats: 12 lines in 4 files changed: 2 ins; 7 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28517.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28517/head:pull/28517 PR: https://git.openjdk.org/jdk/pull/28517 From vlivanov at openjdk.org Wed Dec 3 21:58:24 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 3 Dec 2025 21:58:24 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v4] In-Reply-To: References: Message-ID: > Even though `instanceof` check (and reflective `Class.isInstance` call) narrows operand's type, sharpened type information is not explicitly materialized in the IR. > > There's a `SubTypeCheck` node present, but it is not a substitute for a `CheckCastPP` node with a proper type. > > The difference can be illustrated with the following simple cases: > > class A { void m() {} } > class B extends A { void m() {} } > > void testInstanceOf(A obj) { > if (obj instanceof B) { > obj.m(); > } > } > > InstanceOf::testInstanceOf (12 bytes) > @ 8 InstanceOf$A::m (0 bytes) failed to inline: virtual call > > vs > > void testInstanceOfCast(A obj) { > if (obj instanceof B) { > B b = (B)obj; > b.m(); > } > } > > InstanceOf::testInstanceOfCast (17 bytes) > @ 13 InstanceOf$B::m (1 bytes) inline (hot) > > > Proposed fix annotates operands of subtype checks with proper type information which reflects the effects of subtype check. 
Not-yet-canonicalized IR shape poses some challenges, but I decided to match it early so information is available right away, rather than waiting for IGVN pass and delay inlining to post-parse phase. > > FTR it is not a complete fix. It works for trivial cases, but for more complex conditions the IR shape becomes too complex during parsing (as illustrated by some test cases). I experimented with annotating subtype checks after initial parsing pass is over, but the crucial simplification step happens as part of split-if transformation which happens when no more inlining is possible. So, the only possible benefit (without forcing split-if optimization earlier) is virtual-to-direct call strength reduction. I plan to explore it separately. > > Testing: hs-tier1 - hs-tier5 Vladimir Ivanov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge branch 'master' into c2.instanceof - Unify Compile::should_delay_inlining - Test fix - bugid - C2: Materialize type information from instanceof checks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28517/files - new: https://git.openjdk.org/jdk/pull/28517/files/c58c63cc..58a7d521 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28517&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28517&range=02-03 Stats: 98149 lines in 1639 files changed: 63706 ins; 23937 del; 10506 mod Patch: https://git.openjdk.org/jdk/pull/28517.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28517/head:pull/28517 PR: https://git.openjdk.org/jdk/pull/28517 From liach at openjdk.org Wed Dec 3 23:44:01 2025 From: liach at openjdk.org (Chen Liang) Date: Wed, 3 Dec 2025 23:44:01 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v6] In-Reply-To: References: Message-ID: On Wed, 3 Dec 
2025 19:37:25 GMT, Vladimir Ivanov wrote:

>> Chen Liang has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - Test from Jorn
>>  - Copyright years
>
> src/java.base/share/classes/java/lang/invoke/VarHandle.java line 2036:
>
>> 2034:     // from two writes (they must not be tearable)
>> 2035:     private record Adaption(VarHandle vh, MethodHandle mh) {}
>> 2036:     private @Stable Adaption adaption;
>
> Is a soft reference needed here? The situation looks similar to `MH.asTypeSoftCache`. It can keep some classes referred by `vh` alive for unnecessarily long.

I don't think we can use a SoftReference here if we need to achieve constant folding.

Looking at inline_reference_get0, I think we might introduce another field property to trust a reference (potentially in an array) if both that reference and the referent within the reference are non-null. I think that belongs to a separate RFE. What do you think?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2586946357

From sviswanathan at openjdk.org Thu Dec 4 00:15:18 2025
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Thu, 4 Dec 2025 00:15:18 GMT
Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v9]
In-Reply-To:
References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com>
Message-ID:

On Wed, 3 Dec 2025 18:30:50 GMT, Jatin Bhateja wrote:

>> This bug patch fixes a crash seen while querying the bottom type of MachTempNode corresponding to [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of blend pattern during late scheduling. Here, MaxVectorSize is constrained to 8 bytes, thus during C2 type system initialization, [TypeVect::VECTX](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719), guarded by the target supported vector size, remains uninitialized.
>>
>> It's better to reject matching of VectorBlend in such a scenario.
>>
>> All existing VectorAPI jtreg tests are passing with -XX:UseAVX=0 and -XX:MaxVectorSize=8
>>
>> Kindly review and share your feedback.
>>
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   Update test/hotspot/jtreg/compiler/vectorapi/TestABSMaskedMaxByteVector.java
>
>   Co-authored-by: Daniel Lundén

Marked as reviewed by sviswanathan (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/28533#pullrequestreview-3537389980

From liach at openjdk.org Thu Dec 4 01:48:31 2025
From: liach at openjdk.org (Chen Liang)
Date: Thu, 4 Dec 2025 01:48:31 GMT
Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v7]
In-Reply-To:
References:
Message-ID: <7ayMTZ4nXMyB1SXNRcYGjdxidNHDcAUNv_8fQZDUaPI=.a558d3a2-1d3e-4b45-8ba7-393c55a52785@github.com>

> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH.
Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) Chen Liang has updated the pull request incrementally with one additional commit since the last revision: Revert void special case removal due to C2 shortage causing TestZGCBarrierElision::testAtomicThenAtomicAnotherField failure ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28585/files - new: https://git.openjdk.org/jdk/pull/28585/files/ff7b3629..8200fb28 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28585&range=05-06 Stats: 23 lines in 1 file changed: 20 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/28585.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28585/head:pull/28585 PR: https://git.openjdk.org/jdk/pull/28585 From xgong at openjdk.org Thu Dec 4 01:49:37 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 4 Dec 2025 01:49:37 GMT Subject: RFR: 8371603: C2: assert(_inputs.at(alias_idx) == nullptr || _inputs.at(alias_idx) == load->in(1)) failed Message-ID: **Problem:** This issue occurs on a 256-bit SVE machine, caused by the following problematic pattern in `LoadVectorNode::Ideal()`: Node* LoadVectorNode::Ideal(PhaseGVN* phase, bool can_reshape) { const TypeVect* vt = vect_type(); if (Matcher::vector_needs_partial_operations(this, vt)) { return VectorNode::try_to_gen_masked_vector(phase, this, vt); } return LoadNode::Ideal(phase, can_reshape); } The condition `Matcher::vector_needs_partial_operations(this, vt)` returns true for `LoadVectorNode` with 256-bit vector size even when the vector size equals the maximum vector size on SVE. In such cases, when `VectorNode::try_to_gen_masked_vector()` returns `nullptr`, the method exits early without calling `LoadNode::Ideal()`. This results in missing crucial optimizations that would normally be applied by the superclass. 
This code was introduced by https://bugs.openjdk.org/browse/JDK-8286941 to generate vector masks for partial vector operations, but it failed to ensure that the superclass `Ideal()` method is always invoked when no transformation is applied. **Solution:** This patch addresses the issue through two changes: 1. Refine `Matcher::vector_needs_partial_operations()` to return true only when the vector node genuinely represents a partial vector operation that requires masking. 2. Modify `VectorNode::try_to_gen_masked_vector()` to never return `nullptr`, ensuring the superclass `Ideal()` method is always invoked when no transformation is applied. **Testing:** - Verified on different SVE platforms with different vector sizes (128|256|512 bits). - Verified on X86 platforms with different avx options (-XX:UseAVX=1|2|3). - Added two new IR tests to verify 1) previously missing optimizations for `LoadVector/StoreVector` are now applied, and 2) that mask and the correct IR patterns are generated for partial vector operations. 
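
The control-flow problem described in this message can be modeled outside HotSpot. The following is a toy Java sketch, not C2 code: `Load`, `LoadVector`, `tryToGenMasked`, `idealBuggy`, and `idealFixed` are hypothetical stand-ins for the C++ names, strings stand in for IR nodes, and the repair shown is only one way to guarantee the fallthrough, not necessarily how the patch restructures `try_to_gen_masked_vector()`.

```java
// Toy model of "a subclass returns early and the superclass transformation is skipped".
class IdealFallthroughDemo {
    static class Load {
        // Stands in for LoadNode::Ideal: returns a replacement, or null for "no change".
        String ideal() {
            return "load-optimized";
        }
    }

    static class LoadVector extends Load {
        final boolean needsPartialOps; // stands in for vector_needs_partial_operations()
        final boolean maskable;        // whether a masked form can actually be generated

        LoadVector(boolean needsPartialOps, boolean maskable) {
            this.needsPartialOps = needsPartialOps;
            this.maskable = maskable;
        }

        // Stands in for try_to_gen_masked_vector(): may report "no transformation" as null.
        String tryToGenMasked() {
            return maskable ? "masked-load" : null;
        }

        // Problematic shape: a null result leaves the method without consulting the superclass.
        String idealBuggy() {
            if (needsPartialOps) {
                return tryToGenMasked();
            }
            return super.ideal();
        }

        // One possible repair: only return early when a transformation really happened.
        String idealFixed() {
            if (needsPartialOps) {
                String masked = tryToGenMasked();
                if (masked != null) {
                    return masked;
                }
            }
            return super.ideal();
        }
    }

    public static void main(String[] args) {
        LoadVector fullWidth = new LoadVector(true, false);
        System.out.println(fullWidth.idealBuggy()); // no transformation reported
        System.out.println(fullWidth.idealFixed()); // superclass optimization applied
    }
}
```

In the toy model, a node that is flagged as needing partial operations but cannot be masked gets no optimization at all in the buggy shape, which mirrors the missed `LoadNode::Ideal()` call described above.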
------------- Commit messages: - 8371603: C2: assert(_inputs.at(alias_idx) == nullptr || _inputs.at(alias_idx) == load->in(1)) failed Changes: https://git.openjdk.org/jdk/pull/28651/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28651&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8371603 Stats: 619 lines in 8 files changed: 577 ins; 15 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/28651.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28651/head:pull/28651 PR: https://git.openjdk.org/jdk/pull/28651 From xgong at openjdk.org Thu Dec 4 01:49:37 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 4 Dec 2025 01:49:37 GMT Subject: RFR: 8371603: C2: assert(_inputs.at(alias_idx) == nullptr || _inputs.at(alias_idx) == load->in(1)) failed In-Reply-To: References: Message-ID: <6hTYNHBCAdNtrpIHUMQFQtGF3pgL_zEHllk3pa8VO5w=.633da968-84cf-4312-83ca-250941aaab5f@github.com> On Thu, 4 Dec 2025 01:41:19 GMT, Xiaohong Gong wrote: > **Problem:** > > This issue occurs on a 256-bit SVE machine, caused by the following problematic pattern in `LoadVectorNode::Ideal()`: > > > Node* LoadVectorNode::Ideal(PhaseGVN* phase, bool can_reshape) { > const TypeVect* vt = vect_type(); > if (Matcher::vector_needs_partial_operations(this, vt)) { > return VectorNode::try_to_gen_masked_vector(phase, this, vt); > } > return LoadNode::Ideal(phase, can_reshape); > } > > > The condition `Matcher::vector_needs_partial_operations(this, vt)` returns true for `LoadVectorNode` with 256-bit vector size even when the vector size equals the maximum vector size on SVE. In such cases, when `VectorNode::try_to_gen_masked_vector()` returns `nullptr`, the method exits early without calling `LoadNode::Ideal()`. This results in missing crucial optimizations that would normally be applied by the superclass. 
> > This code was introduced by https://bugs.openjdk.org/browse/JDK-8286941 to generate vector masks for partial vector operations, but it failed to ensure that the superclass `Ideal()` method is always invoked when no transformation is applied. > > **Solution:** > > This patch addresses the issue through two changes: > > 1. Refine `Matcher::vector_needs_partial_operations()` to return true only when the vector node genuinely represents a partial vector operation that requires masking. > 2. Modify `VectorNode::try_to_gen_masked_vector()` to never return `nullptr`, ensuring the superclass `Ideal()` method is always invoked when no transformation is applied. > > **Testing:** > > - Verified on different SVE platforms with different vector sizes (128|256|512 bits). > - Verified on X86 platforms with different avx options (-XX:UseAVX=1|2|3). > - Added two new IR tests to verify 1) previously missing optimizations for `LoadVector/StoreVector` are now applied, and 2) that mask and the correct IR patterns are generated for partial vector operations. Hi @eme64 , this is the fixing for the crash issue reported on aws machine. Could you please help take a look? Thanks a lot! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28651#issuecomment-3609606962 From vlivanov at openjdk.org Thu Dec 4 01:58:56 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 4 Dec 2025 01:58:56 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v6] In-Reply-To: References: Message-ID: <8tu3HIArCw2cdoYR2SjI0b-TWYQxQLKkjQgucJEj8D4=.10946ec2-4958-48df-add4-b29d11c09448@github.com> On Wed, 3 Dec 2025 23:41:01 GMT, Chen Liang wrote: >> src/java.base/share/classes/java/lang/invoke/VarHandle.java line 2036: >> >>> 2034: // from two writes (they must not be tearable) >>> 2035: private record Adaption(VarHandle vh, MethodHandle mh) {} >>> 2036: private @Stable Adaption adaption; >> >> Is a soft reference needed here? 
The situation looks similar to `MH.asTypeSoftCache`. It can keep some classes referred by `vh` alive for unnecessarily long. > > I don't think we can use a SoftReference here if we need to achieve constant folding. > > Looking at inline_reference_get0, I think we might introduce another field property to trust a reference (potentially in an array) if both that reference and the referent within the reference is non-null. I think that belongs to a separate RFE. What do you think? Then it makes sense to limit the caching to safe cases only for now. Otherwise, it would functionally regress due to a possible memory leak. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2587156042 From dlong at openjdk.org Thu Dec 4 03:24:59 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 4 Dec 2025 03:24:59 GMT Subject: RFR: 8350208: CTW: GraphKit::add_safepoint_edges asserts "not enough operands for reexecution" In-Reply-To: References: Message-ID: <29opt6wmJqMaeTB-QbJWTMCndjSQH0ZhbTaD-ir6X4A=.b728d464-b7f1-44c5-a76c-c84f84f150f5@github.com> On Tue, 2 Dec 2025 10:30:46 GMT, Quan Anh Mai wrote: > Hi, > > This PR fixes the issue of the compiler crashing with "not enough operands for reexecution". The issue here is that during `Parse::catch_inline_exceptions`, the old stack is gone, and we cannot reexecute the current bytecode anymore. However, there are some places where we try to insert safepoints into the graph, such as if the handler is a backward jump, or if one of the exceptions in the handlers is not loaded. Since the `_reexecute` state of the current jvms is "undefined", it is inferred automatically that it should reexecute for some bytecodes such as `putfield`. The solution then is to explicitly set `_reexecute` to false. 
> > I can manage to write a unit test for the case of a backward handler, for the other cases, since the exceptions that can be thrown for a bytecode that is inferred to reexecute are `NullPointerException`, `ArrayIndexOutOfBoundsException`, and `ArrayStoreException`. I find it hard to construct such a test in which one of them is not loaded. > > Please kindly review, thanks a lot. It seems to be very difficult to force the back-edge safepoint to deoptimize. I tried creating a thread that calls System.gc(), but so far no crash. Still, I think the state is incorrect if reexecute=false. Setting reexecute to false means it will skip the current instruction. To correctly handle a deoptimization on the backwards branch, the debug state, bci, and exception location should match. I think we have 3 choices to prepare for maybe_add_safepoint(): 1. preserve stack inputs, use original bci, do not push exception oop, let interpreter reexecute and throw the exception (reexecute=true) This might be as simple as reversing the order of calls to push_ex_oop and maybe_add_safepoint. 2. trim stack, push exception object, use bci of exception handler (reexecute=true) This would require temporarily changing the bci for the maybe_add_safepoint call. 3. trim stack, throw exception (move to Thread) (reexecute=true) This requires extra unconditional overhead even though safepoint rarely happens. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28597#issuecomment-3609883724 From duke at openjdk.org Thu Dec 4 05:04:32 2025 From: duke at openjdk.org (Harshit470250) Date: Thu, 4 Dec 2025 05:04:32 GMT Subject: RFR: 8370920: [s390] C2: add instruction size in s390.ad file [v9] In-Reply-To: <6L13GD9fUG60AH8_WoSTY-o0TW6p3iXG2TI2o6oQltE=.41cc9b1a-65cf-49ed-9cb7-37014cd681c6@github.com> References: <6L13GD9fUG60AH8_WoSTY-o0TW6p3iXG2TI2o6oQltE=.41cc9b1a-65cf-49ed-9cb7-37014cd681c6@github.com> Message-ID: > This pr adds the size of the match rule nodes. 
> > There were a lot of nodes for which the size was variable, for those node I have taken the maximum possible size. Harshit470250 has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 19 additional commits since the last revision: - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge remote-tracking branch 'origin/master' - remove whitespace - ... and 9 more: https://git.openjdk.org/jdk/compare/39a4f5df...05c649cb ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28054/files - new: https://git.openjdk.org/jdk/pull/28054/files/077d0258..05c649cb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28054&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28054&range=07-08 Stats: 29485 lines in 697 files changed: 16799 ins; 8755 del; 3931 mod Patch: https://git.openjdk.org/jdk/pull/28054.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28054/head:pull/28054 PR: https://git.openjdk.org/jdk/pull/28054 From fyang at openjdk.org Thu Dec 4 05:30:01 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 4 Dec 2025 05:30:01 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v30] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 03:44:27 GMT, Anjian Wen wrote: >> Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. > > Anjian Wen has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 31 commits: > > - Merge branch 'openjdk:master' into aes_ctr > - modify label L_EXIT to L_exit > - add more comments for key value 52 > - update some comments, names and Pseudocode > - modify stub_id name > - Merge branch 'openjdk:master' into aes_ctr > - modify format > - add more comments > - modify parm to unsigned as aarch64 and x86 > - clean comments and format > - ... and 21 more: https://git.openjdk.org/jdk/compare/530493fe...98d802d5 Still good me. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25281#pullrequestreview-3538186330 From haosun at openjdk.org Thu Dec 4 05:34:54 2025 From: haosun at openjdk.org (Hao Sun) Date: Thu, 4 Dec 2025 05:34:54 GMT Subject: RFR: 8372941: Rework compiler/intrinsics/sha tests to use intrinsic availability In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 13:11:27 GMT, Ramkumar Sunderbabu wrote: > Predicate probes of the following algos are changed to rely on intrinsics availability in the platform as opposed to hardware support availability. > MD5 > SHA1 > SHA256 > SHA3 > > Testing: > All flag combinations from CI > hotspot tiers 1 to 5 > PS: only for tier testings, mac-aarch was skipped due to resource constraints Thanks for your work. I suppose the **os.arch** requires condition can be removed in the following cases: diff --git a/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseMD5IntrinsicsOptionOnSupportedCPU.java b/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseMD5IntrinsicsOptionOnSupportedCPU.java index eeff351f737..3561be3b33b 100644 --- a/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseMD5IntrinsicsOptionOnSupportedCPU.java +++ b/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseMD5IntrinsicsOptionOnSupportedCPU.java @@ -25,10 +25,8 @@ * @test * @bug 8035968 * @summary Verify UseMD5Intrinsics option processing on supported CPU. 
- * ( Disable this test on riscv, because on riscv UseMD5Intrinsics depends on !AvoidUnalignedAccesses. ) * @library /test/lib / * @requires vm.flagless - * @requires os.arch != "riscv64" * * @build jdk.test.whitebox.WhiteBox * @run driver jdk.test.lib.helpers.ClassFileInstaller jdk.test.whitebox.WhiteBox diff --git a/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA1IntrinsicsOptionOnSupportedCPU.java b/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA1IntrinsicsOptionOnSupportedCPU.java index 1ce2c4b1f87..71ed3b3cac9 100644 --- a/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA1IntrinsicsOptionOnSupportedCPU.java +++ b/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA1IntrinsicsOptionOnSupportedCPU.java @@ -25,10 +25,8 @@ * @test * @bug 8035968 * @summary Verify UseSHA1Intrinsics option processing on supported CPU. - * ( Disable this test on riscv, because on riscv UseSHA1Intrinsics depends on !AvoidUnalignedAccesses. ) * @library /test/lib / * @requires vm.flagless - * @requires os.arch != "riscv64" * * @build jdk.test.whitebox.WhiteBox * @run driver jdk.test.lib.helpers.ClassFileInstaller jdk.test.whitebox.WhiteBox diff --git a/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java b/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java index d3c0a4a8da7..41a2ec277a2 100644 --- a/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java +++ b/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java @@ -28,10 +28,8 @@ * @summary Verify UseSHA3Intrinsics option processing on supported CPU. * @library /test/lib / * @requires vm.flagless - * @requires os.arch == "aarch64" & os.family == "mac" + * @requires os.arch == "aarch64" * @comment sha3 is only implemented on AArch64 for now. 
- * UseSHA3Intrinsics is only auto-enabled on Apple silicon, because it - * may introduce performance regression on others. See JDK-8297092. * * @build jdk.test.whitebox.WhiteBox * @run driver jdk.test.lib.helpers.ClassFileInstaller jdk.test.whitebox.WhiteBox I checked on an Nvidia Grace machine with the above patch. `TestUseSHA3IntrinsicsOptionOnSupportedCPU.java` passes. If this patch looks fine to you, we had better run the tests on riscv64 as well, to be safe. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28634#issuecomment-3610316715 From fyang at openjdk.org Thu Dec 4 05:52:55 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 4 Dec 2025 05:52:55 GMT Subject: RFR: 8372941: Rework compiler/intrinsics/sha tests to use intrinsic availability In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 05:32:31 GMT, Hao Sun wrote: >> Predicate probes of the following algos are changed to rely on intrinsics availability in the platform as opposed to hardware support availability. >> MD5 >> SHA1 >> SHA256 >> SHA3 >> >> Testing: >> All flag combinations from CI >> hotspot tiers 1 to 5 >> PS: only for tier testings, mac-aarch was skipped due to resource constraints > > Thanks for your work. > > I suppose the **os.arch** requires condition can be removed in the following cases: > > > diff --git a/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseMD5IntrinsicsOptionOnSupportedCPU.java b/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseMD5IntrinsicsOptionOnSupportedCPU.java > index eeff351f737..3561be3b33b 100644 > --- a/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseMD5IntrinsicsOptionOnSupportedCPU.java > +++ b/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseMD5IntrinsicsOptionOnSupportedCPU.java > @@ -25,10 +25,8 @@ > * @test > * @bug 8035968 > * @summary Verify UseMD5Intrinsics option processing on supported CPU. > - * ( Disable this test on riscv, because on riscv UseMD5Intrinsics depends on !AvoidUnalignedAccesses.
) > * @library /test/lib / > * @requires vm.flagless > - * @requires os.arch != "riscv64" > * > * @build jdk.test.whitebox.WhiteBox > * @run driver jdk.test.lib.helpers.ClassFileInstaller jdk.test.whitebox.WhiteBox > diff --git a/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA1IntrinsicsOptionOnSupportedCPU.java b/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA1IntrinsicsOptionOnSupportedCPU.java > index 1ce2c4b1f87..71ed3b3cac9 100644 > --- a/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA1IntrinsicsOptionOnSupportedCPU.java > +++ b/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA1IntrinsicsOptionOnSupportedCPU.java > @@ -25,10 +25,8 @@ > * @test > * @bug 8035968 > * @summary Verify UseSHA1Intrinsics option processing on supported CPU. > - * ( Disable this test on riscv, because on riscv UseSHA1Intrinsics depends on !AvoidUnalignedAccesses. ) > * @library /test/lib / > * @requires vm.flagless > - * @requires os.arch != "riscv64" > * > * @build jdk.test.whitebox.WhiteBox > * @run driver jdk.test.lib.helpers.ClassFileInstaller jdk.test.whitebox.WhiteBox > diff --git a/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java b/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java > index d3c0a4a8da7..41a2ec277a2 100644 > --- a/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java > +++ b/test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java > @@ -28,10 +28,8 @@ > * @summary Verify UseSHA3Intrinsics option processing on supported CPU. > * @library /test/lib / > * @requires vm.flagless > - * @requires os.arch == "aarch64" &... @shqking : I did a quick try on riscv64 and I see your add-on fix works as well. Good cleanup! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/28634#issuecomment-3610395107 From jbhateja at openjdk.org Thu Dec 4 07:18:59 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 4 Dec 2025 07:18:59 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v8] In-Reply-To: References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: On Wed, 3 Dec 2025 06:30:06 GMT, Emanuel Peter wrote: >> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> Review comments resolutions > > Oh, a second review would be required though! @eme64, it seems that due to some hiccups the PR is not marked ready for integration; kindly re-approve. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28533#issuecomment-3610638541 From epeter at openjdk.org Thu Dec 4 08:26:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Dec 2025 08:26:03 GMT Subject: RFR: 8371603: C2: assert(_inputs.at(alias_idx) == nullptr || _inputs.at(alias_idx) == load->in(1)) failed In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 01:41:19 GMT, Xiaohong Gong wrote: > **Problem:** > > This issue occurs on a 256-bit SVE machine, caused by the following problematic pattern in `LoadVectorNode::Ideal()`: > > > Node* LoadVectorNode::Ideal(PhaseGVN* phase, bool can_reshape) { > const TypeVect* vt = vect_type(); > if (Matcher::vector_needs_partial_operations(this, vt)) { > return VectorNode::try_to_gen_masked_vector(phase, this, vt); > } > return LoadNode::Ideal(phase, can_reshape); > } > > > The condition `Matcher::vector_needs_partial_operations(this, vt)` returns true for `LoadVectorNode` with 256-bit vector size even when the vector size equals the maximum vector size on
SVE. In such cases, when `VectorNode::try_to_gen_masked_vector()` returns `nullptr`, the method exits early without calling `LoadNode::Ideal()`. This results in missing crucial optimizations that would normally be applied by the superclass. > > This code was introduced by https://bugs.openjdk.org/browse/JDK-8286941 to generate vector masks for partial vector operations, but it failed to ensure that the superclass `Ideal()` method is always invoked when no transformation is applied. > > **Solution:** > > This patch addresses the issue through two changes: > > 1. Refine `Matcher::vector_needs_partial_operations()` to return true only when the vector node genuinely represents a partial vector operation that requires masking. > 2. Modify `VectorNode::try_to_gen_masked_vector()` to never return `nullptr`, ensuring the superclass `Ideal()` method is always invoked when no transformation is applied. > > **Testing:** > > - Verified on different SVE platforms with different vector sizes (128|256|512 bits). > - Verified on X86 platforms with different avx options (-XX:UseAVX=1|2|3). > - Added two new IR tests to verify 1) previously missing optimizations for `LoadVector/StoreVector` are now applied, and 2) that mask and the correct IR patterns are generated for partial vector operations. @XiaohongGong Thanks for taking this over from me and fixing this so quickly, much appreciated! Thanks for adding my regression tests and for the attribution :) I only have a minor comment below. Otherwise, the code looks good to me. But since I'm not an SVE specialist, it would be good if someone with deeper knowledge would do a deeper review of the specific SVE parts. Once an SVE specialist gives the approval for the PR, I'll run some internal testing and approve from my side :) Ah, and one more thing: you should change the PR title to be more descriptive of the issue. The assert that was hit is only a far removed symptom.
I would suggest: `C2 SVE: missing Ideal optimizations for load and store vectors` src/hotspot/share/opto/vectornode.cpp line 1118: > 1116: if (Matcher::vector_needs_partial_operations(this, vt)) { > 1117: return VectorNode::gen_masked_vector(phase, this, vt); > 1118: } I think it would still be good practice to expect that a `nullptr` could come from `gen_masked_vector`, and then continue with optimizations below, rather than just returning the `nullptr`. Because: who knows what someone in the future might do inside `gen_masked_vector`, maybe they'll find some edge case and just return `nullptr` again, and then we are back to similar issues. ------------- PR Review: https://git.openjdk.org/jdk/pull/28651#pullrequestreview-3538639579 PR Review Comment: https://git.openjdk.org/jdk/pull/28651#discussion_r2587987772 From epeter at openjdk.org Thu Dec 4 08:30:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Dec 2025 08:30:58 GMT Subject: RFR: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 [v9] In-Reply-To: References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: On Wed, 3 Dec 2025 18:30:50 GMT, Jatin Bhateja wrote: >> This bug patch fixes a crash seen while querying the bottom type of MachTempNode corresponding to [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of blend pattern during late scheduling. Here, MaxVectorSize is contrainted to 8 bytes thus during C2 type system initialization, [TypeVect::VECTX ](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719) guarded by target supprted vector size remains uninitialized. >> >> Its better to reject matching of VectorBlend in such a scenario. >> >> All exisitng VectorAPI jtreg tests are passing with -XX:UseAVX=0 and -XX:MaxVectorSize=8 >> >> Kindly review and share your feedback. 
>> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/vectorapi/TestABSMaskedMaxByteVector.java > > Co-authored-by: Daniel Lundén Marked as reviewed by epeter (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/28533#pullrequestreview-3538701546 From roland at openjdk.org Thu Dec 4 08:53:32 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 4 Dec 2025 08:53:32 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v8] In-Reply-To: References: Message-ID: > In test cases, `mh` is initially not constant so the method handle > invoke can't be inlined. It is later found to be constant, so it can > be turned into a direct call by > `Compile::process_late_inline_calls_no_inline()`. In the meantime, the > `CallNode` for the mh invoke is cloned (by loop switching). In the > process, only a shallow copy of the `JVMState` for the call is > made. The initial `CallNode` is the first to be processed by > `Compile::process_late_inline_calls_no_inline()` and that causes that > `CallNode` to become dead. The cloned `CallNode` is then > processed. The `JVMState` for that one references the initial > `CallNode` in its caller's `JVMState`. Because that node is dead, that > causes a crash. The fix I propose is to make a deep copy of the > `JVMState` when a `CallNode` is cloned, if a `CallGenerator` is > assigned to the node. > > The other failure I see with these tests is: > > > # Internal Error (/home/roland/jdk-jdk/src/hotspot/share/opto/compile.hpp:1091), pid=3319164, tid=3319186 > # assert(_number_of_mh_late_inlines > 0) failed: _number_of_mh_late_inlines < 0 ! > > > because even though the `CallNode` is cloned, there's still only one > late inline recorded.
The fix here is to increment > `_number_of_mh_late_inlines` when the node is cloned. > > This was reported by the netty developers. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/compile.hpp Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28088/files - new: https://git.openjdk.org/jdk/pull/28088/files/64b11e6e..124b1f69 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28088&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28088&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/28088.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28088/head:pull/28088 PR: https://git.openjdk.org/jdk/pull/28088 From xgong at openjdk.org Thu Dec 4 10:19:16 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 4 Dec 2025 10:19:16 GMT Subject: RFR: 8371603: C2: assert(_inputs.at(alias_idx) == nullptr || _inputs.at(alias_idx) == load->in(1)) failed In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 08:08:14 GMT, Emanuel Peter wrote: >> **Problem:** >> >> This issue occurs on a 256-bit SVE machine, caused by the following problematic pattern in `LoadVectorNode::Ideal()`: >> >> >> Node* LoadVectorNode::Ideal(PhaseGVN* phase, bool can_reshape) { >> const TypeVect* vt = vect_type(); >> if (Matcher::vector_needs_partial_operations(this, vt)) { >> return VectorNode::try_to_gen_masked_vector(phase, this, vt); >> } >> return LoadNode::Ideal(phase, can_reshape); >> } >> >> >> The condition `Matcher::vector_needs_partial_operations(this, vt)` returns true for `LoadVectorNode` with 256-bit vector size even when the vector size equals the maximum vector size on SVE. In such cases, when `VectorNode::try_to_gen_masked_vector()` returns `nullptr`, the method exits early without calling `LoadNode::Ideal()`. 
This results in missing crucial optimizations that would normally be applied by the superclass. >> >> This code was introduced by https://bugs.openjdk.org/browse/JDK-8286941 to generate vector masks for partial vector operations, but it failed to ensure that the superclass `Ideal()` method is always invoked when no transformation is applied. >> >> **Solution:** >> >> This patch addresses the issue through two changes: >> >> 1. Refine `Matcher::vector_needs_partial_operations()` to return true only when the vector node genuinely represents a partial vector operation that requires masking. >> 2. Modify `VectorNode::try_to_gen_masked_vector()` to never return `nullptr`, ensuring the superclass `Ideal()` method is always invoked when no transformation is applied. >> >> **Testing:** >> >> - Verified on different SVE platforms with different vector sizes (128|256|512 bits). >> - Verified on X86 platforms with different avx options (-XX:UseAVX=1|2|3). >> - Added two new IR tests to verify 1) previously missing optimizations for `LoadVector/StoreVector` are now applied, and 2) that mask and the correct IR patterns are generated for partial vector operations. > > src/hotspot/share/opto/vectornode.cpp line 1118: > >> 1116: if (Matcher::vector_needs_partial_operations(this, vt)) { >> 1117: return VectorNode::gen_masked_vector(phase, this, vt); >> 1118: } > > I think it would still be good practice to expect that a `nullptr` could come from `gen_masked_vector`, and then continue with optimizations below, rather than just returning the `nullptr`. > > Because: who knows what someone in the future might do inside `gen_masked_vector`, maybe they'll find some edge case and just return `nullptr` again, and then we are back to similar issues. I see your concern. Make sense to me. Thanks! I'd like keep current implementation of `Matcher::vector_needs_partial_operations` and `gen_masked_vector` because as we discussed in the previous PR that this sounds more reasonable. 
I will update the caller code here to check `nullptr` in addition although it won't generate a `nullptr` now. Code may look like: if (Matcher::vector_needs_partial_operations(this, vt)) { Node* n = VectorNode::gen_masked_vector(phase, this, vt); if (n != nullptr) { return n; } } return ... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28651#discussion_r2588404849 From jbhateja at openjdk.org Thu Dec 4 10:20:26 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 4 Dec 2025 10:20:26 GMT Subject: Integrated: 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 In-Reply-To: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> References: <8XYX6osvEhiKn4rdAe_lMOKwNLda6y_JGIF-5cwquIc=.d1e0a0c3-7f5c-429d-8e00-c2240f722ad1@github.com> Message-ID: On Thu, 27 Nov 2025 12:56:08 GMT, Jatin Bhateja wrote: > This bug patch fixes a crash seen while querying the bottom type of MachTempNode corresponding to [rxmm0 operand](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L22509) of blend pattern during late scheduling. Here, MaxVectorSize is contrainted to 8 bytes thus during C2 type system initialization, [TypeVect::VECTX ](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/type.cpp#L719) guarded by target supprted vector size remains uninitialized. > > Its better to reject matching of VectorBlend in such a scenario. > > All exisitng VectorAPI jtreg tests are passing with -XX:UseAVX=0 and -XX:MaxVectorSize=8 > > Kindly review and share your feedback. > > Best Regards, > Jatin This pull request has now been integrated. 
Changeset: 91c5bd55 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/91c5bd550a36e10e8b39d1b322fd433ee8df14f5 Stats: 59 lines in 2 files changed: 59 ins; 0 del; 0 mod 8337791: VectorAPI jtreg ABSMaskedByteMaxVectorTests crashes with UseAVX=0 -XX:MaxVectorSize=8 Reviewed-by: epeter, sviswanathan, dlunden ------------- PR: https://git.openjdk.org/jdk/pull/28533 From xgong at openjdk.org Thu Dec 4 10:23:07 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 4 Dec 2025 10:23:07 GMT Subject: RFR: 8371603: C2: assert(_inputs.at(alias_idx) == nullptr || _inputs.at(alias_idx) == load->in(1)) failed In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 08:23:31 GMT, Emanuel Peter wrote: > Ah, and one more thing: you should change the PR title to be more descriptive of the issue. The assert that was hit is only a far removed symptom. I would suggest: > > `C2 SVE: missing Ideal optimizations for load and store vectors` The new title looks good to me. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/28651#issuecomment-3611341132 From xgong at openjdk.org Thu Dec 4 10:28:00 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 4 Dec 2025 10:28:00 GMT Subject: RFR: 8371603: C2: assert(_inputs.at(alias_idx) == nullptr || _inputs.at(alias_idx) == load->in(1)) failed In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 10:16:32 GMT, Xiaohong Gong wrote: >> src/hotspot/share/opto/vectornode.cpp line 1118: >> >>> 1116: if (Matcher::vector_needs_partial_operations(this, vt)) { >>> 1117: return VectorNode::gen_masked_vector(phase, this, vt); >>> 1118: } >> >> I think it would still be good practice to expect that a `nullptr` could come from `gen_masked_vector`, and then continue with optimizations below, rather than just returning the `nullptr`. >> >> Because: who knows what someone in the future might do inside `gen_masked_vector`, maybe they'll find some edge case and just return `nullptr` again, and then we are back to similar issues. 
> > I see your concern. Make sense to me. Thanks! I'd like keep current implementation of `Matcher::vector_needs_partial_operations` and `gen_masked_vector` because as we discussed in the previous PR that this sounds more reasonable. > > I will update the caller code here to check `nullptr` in addition although it won't generate a `nullptr` now. Code may look like: > > if (Matcher::vector_needs_partial_operations(this, vt)) { > Node* n = VectorNode::gen_masked_vector(phase, this, vt); > if (n != nullptr) { > return n; > } > } > return ... Is it better that we add an assertion of non `nullptr` value before returning in `gen_masked_vector` , consider this might make the caller code clean? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28651#discussion_r2588435959 From epeter at openjdk.org Thu Dec 4 12:13:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Dec 2025 12:13:00 GMT Subject: RFR: 8371603: C2: assert(_inputs.at(alias_idx) == nullptr || _inputs.at(alias_idx) == load->in(1)) failed In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 10:25:03 GMT, Xiaohong Gong wrote: >> I see your concern. Make sense to me. Thanks! I'd like keep current implementation of `Matcher::vector_needs_partial_operations` and `gen_masked_vector` because as we discussed in the previous PR that this sounds more reasonable. >> >> I will update the caller code here to check `nullptr` in addition although it won't generate a `nullptr` now. Code may look like: >> >> if (Matcher::vector_needs_partial_operations(this, vt)) { >> Node* n = VectorNode::gen_masked_vector(phase, this, vt); >> if (n != nullptr) { >> return n; >> } >> } >> return ... > > Is it better that we add an assertion of non `nullptr` value before returning in `gen_masked_vector` , consider this might make the caller code clean? If you do the assert inside the method, then later someone may just do `return nullptr` somewhere, and your assert won't catch it, right? 
I would just do this: if (Matcher::vector_needs_partial_operations(this, vt)) { Node* n = VectorNode::gen_masked_vector(phase, this, vt); if (n != nullptr) { return n; } } Or you could even combine the methods `vector_needs_partial_operations` and `gen_masked_vector` into some `Ideal_partial_operations`: Node* progress = VectorNode::Ideal_partial_operations(phase, vt, this); if (progress != nullptr) { return progress; } That would remove the most clutter from the caller method. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28651#discussion_r2588819337 From roland at openjdk.org Thu Dec 4 12:55:06 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 4 Dec 2025 12:55:06 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v6] In-Reply-To: <-kd-AfwkJebk8njImn0KeKvUCQnwoiqLr96cKCovlFc=.30649d16-8dee-4c9d-b1eb-ac9d7e9df86a@github.com> References: <7nY7QRkkFjOtOuBXID1I4GluA0vnFRLy_UnRICfVkR4=.99ec7fe1-af27-4ab7-ac63-27aa12bec4ef@github.com> <-kd-AfwkJebk8njImn0KeKvUCQnwoiqLr96cKCovlFc=.30649d16-8dee-4c9d-b1eb-ac9d7e9df86a@github.com> Message-ID: <8ZViC6KgwXNMreHupSs6CDUMYRhFOm0bZrkSqB4Jj0A=.ad3d56e8-4e0a-4e8d-b40c-2a8bd4627ca7@github.com> On Tue, 25 Nov 2025 22:45:53 GMT, Vladimir Ivanov wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8370939 >> - review >> - Merge branch 'master' into JDK-8370939 >> - review >> - more >> - more >> - more >> - more >> - test >> - fix > > Sure, I'm fine either way. There are known cases when `dec_number_of_mh_late_inlines()` call is missing, so the patch as it is now looks fine as well considering we'll investigate the effects on `inline_string_calls()` call. 
@iwanowww @TobiHartmann thanks for the reviews and testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28088#issuecomment-3612111820 From roland at openjdk.org Thu Dec 4 13:04:59 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 4 Dec 2025 13:04:59 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v7] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 07:21:14 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8370939 >> - Merge branch 'master' into JDK-8370939 >> - review >> - Merge branch 'master' into JDK-8370939 >> - review >> - more >> - more >> - more >> - more >> - test >> - ... and 1 more: https://git.openjdk.org/jdk/compare/854b6c58...64b11e6e > > All testing passed. @TobiHartmann @iwanowww since I included Tobias' suggestion, I need one of you to approve the change again. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28088#issuecomment-3612151058 From dbriemann at openjdk.org Thu Dec 4 13:06:09 2025 From: dbriemann at openjdk.org (David Briemann) Date: Thu, 4 Dec 2025 13:06:09 GMT Subject: RFR: 8372589: VM crashes on init when NonNMethodCodeHeapSize is set too small and UseTransparentHugePages is enabled Message-ID: Aligning upwards instead of downwards not only solves the crash in large huge page scenarios but also ensures that the cache sizes are at least as big as they were set.
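To illustrate the difference between the two rounding directions, here is a minimal Java sketch. The actual change is in HotSpot's C++ code, which is not shown in this message; `AlignSketch`, `alignDown`, and `alignUp` are hypothetical names that only model the arithmetic, and the sketch assumes a power-of-two alignment such as a 2 MB huge page.

```java
// Hypothetical model of power-of-two alignment; not the HotSpot implementation.
final class AlignSketch {
    // Round size down to a multiple of alignment (alignment must be a power of two).
    static long alignDown(long size, long alignment) {
        return size & ~(alignment - 1);
    }

    // Round size up to a multiple of alignment (alignment must be a power of two).
    static long alignUp(long size, long alignment) {
        return (size + alignment - 1) & ~(alignment - 1);
    }
}
```

With a 2 MB alignment and a requested size of 1 MB, `alignDown` yields 0 while `alignUp` yields 2 MB, so rounding up never produces a size smaller than the one requested.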
------------- Commit messages: - 8372589: VM crashes on init when NonNMethodCodeHeapSize is set too small and UseTransparentHugePages is enabled Changes: https://git.openjdk.org/jdk/pull/28658/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28658&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8372589 Stats: 10 lines in 1 file changed: 0 ins; 6 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/28658.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28658/head:pull/28658 PR: https://git.openjdk.org/jdk/pull/28658 From mdoerr at openjdk.org Thu Dec 4 14:57:39 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 4 Dec 2025 14:57:39 GMT Subject: RFR: 8372589: VM crashes on init when NonNMethodCodeHeapSize is set too small and UseTransparentHugePages is enabled In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 12:59:03 GMT, David Briemann wrote: > Aligning upwards instead of downwards not only solves the crash in large huge page scenarios but also ensures that the cache sizes are at least as big as they were set. This makes sense. Please make sure to commit it once JDK 27 has started in head and after a second review. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28658#pullrequestreview-3540432282 From thartmann at openjdk.org Thu Dec 4 15:07:40 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 4 Dec 2025 15:07:40 GMT Subject: RFR: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() [v8] In-Reply-To: References: Message-ID: <5KzZFCSXDZaqFIZEeirQNFrkDUDERHXCein3swQsSqc=.452f96e2-787b-48a2-a87a-4869f26f9075@github.com> On Thu, 4 Dec 2025 08:53:32 GMT, Roland Westrelin wrote: >> In test cases, `mh` is initially not constant so the method handle >> invoke can't be inlined. It is later found to be constant, so it can >> be turned into a direct call by >> `Compile::process_late_inline_calls_no_inline()`.
In the meantime, the >> `CallNode` for the mh invoke is cloned (by loop switching). In the >> process, only a shallow copy of the `JVMState` for the call is >> made. The initial `CallNode` is the first to be processed by >> `Compile::process_late_inline_calls_no_inline()` and that causes that >> `CallNode` to become dead. The cloned `CallNode` is then >> processed. The `JVMState` for that one references the initial >> `CallNode` in its caller's `JVMState`. Because that node is dead, that >> causes a crash. The fix I propose is to make a deep copy of the >> `JVMState` when a `CallNode` is cloned, if a `CallGenerator` is >> assigned to the node. >> >> The other failure I see with these tests is: >> >> >> # Internal Error (/home/roland/jdk-jdk/src/hotspot/share/opto/compile.hpp:1091), pid=3319164, tid=3319186 >> # assert(_number_of_mh_late_inlines > 0) failed: _number_of_mh_late_inlines < 0 ! >> >> >> because even though the `CallNode` is cloned, there's still only one >> late inline recorded. The fix here is to increment >> `_number_of_mh_late_inlines` when the node is cloned. >> >> This was reported by the netty developers. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/compile.hpp > > Co-authored-by: Tobias Hartmann Still good. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/28088#pullrequestreview-3540481333 From roland at openjdk.org Thu Dec 4 15:28:55 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 4 Dec 2025 15:28:55 GMT Subject: Integrated: 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() In-Reply-To: References: Message-ID: On Fri, 31 Oct 2025 16:39:07 GMT, Roland Westrelin wrote: > In test cases, `mh` is initially not constant so the method handle > invoke can't be inlined. 
It is later found to be constant, so it can > be turned into a direct call by > `Compile::process_late_inline_calls_no_inline()`. In the meantime, the > `CallNode` for the mh invoke is cloned (by loop switching). In the > process, only a shallow copy of the `JVMState` for the call is > made. The initial `CallNode` is the first to be processed by > `Compile::process_late_inline_calls_no_inline()` and that causes that > `CallNode` to become dead. The cloned `CallNode` is then > processed. The `JVMState` for that one references the initial > `CallNode` in its caller's `JVMState`. Because that node is dead, that > causes a crash. The fix I propose is to make a deep copy of the > `JVMState` when a `CallNode` is cloned, if a `CallGenerator` is > assigned to the node. > > The other failure I see with these tests is: > > > # Internal Error (/home/roland/jdk-jdk/src/hotspot/share/opto/compile.hpp:1091), pid=3319164, tid=3319186 > # assert(_number_of_mh_late_inlines > 0) failed: _number_of_mh_late_inlines < 0 ! > > > because even though the `CallNode` is cloned, there's still only one > late inline recorded. The fix here is to increment > `_number_of_mh_late_inlines` when the node is cloned. > > This was reported by the netty developers. This pull request has now been integrated. 
Changeset: 27351401 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/2735140147b159d3a3238804f221db4f835ef744 Stats: 125 lines in 6 files changed: 113 ins; 3 del; 9 mod 8370939: C2: SIGSEGV in SafePointNode::verify_input when processing MH call from Compile::process_late_inline_calls_no_inline() Reviewed-by: thartmann, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/28088 From vlivanov at openjdk.org Thu Dec 4 19:17:29 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 4 Dec 2025 19:17:29 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v8] In-Reply-To: References: Message-ID: On Tue, 2 Dec 2025 10:31:22 GMT, Aleksey Shipilev wrote: >> See the bug for discussion what issues current machinery has. >> >> This PR executes the plan outlined in the bug: >> 1. Common the receiver type profiling code in interpreter and C1 >> 2. Rewrite receiver type profiling code to only do atomic receiver slot installations >> 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed >> >> This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 21 commits: > > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - More comments > - Tighten up the comments > - Simplify third case: no need to loop, just restart the search > - Actually have a second "fast" case: receiver is not found in the table, and the table is full > - Pushing/popping for rare CAS path is counter-productive > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - Tighten up some more > - Offset is always rscratch1, no need to save it > - Grossly simplify register shuffling > - ... and 11 more: https://git.openjdk.org/jdk/compare/7278d2e8...3c5019d9 Overall, looks good to me. Nice work, Aleksey! I'm curious how performance-sensitive that part of code is. Does it make sense to try to further optimize it? For example:
- 2 slots is the most common case; any benefits from optimizing specifically for it (e.g., unroll the loops)?
- fast path can be further optimized for no nulls case by offloading more work on found_null slow path [1]

[1]
    // Fastest: receiver is already installed
    int i = 0;
    for (; i < receiver_count(); i++) {
      if (receiver(i) == recv) goto found_recv(i);
      if (receiver(i) == null) goto found_null(i);
    }
    goto polymorphic

    // Slow: try to install receiver
    found_null(i):
    // Finish the search
    for (int j = i ; j < receiver_count(); j++) {
      if (receiver(j) == recv) goto found_recv(j);
    }
    CAS(&receiver(i), null, recv);
    goto restart
    ...
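The search-then-install scheme in the pseudo-code above can be modelled in plain Java. This is only a sketch under stated assumptions: the real code is assembly emitted for the interpreter and C1, and `ReceiverTableSketch`, `profile`, and the use of `AtomicReferenceArray` are hypothetical stand-ins for the profile rows and the CAS instruction.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical Java model of the receiver-slot search; not the generated stub code.
final class ReceiverTableSketch {
    private final AtomicReferenceArray<Class<?>> receivers;

    ReceiverTableSketch(int rowLimit) {
        receivers = new AtomicReferenceArray<>(rowLimit);
    }

    // Returns the slot index that holds recv, or -1 for the polymorphic case
    // (recv is not in the table and no empty slot is left).
    int profile(Class<?> recv) {
        while (true) { // each iteration corresponds to "restart"
            int n = receivers.length();
            // Fast path: stop at the first match or at the first empty slot.
            int i = 0;
            for (; i < n; i++) {
                Class<?> r = receivers.get(i);
                if (r == recv) return i;  // found_recv(i)
                if (r == null) break;     // found_null(i)
            }
            if (i == n) return -1;        // polymorphic: table is full
            // Slow path: finish the search before claiming the empty slot.
            for (int j = i; j < n; j++) {
                if (receivers.get(j) == recv) return j; // found_recv(j)
            }
            // Install atomically. Whether the CAS wins or loses, restart the
            // search so that the slot is re-read rather than assumed.
            receivers.compareAndSet(i, null, recv);
        }
    }
}
```

Restarting after the CAS keeps the install path simple: if another thread claimed slot `i` first, the next pass either finds `recv` elsewhere, finds another empty slot, or falls through to the polymorphic case.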
------------- PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3613949570 From kvn at openjdk.org Thu Dec 4 21:49:06 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 4 Dec 2025 21:49:06 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v8] In-Reply-To: References: Message-ID: <463CgW4WDJnmLWha1DJLYIWw_UEh4ba9vdeQq80QfJM=.08d3125b-f5cd-4d29-8e0d-921448c792f2@github.com> On Thu, 4 Dec 2025 19:14:43 GMT, Vladimir Ivanov wrote: > 2 slots is the most common case; any benefits from optimizing specifically for it (e.g., unroll the loops)? Yes, since `row_limit()` is statically known and does not change, we can have two versions of the code based on its value: - `<= 2` slots: fully unrolled (far fewer instructions) - `> 2` slots: the currently proposed code ------------- PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3614447709 From kvn at openjdk.org Thu Dec 4 21:56:46 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 4 Dec 2025 21:56:46 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v8] In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 19:14:43 GMT, Vladimir Ivanov wrote: > fast path can be further optimized for no nulls case by offloading more work on found_null slow path [1] I don't think we need to optimize the `> 2` slots case. Such a setting is not the current default. Also, based on @shipilev's comments, 2 separate loops are more or less optimal. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3614475586 From duke at openjdk.org Fri Dec 5 02:42:25 2025 From: duke at openjdk.org (duke) Date: Fri, 5 Dec 2025 02:42:25 GMT Subject: RFR: 8365732: RISC-V: implement AES CTR intrinsics [v30] In-Reply-To: References: Message-ID: On Wed, 3 Dec 2025 03:44:27 GMT, Anjian Wen wrote: >> Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned.
On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. > > Anjian Wen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 31 commits: > > - Merge branch 'openjdk:master' into aes_ctr > - modify label L_EXIT to L_exit > - add more comments for key value 52 > - update some comments, names and Pseudocode > - modify stub_id name > - Merge branch 'openjdk:master' into aes_ctr > - modify format > - add more comments > - modify parm to unsigned as aarch64 and x86 > - clean comments and format > - ... and 21 more: https://git.openjdk.org/jdk/compare/530493fe...98d802d5 @Anjian-Wen Your change (at version 98d802d5da10c1fd9397bb539d9bf80a9fabd8f9) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25281#issuecomment-3615078771 From wenanjian at openjdk.org Fri Dec 5 02:54:10 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Fri, 5 Dec 2025 02:54:10 GMT Subject: Integrated: 8365732: RISC-V: implement AES CTR intrinsics In-Reply-To: References: Message-ID: On Sat, 17 May 2025 03:13:46 GMT, Anjian Wen wrote: > Hi everyone, please help review this patch which Implement the _counterMode_AESCrypt with Zvkned. On my QEMU, with Zvkned extension enabled, the tests in test/hotspot/jtreg/compiler/codegen/aes/ Passed. This pull request has now been integrated. 
Changeset: 7e91d34f Author: Anjian Wen Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/7e91d34f3e83b4c39d6ce5de34373d7d74d54512 Stats: 239 lines in 2 files changed: 230 ins; 1 del; 8 mod 8365732: RISC-V: implement AES CTR intrinsics Reviewed-by: fyang, mli ------------- PR: https://git.openjdk.org/jdk/pull/25281 From wenanjian at openjdk.org Fri Dec 5 03:24:31 2025 From: wenanjian at openjdk.org (Anjian Wen) Date: Fri, 5 Dec 2025 03:24:31 GMT Subject: RFR: 8371968: RISC-V: implement AES CBC intrinsics [v3] In-Reply-To: References: Message-ID: > Support AES CBC intrinsic on RISCV, Already passed the tests in > test/hotspot/jtreg/compiler/codegen/aes/ > test/jdk/com/sun/crypto Anjian Wen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: - Modify some assert - RISC-V: implement AES CBC intrinsics ------------- Changes: https://git.openjdk.org/jdk/pull/28320/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28320&range=02 Stats: 228 lines in 1 file changed: 227 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28320.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28320/head:pull/28320 PR: https://git.openjdk.org/jdk/pull/28320 From jkarthikeyan at openjdk.org Fri Dec 5 06:04:03 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 5 Dec 2025 06:04:03 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v3] In-Reply-To: References: Message-ID: > Hi all, > This is a quick patch for the assert failure in superword truncation with CastII. I've added a check for all constraint cast nodes, and attached a reduced version of the fuzzer test. Thanks! Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - Remove CompLevel.C2 from test - Merge branch 'master' into jdk-8365570 - Update comment for constraint casts - Fix truncation assert for constraint casts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/26827/files - new: https://git.openjdk.org/jdk/pull/26827/files/d6c81a9d..f433930e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=26827&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26827&range=01-02 Stats: 600645 lines in 6681 files changed: 411649 ins; 119944 del; 69052 mod Patch: https://git.openjdk.org/jdk/pull/26827.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/26827/head:pull/26827 PR: https://git.openjdk.org/jdk/pull/26827 From jkarthikeyan at openjdk.org Fri Dec 5 06:04:04 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 5 Dec 2025 06:04:04 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v2] In-Reply-To: <1z430wmE_HRTJqmLIC15VMUktLyUEE7qjkppr1GniAI=.e560a4e9-59f0-4013-ad65-5d7261cdbf0e@github.com> References: <1z430wmE_HRTJqmLIC15VMUktLyUEE7qjkppr1GniAI=.e560a4e9-59f0-4013-ad65-5d7261cdbf0e@github.com> Message-ID: On Mon, 22 Sep 2025 06:57:24 GMT, Christian Hagedorn wrote: >> Thanks for the comment! I used `CompLevel.C2` here to simulate an -Xcomp environment, since unfortunately I couldn't replicate the crash without it with the IR framework. I'll do some investigation to find a way to ensure that it won't fail without C2. > > When you specify `@Warmup(0)`, the IR framework should directly compile it at the highest level which should be C2 if you are not running with a client build. So, I would have expected that it makes no difference. Can you double-check if you can reproduce it with `CompLevel.C2` but not without? After taking a closer look, I think you're correct- I can reproduce the crash using just `@Warmup(0)` and `@Test`. 
I think I used both while debugging and didn't test whether it worked without `CompLevel.C2`. I've removed it in the latest commit. However, I noticed that after I merged from master, neither the test nor the reproducer fails compilation any more, even before the fix is applied. I think another commit must have changed the generated graph so that it no longer tries to vectorize the `CastII`, so the crash is no longer triggered. I looked at the JBS entry and saw that there wasn't another reproducer for this, so I'm a bit unsure what to do. Should this patch be merged with the current test? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26827#discussion_r2591510915 From kvn at openjdk.org Fri Dec 5 06:12:11 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 5 Dec 2025 06:12:11 GMT Subject: RFR: 8357258: x86: Improve receiver type profiling reliability [v8] In-Reply-To: References: Message-ID: <1oFMd7KL-qhMxhHPi0mTspnW8oNMF8ZVGucT6IJXwv4=.d81f2eca-9ddc-4d63-8a20-40c2192e1004@github.com> On Tue, 2 Dec 2025 10:31:22 GMT, Aleksey Shipilev wrote: >> See the bug for discussion what issues current machinery has. >> >> This PR executes the plan outlined in the bug: >> 1. Common the receiver type profiling code in interpreter and C1 >> 2. Rewrite receiver type profiling code to only do atomic receiver slot installations >> 3. Trim `C1OptimizeVirtualCallProfiling` to only claim slots when receiver is installed >> >> This PR does _not_ do atomic counter updates themselves, as it may have much wider performance implications, including regressions. This PR should be at least performance neutral. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 21 commits: > > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - More comments > - Tighten up the comments > - Simplify third case: no need to loop, just restart the search > - Actually have a second "fast" case: receiver is not found in the table, and the table is full > - Pushing/popping for rare CAS path is counter-productive > - Merge branch 'master' into JDK-8357258-x86-c1-optimize-virt-calls > - Tighten up some more > - Offset is always rscratch1, no need to save it > - Grossly simplify register shuffling > - ... and 11 more: https://git.openjdk.org/jdk/compare/7278d2e8...3c5019d9 My testing of version 07 passed clean ------------- PR Comment: https://git.openjdk.org/jdk/pull/25305#issuecomment-3615451277 From duke at openjdk.org Fri Dec 5 07:25:06 2025 From: duke at openjdk.org (duke) Date: Fri, 5 Dec 2025 07:25:06 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v17] In-Reply-To: References: Message-ID: On Wed, 29 Oct 2025 09:45:04 GMT, Qizheng Xing wrote: >> The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. >> >> This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. 
For example, the following implementation runs ~115% faster on x86-64 with this patch: >> >> >> public static int numberOfNibbles(int i) { >> int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); >> return Math.max((mag + 3) / 4, 1); >> } >> >> >> Testing: tier1, IR test > > Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: > > Make code more compact @MaxXSoft Your change (at version 092d968d2fb54aaa59f9a28b907be5e0ddf3606c) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25928#issuecomment-3615611622 From xgong at openjdk.org Fri Dec 5 07:50:07 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 5 Dec 2025 07:50:07 GMT Subject: RFR: 8371603: C2: Missing Ideal optimizations for load and store vectors on SVE In-Reply-To: References: Message-ID: On Thu, 4 Dec 2025 12:09:51 GMT, Emanuel Peter wrote: >> Is it better that we add an assertion of non `nullptr` value before returning in `gen_masked_vector` , consider this might make the caller code clean? > > If you do the assert inside the method, then later someone may just do `return nullptr` somewhere, and your assert won't catch it, right? > > I would just do this: > > if (Matcher::vector_needs_partial_operations(this, vt)) { > Node* n = VectorNode::gen_masked_vector(phase, this, vt); > if (n != nullptr) { return n; } > } > > Or you could even combine the methods `vector_needs_partial_operations` and `gen_masked_vector` into some `Ideal_partial_operations`: > > Node* progress = VectorNode::Ideal_partial_operations(phase, vt, this); > if (progress != nullptr) { return progress; } > > That would remove the most clutter from the caller method. Sounds good to me. I will change the code by combining the methods into a function. Thanks! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28651#discussion_r2591722969 From erfang at openjdk.org Fri Dec 5 08:13:19 2025 From: erfang at openjdk.org (Eric Fang) Date: Fri, 5 Dec 2025 08:13:19 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v3] In-Reply-To: References: Message-ID: > `VectorMaskCastNode` is used to cast a vector mask from one type to another type. The cast may be generated by calling the vector API `cast` or generated by the compiler. For example, some vector mask operations like `trueCount` require the input mask to be integer types, so for floating point type masks, the compiler will cast the mask to the corresponding integer type mask automatically before doing the mask operation. This kind of cast is very common. > > If the vector element size is not changed, the `VectorMaskCastNode` doesn't generate code; otherwise code will be generated to extend or narrow the mask. This IR node is not free whether or not it generates code, because it may block some optimizations. For example: > 1. `(VectorStoreMask (VectorMaskCast (VectorLoadMask x)))` The middle `VectorMaskCast` prevents the following optimization: `(VectorStoreMask (VectorLoadMask x)) => (x)` > 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`. > > In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. But if the input value is changed, we can't eliminate the cast. > > The general idea of this PR is to introduce an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`, like the existing `Node::uncast(bool)` function. The function returns the first node that is not a `VectorMaskCastNode`.
> > The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode` and this does not affect the correctness of the optimization. Then this function can be called to eliminate the `VectorMaskCastNode` chain. > > Current optimizations related to `VectorMaskCastNode` include: > 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760. > 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242. > > This PR does the following optimizations: > 1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` as `(VectorMaskCast (VectorMaskCast? ... (VectorMaskCast x))) => (x)`. Because as long as types of the head and tail `VectorMaskCastNode` are consistent, the optimization is correct. > 2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`. Since the value before and after the pattern is a boolean vector, it remains unchanged as long as th... Eric Fang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Refine the test code and comments - Merge branch 'master' into JDK-8370863-mask-cast-opt - Don't read and write the same memory in the JMH benchmarks - Merge branch 'master' into JDK-8370863-mask-cast-opt - 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns `VectorMaskCastNode` is used to cast a vector mask from one type to another type. The cast may be generated by calling the vector API `cast` or generated by the compiler. 
For example, some vector mask operations like `trueCount` require the input mask to be integer types, so for floating point type masks, the compiler will cast the mask to the corresponding integer type mask automatically before doing the mask operation. This kind of cast is very common. If the vector element size is not changed, the `VectorMaskCastNode` doesn't generate code; otherwise code will be generated to extend or narrow the mask. This IR node is not free whether or not it generates code, because it may block some optimizations. For example: 1. `(VectorStoreMask (VectorMaskCast (VectorLoadMask x)))` The middle `VectorMaskCast` prevents the following optimization: `(VectorStoreMask (VectorLoadMask x)) => (x)` 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`. In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. But if the input value is changed, we can't eliminate the cast. The general idea of this PR is to introduce an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`, like the existing `Node::uncast(bool)` function. The function returns the first node that is not a `VectorMaskCastNode`. The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode` and this does not affect the correctness of the optimization. Then this function can be called to eliminate the `VectorMaskCastNode` chain. Current optimizations related to `VectorMaskCastNode` include: 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760. 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242. This PR does the following optimizations: 1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` as `(VectorMaskCast (VectorMaskCast? ...
(VectorMaskCast x))) => (x)`. Because as long as types of the head and tail `VectorMaskCastNode` are consistent, the optimization is correct. 2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`. Since the value before and after the pattern is a boolean vector, it remains unchanged as long as the vector length remains the same, and this is guaranteed at the API level.

I conducted some simple research on different mask generation methods and mask operations, and obtained the following table, which includes some potential optimization opportunities that may use this `uncast_mask` function.
```
mask_gen\op  toLong  anyTrue  allTrue  trueCount  firstTrue  lastTrue
compare      N/A     N/A      N/A      N/A        N/A        N/A
maskAll      TBI     TBI      TBI      TBI        TBI        TBI
fromLong     TBI     TBI      N/A      TBI        TBI        TBI

mask_gen\op  and     or       xor      andNot     not        laneIsSet
compare      N/A     N/A      N/A      N/A        TBI        N/A
maskAll      TBI     TBI      TBI      TBI        TBI        TBI
fromLong     N/A     N/A      N/A      N/A        TBI        TBI
```
`TBI` indicates that there may be potential optimizations here that require further investigation.
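As a rough illustration of how a helper like `uncast_mask` lets a pattern match see through a cast chain, here is a small Java model. It is only a sketch: the real helper is C++ code operating on C2 IR nodes, while `MaskIrSketch`, `Node`, `uncastMask`, and `idealStoreMask` are hypothetical names, nodes are reduced to an opcode string with a single input, and the type and vector-length checks that the real optimization depends on are omitted.

```java
// Hypothetical, heavily simplified model of C2 IR nodes; not HotSpot code.
final class MaskIrSketch {
    static final class Node {
        final String op; // e.g. "VectorLoadMask", "VectorMaskCast", "VectorStoreMask"
        final Node in;   // single input, or null for a leaf
        Node(String op, Node in) {
            this.op = op;
            this.in = in;
        }
    }

    // Skip any chain of VectorMaskCast nodes and return the first non-cast node.
    static Node uncastMask(Node n) {
        while (n != null && n.op.equals("VectorMaskCast")) {
            n = n.in;
        }
        return n;
    }

    // Models (VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x).
    // Returns the replacement node, or null when the pattern does not match.
    static Node idealStoreMask(Node store) {
        if (!store.op.equals("VectorStoreMask")) {
            return null;
        }
        Node src = uncastMask(store.in);
        if (src != null && src.op.equals("VectorLoadMask")) {
            return src.in;
        }
        return null;
    }
}
```

The same helper could be reused by any rule whose correctness does not depend on the intermediate casts, which is the use case described above.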
Benchmarks:

On a Nvidia Grace machine with 128-bit SVE2:
```
Benchmark                        Unit    Before  Error  After   Error  Uplift
microMaskLoadCastStoreByte64     ops/us  59.23   0.21   148.12  0.07   2.50
microMaskLoadCastStoreDouble128  ops/us  2.43    0.00   38.31   0.01   15.73
microMaskLoadCastStoreFloat128   ops/us  6.19    0.00   75.67   0.11   12.22
microMaskLoadCastStoreInt128     ops/us  6.19    0.00   75.67   0.03   12.22
microMaskLoadCastStoreLong128    ops/us  2.43    0.00   38.32   0.01   15.74
microMaskLoadCastStoreShort64    ops/us  28.89   0.02   75.60   0.09   2.62
```
On a Nvidia Grace machine with 128-bit NEON:
```
Benchmark                        Unit    Before  Error  After   Error  Uplift
microMaskLoadCastStoreByte64     ops/us  75.75   0.19   149.74  0.08   1.98
microMaskLoadCastStoreDouble128  ops/us  8.71    0.03   38.71   0.05   4.44
microMaskLoadCastStoreFloat128   ops/us  24.05   0.03   76.49   0.05   3.18
microMaskLoadCastStoreInt128     ops/us  24.06   0.02   76.51   0.05   3.18
microMaskLoadCastStoreLong128    ops/us  8.72    0.01   38.71   0.02   4.44
microMaskLoadCastStoreShort64    ops/us  24.64   0.01   76.43   0.06   3.10
```
On an AMD EPYC 9124 16-Core Processor with AVX3:
```
Benchmark                        Unit    Before  Error  After   Error  Uplift
microMaskLoadCastStoreByte64     ops/us  82.13   0.31   115.14  0.08   1.40
microMaskLoadCastStoreDouble128  ops/us  0.32    0.00   0.32    0.00   1.01
microMaskLoadCastStoreFloat128   ops/us  42.18   0.05   57.56   0.07   1.36
microMaskLoadCastStoreInt128     ops/us  42.19   0.01   57.53   0.08   1.36
microMaskLoadCastStoreLong128    ops/us  0.30    0.01   0.32    0.00   1.05
microMaskLoadCastStoreShort64    ops/us  42.18   0.05   57.59   0.01   1.37
```
On an AMD EPYC 9124 16-Core Processor with AVX2:
```
Benchmark                        Unit    Before  Error  After   Error  Uplift
microMaskLoadCastStoreByte64     ops/us  73.53   0.20   114.98  0.03   1.56
microMaskLoadCastStoreDouble128  ops/us  0.29    0.01   0.30    0.00   1.00
microMaskLoadCastStoreFloat128   ops/us  30.78   0.14   57.50   0.01   1.87
microMaskLoadCastStoreInt128     ops/us  30.65   0.26   57.50   0.01   1.88
microMaskLoadCastStoreLong128    ops/us  0.30    0.00   0.30    0.00   0.99
microMaskLoadCastStoreShort64    ops/us  24.92   0.00   57.49   0.01   2.31
```
On an AMD EPYC 9124 16-Core Processor with
AVX1:
```
Benchmark                        Unit    Before  Error  After   Error  Uplift
microMaskLoadCastStoreByte64     ops/us  79.68   0.01   248.49  0.91   3.12
microMaskLoadCastStoreDouble128  ops/us  0.28    0.00   0.28    0.00   1.00
microMaskLoadCastStoreFloat128   ops/us  31.11   0.04   95.48   2.27   3.07
microMaskLoadCastStoreInt128     ops/us  31.10   0.03   99.94   1.87   3.21
microMaskLoadCastStoreLong128    ops/us  0.28    0.00   0.28    0.00   0.99
microMaskLoadCastStoreShort64    ops/us  31.11   0.02   94.97   2.30   3.05
```
This PR was tested on 128-bit, 256-bit, and 512-bit (QEMU) aarch64 environments, and two 512-bit x64 machines with various configurations, including sve2, sve1, neon, avx3, avx2, avx1, sse4 and sse3, all tests passed. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28313/files - new: https://git.openjdk.org/jdk/pull/28313/files/3b0ff7d6..c04039ce Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28313&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28313&range=01-02 Stats: 64625 lines in 1066 files changed: 42561 ins; 15516 del; 6548 mod Patch: https://git.openjdk.org/jdk/pull/28313.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28313/head:pull/28313 PR: https://git.openjdk.org/jdk/pull/28313 From erfang at openjdk.org Fri Dec 5 08:13:20 2025 From: erfang at openjdk.org (Eric Fang) Date: Fri, 5 Dec 2025 08:13:20 GMT Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v3] In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 08:10:32 GMT, Eric Fang wrote: >> `VectorMaskCastNode` is used to cast a vector mask from one type to another type. The cast may be generated by calling the vector API `cast` or generated by the compiler. For example, some vector mask operations like `trueCount` require the input mask to be integer types, so for floating point type masks, the compiler will cast the mask to the corresponding integer type mask automatically before doing the mask operation. This kind of cast is very common.
>>
>> If the vector element size is not changed, the `VectorMaskCastNode` doesn't generate code; otherwise code will be generated to extend or narrow the mask. This IR node is not free whether or not it generates code, because it may block some optimizations. For example:
>> 1. `(VectorStoreMask (VectorMaskCast (VectorLoadMask x)))`: the middle `VectorMaskCast` prevents the optimization `(VectorStoreMask (VectorLoadMask x)) => (x)`.
>> 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`.
>>
>> In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. But if the input value is changed, we can't eliminate the cast.
>>
>> The general idea of this PR is to introduce an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`, like the existing `Node::uncast(bool)` function. The function returns the first non-`VectorMaskCastNode` node.
>>
>> The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode` and this does not affect the correctness of the optimization. Then this function can be called to eliminate the `VectorMaskCastNode` chain.
>>
>> Current optimizations related to `VectorMaskCastNode` include:
>> 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760.
>> 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242.
>>
>> This PR does the following optimizations:
>> 1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` as `(VectorMaskCast (VectorMaskCast? ... (VectorMaskCast x))) => (x)`. This is correct as long as the types of the head and tail `VectorMaskCastNode` are consistent.
>> 2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`.
Since the value before and after the pattern is a boolean vect...
>
> Eric Fang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
>
> - Refine the test code and comments
> - Merge branch 'master' into JDK-8370863-mask-cast-opt
> - Don't read and write the same memory in the JMH benchmarks
> - Merge branch 'master' into JDK-8370863-mask-cast-opt
> - 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns
>
> `VectorMaskCastNode` is used to cast a vector mask from one type to
> another type. The cast may be generated by calling the vector API `cast`
> or generated by the compiler. For example, some vector mask operations
> like `trueCount` require the input mask to be integer types, so for
> floating point type masks, the compiler will cast the mask to the
> corresponding integer type mask automatically before doing the mask
> operation. This kind of cast is very common.
>
> If the vector element size is not changed, the `VectorMaskCastNode`
> doesn't generate code; otherwise code will be generated to extend or narrow
> the mask. This IR node is not free whether or not it generates code,
> because it may block some optimizations. For example:
> 1. `(VectorStoreMask (VectorMaskCast (VectorLoadMask x)))`
> The middle `VectorMaskCast` prevents the following optimization:
> `(VectorStoreMask (VectorLoadMask x)) => (x)`
> 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which
> blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`.
>
> In these IR patterns, the value of the input `x` is not changed, so we
> can safely do the optimization. But if the input value is changed, we
> can't eliminate the cast.
>
> The general idea of this PR is to introduce an `uncast_mask` helper
> function, which can be used to uncast a chain of `VectorMaskCastNode`,
> like the existing `Node::uncast(bool)` function. The function returns
> the first non-`VectorMaskCastNode` node.
>
> The intended use case is when the IR pattern to be optimized may
> contain one or more consecutive `VectorMaskCastNode` and this does not
> affect the correctness of the optimization. Then this function can be
> called to eliminate the `VectorMaskCastNode` chain.
>
> Current optimizations related to `VectorMaskCastNode` include:
> 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760.
> 2. `(XorV...

Thanks for your review! @galderz

-------------

PR Review: https://git.openjdk.org/jdk/pull/28313#pullrequestreview-3537647873

From erfang at openjdk.org Fri Dec 5 08:13:24 2025
From: erfang at openjdk.org (Eric Fang)
Date: Fri, 5 Dec 2025 08:13:24 GMT
Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v2]
In-Reply-To:
References: <4vSKAtr0tUG0V193gIvnEFdHm18ZhqflVAwk-09IVQ0=.081806f5-6303-4b4f-975d-7c85427ccae5@github.com>
Message-ID:

On Fri, 28 Nov 2025 09:09:28 GMT, Galder Zamarreño wrote:

>> Eric Fang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>>
>> - Don't read and write the same memory in the JMH benchmarks
>> - Merge branch 'master' into JDK-8370863-mask-cast-opt
>> - 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns
>>
>> `VectorMaskCastNode` is used to cast a vector mask from one type to
>> another type. The cast may be generated by calling the vector API `cast`
>> or generated by the compiler. For example, some vector mask operations
>> like `trueCount` require the input mask to be integer types, so for
>> floating point type masks, the compiler will cast the mask to the
>> corresponding integer type mask automatically before doing the mask
>> operation. This kind of cast is very common.
>>
>> If the vector element size is not changed, the `VectorMaskCastNode`
>> doesn't generate code; otherwise code will be generated to extend or narrow
>> the mask. This IR node is not free whether or not it generates code,
>> because it may block some optimizations. For example:
>> 1. `(VectorStoreMask (VectorMaskCast (VectorLoadMask x)))`
>> The middle `VectorMaskCast` prevents the following optimization:
>> `(VectorStoreMask (VectorLoadMask x)) => (x)`
>> 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which
>> blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`.
>>
>> In these IR patterns, the value of the input `x` is not changed, so we
>> can safely do the optimization. But if the input value is changed, we
>> can't eliminate the cast.
>>
>> The general idea of this PR is to introduce an `uncast_mask` helper
>> function, which can be used to uncast a chain of `VectorMaskCastNode`,
>> like the existing `Node::uncast(bool)` function. The function returns
>> the first non-`VectorMaskCastNode` node.
>>
>> The intended use case is when the IR pattern to be optimized may
>> contain one or more consecutive `VectorMaskCastNode` and this does not
>> affect the correctness of the optimization. Then this function can be
>> called to eliminate the `VectorMaskCastNode` chain.
>>
>> Current optimizations related to `VectorMaskCastNode` include:
>> 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760.
>> 2. `(XorV (VectorMa...
>
> src/hotspot/share/opto/vectornode.cpp line 1056:
>
>> 1054: // x remains to be a bool vector with no changes.
>> 1055: // This function can be used to eliminate the VectorMaskCast in such patterns.
>> 1056: Node* VectorNode::uncast_mask(Node* n) {
>
> Could this be a static method instead?

Yeah, it's already a static method. See https://github.com/openjdk/jdk/pull/28313/files#diff-ba9e2d10a50a01316946660ec9f68321eb864fd9c815616c10abbec39360efe5R141

Or do you mean a static method limited to this file? If so, I'd prefer not to, since it may be used in other places. Thanks~

> test/hotspot/jtreg/compiler/vectorapi/VectorMaskCastIdentityTest.java line 57:
>
>> 55: applyIfCPUFeatureAnd = {"asimd", "true", "sve", "false"})
>> 56: public static int testTwoCastToDifferentType() {
>> 57: // The types before and after the two casts are not the same, so the cast cannot be eliminated.
>
> Outdated comment. Also please expand assertion comments

Done, thanks!

> test/hotspot/jtreg/compiler/vectorapi/VectorMaskCastIdentityTest.java line 79:
>
>> 77: applyIfCPUFeatureAnd = {"avx2", "true", "avx512", "false"})
>> 78: public static int testTwoCastToDifferentType2() {
>> 79: // The types before and after the two casts are not the same, so the cast cannot be eliminated.
>
> Could you expand the documentation on the IR assertions? It's not immediately clear why with AVX-512 the cast remains but with AVX-2 it's removed. Also, this comment is outdated.

This is because the following optimization on AVX2 affects this optimization:

`(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => x`

On AVX2, `trueCount()` requires converting the mask to a **boolean vector** first via `VectorStoreMask`. So `VectorStoreMask` can apply the above optimization, which eliminates all `VectorMaskCast` nodes as a side effect. On AVX-512, masks use dedicated mask registers (k registers) and `VectorStoreMask` is not generated for `trueCount()`, so the `VectorMaskCast` nodes remain.
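
The chain-skipping idea described above can be modelled outside HotSpot. The following is a minimal Java sketch, not the C2 implementation: `MaskCastChainSketch`, `Kind`, `Node`, `uncastMask` and `simplifyStoreMask` are hypothetical names for a toy single-input IR, and any extra conditions the real `VectorNode::uncast_mask` users check (the PR description is truncated at that point) are not modelled.

```java
import java.util.Objects;

final class MaskCastChainSketch {
    // Toy node kinds, loosely named after the C2 nodes mentioned in the thread.
    enum Kind { INPUT, VECTOR_LOAD_MASK, VECTOR_MASK_CAST, VECTOR_STORE_MASK }

    // Toy IR node with at most one input (hypothetical, not C2's Node class).
    record Node(Kind kind, Node in) {
        static Node input() {
            return new Node(Kind.INPUT, null);
        }

        static Node of(Kind kind, Node in) {
            return new Node(kind, Objects.requireNonNull(in));
        }
    }

    // Skip any run of consecutive mask casts and return the first non-cast node.
    static Node uncastMask(Node n) {
        while (n.kind() == Kind.VECTOR_MASK_CAST) {
            n = n.in();
        }
        return n;
    }

    // Models (VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => x.
    // Returns the node unchanged when the pattern does not match.
    static Node simplifyStoreMask(Node n) {
        if (n.kind() != Kind.VECTOR_STORE_MASK) {
            return n;
        }
        Node inner = uncastMask(n.in());
        return inner.kind() == Kind.VECTOR_LOAD_MASK ? inner.in() : n;
    }
}
```

In this model, a `VECTOR_STORE_MASK` over two casts over a `VECTOR_LOAD_MASK` of `x` simplifies to `x` itself, and the casts disappear as a side effect, which matches the AVX2 behaviour described above; without the `VECTOR_STORE_MASK` on top, nothing triggers the rewrite and the casts stay.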
I reorganised this file, please take another look, thanks~

> test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java line 240:
>
>> 238:
>> 239: @Test
>> 240: @IR(counts = { IRNode.VECTOR_LONG_TO_MASK, "= 0",
>
> Could you add some assertion comments here as well to understand what causes the differences with different architectures?

Done

> test/hotspot/jtreg/compiler/vectorapi/VectorMaskToLongTest.java line 260:
>
>> 258:
>> 259: @Test
>> 260: @IR(counts = { IRNode.VECTOR_LONG_TO_MASK, "= 0",
>
> Same here

Done

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2587209533
PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2587250313
PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2587250610
PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2587250972
PR Review Comment: https://git.openjdk.org/jdk/pull/28313#discussion_r2587251084

From epeter at openjdk.org Fri Dec 5 08:16:05 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Dec 2025 08:16:05 GMT
Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v9]
In-Reply-To:
References: <9xCpJGY6CFKPAt4VtDY23_Tr3SE9tUebdMF3pAYWhFA=.281e0b84-bfad-466b-b290-918cf1fa83d1@github.com>
Message-ID:

On Wed, 29 Oct 2025 09:41:18 GMT, Qizheng Xing wrote:

>> @MaxXSoft Feel free to just ping me again when you want another review :)
>> FYI: I'll be on a longer vacation starting in about a week, so don't expect me to respond then.
>
> @eme64 Thank you for the review!
>
> @merykitty @jatin-bhateja Do you have any other suggestions regarding the latest changes in this patch?

@MaxXSoft I think we should first merge and test this PR again. It is 4 weeks old now, so there is a risk that something would break if we integrated now.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25928#issuecomment-3615747403

From epeter at openjdk.org Fri Dec 5 08:25:57 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Dec 2025 08:25:57 GMT
Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v2]
In-Reply-To:
References: <1z430wmE_HRTJqmLIC15VMUktLyUEE7qjkppr1GniAI=.e560a4e9-59f0-4013-ad65-5d7261cdbf0e@github.com>
Message-ID: <8jnY6pqofieRIfV5fCqFxvHZMF3nAZbh7yAD7C_G5FU=.a12c98f6-c715-43f5-9528-62fcfdfc6e59@github.com>

On Fri, 5 Dec 2025 06:00:22 GMT, Jasmine Karthikeyan wrote:

>> When you specify `@Warmup(0)`, the IR framework should directly compile it at the highest level which should be C2 if you are not running with a client build. So, I would have expected that it makes no difference. Can you double-check if you can reproduce it with `CompLevel.C2` but not without?
>
> After taking a closer look, I think you're correct: I can reproduce the crash using just `@Warmup(0)` and `@Test`. I think I used both while debugging and didn't test whether it worked without `CompLevel.C2`. I've removed it in the latest commit.
> However, I noticed that after I merged from master, neither the test nor the reproducer failed compilation even without the fix. I think another commit must have changed the generated graph so that it no longer tries to vectorize the `CastII`, leading to the crash not being triggered. I looked at the JBS entry and saw that there wasn't another reproducer for this, so I was a bit unsure what to do. Should this patch be merged with the current test?

@jaskarth Thanks for looking into it! I would still add the fix, just in case. And I think the test as well, even if it does not reproduce any more.

I was wondering: before the merge, when the test still reproduced: If you removed the `@Warmup(0)` and `CompLevel.C2`, and instead just do `framework.addFlags` with `-Xcomp`, would that reproduce too?
If so, you could have a framework run with and one without Xcomp, the one with Xcomp also should have a compileonly. What do you think?

Or we just push the patch as is, to be sure this is done and integrated. What do you think @chhagedorn ?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26827#discussion_r2591811631

From qxing at openjdk.org Fri Dec 5 08:57:06 2025
From: qxing at openjdk.org (Qizheng Xing)
Date: Fri, 5 Dec 2025 08:57:06 GMT
Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v18]
In-Reply-To:
References:
Message-ID:

> The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases.
>
> This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch:
>
>
> public static int numberOfNibbles(int i) {
>   int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i);
>   return Math.max((mag + 3) / 4, 1);
> }
>
>
> Testing: tier1, IR test

Qizheng Xing has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 24 commits:

 - Merge branch 'master' into enhance-clz-type
 - Make code more compact
 - Fix include order
 - Merge branch 'master' into enhance-clz-type
 - Merge branch 'master' into enhance-clz-type
 - Fix constant fold
 - Remove redundant import
 - Add random range tests
 - Add more comments to IR test
 - Add more constant folding tests for CLZ/CTZ
 - ... and 14 more: https://git.openjdk.org/jdk/compare/674cc3ee...f0687754

-------------

Changes: https://git.openjdk.org/jdk/pull/25928/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25928&range=17
  Stats: 801 lines in 4 files changed: 735 ins; 54 del; 12 mod
  Patch: https://git.openjdk.org/jdk/pull/25928.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25928/head:pull/25928

PR: https://git.openjdk.org/jdk/pull/25928

From qxing at openjdk.org Fri Dec 5 09:05:03 2025
From: qxing at openjdk.org (Qizheng Xing)
Date: Fri, 5 Dec 2025 09:05:03 GMT
Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v9]
In-Reply-To:
References: <9xCpJGY6CFKPAt4VtDY23_Tr3SE9tUebdMF3pAYWhFA=.281e0b84-bfad-466b-b290-918cf1fa83d1@github.com>
Message-ID:

On Fri, 5 Dec 2025 08:13:17 GMT, Emanuel Peter wrote:

>> @eme64 Thank you for the review!
>>
>> @merykitty @jatin-bhateja Do you have any other suggestions regarding the latest changes in this patch?
>
> @MaxXSoft I think we should first merge and test this PR again. It is 4 weeks old now, so there is a risk that something would break if we integrated now.

@eme64 Sorry for the delay in integrating this PR. Since it modifies HotSpot, I wasn't quite sure whether two or more reviewer approvals were required for the latest commit before integration, so I've been waiting for reviews from other reviewers.

I've now merged the latest master branch, which may require you to run some Oracle tests again. Thanks.

BTW, I'd like to confirm: if you approve the current post-merge changes, do I still need to wait for other reviewers?
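
For readers following the 8360192 thread: the PR description states that a count-leading-zeros result is never negative and never exceeds the type's size in bits, and the review discussion mentions a tighter bound derived from known bits. The following is a minimal Java sketch of that reasoning on plain `int` values, not C2 code: `ClzRangeSketch` and `clzBounds` are hypothetical names, with `zeros` standing for bits known to be 0 and `ones` for bits known to be 1. Any value `v` with those known bits has its highest set bit no lower than that of `ones` and no higher than that of `~zeros`, which bounds `numberOfLeadingZeros(v)` from both sides.

```java
final class ClzRangeSketch {
    // Bounds on Integer.numberOfLeadingZeros(v) for every v that satisfies
    // (v & zeros) == 0 and (v & ones) == ones. Returns {lowerBound, upperBound}.
    static int[] clzBounds(int zeros, int ones) {
        // v cannot have a set bit above the highest bit that may be 1.
        int lo = Integer.numberOfLeadingZeros(~zeros);
        // v has at least the highest bit that is known to be 1.
        int hi = Integer.numberOfLeadingZeros(ones);
        return new int[] { lo, hi };
    }

    // The example from the PR description, unchanged in behaviour.
    static int numberOfNibbles(int i) {
        int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i);
        return Math.max((mag + 3) / 4, 1);
    }
}
```

With no known bits (`zeros == 0`, `ones == 0`) the sketch yields the range from 0 to `Integer.SIZE`, which is the general bound stated in the PR description. A range of that kind lets a compiler reason that `mag` in `numberOfNibbles` lies between 0 and 32.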

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25928#issuecomment-3615898684

From epeter at openjdk.org Fri Dec 5 09:13:03 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Dec 2025 09:13:03 GMT
Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise
In-Reply-To: <1yjEG7xjZcmvAECD2ovS0pW8IwA30p9BzCr0Krgy4ks=.3224b13e-a32f-468d-a6e3-3fb5a1c35c04@github.com>
References: <1yjEG7xjZcmvAECD2ovS0pW8IwA30p9BzCr0Krgy4ks=.3224b13e-a32f-468d-a6e3-3fb5a1c35c04@github.com>
Message-ID:

On Mon, 23 Jun 2025 07:41:01 GMT, Quan Anh Mai wrote:

>>> A stricter bound would be `TypeInt::make(~t._bits._zeros, t._bits._ones, t._widen)`
>>
>> @merykitty Thanks for your review, did you mean `TypeInt::make(clz(~t._bits._zeros), clz(t._bits._ones), t._widen)`?
>
> @MaxXSoft Yes you are right, my mistake

You do need 2 reviewers. I see that @merykitty reviewed this a while ago, but a re-approval from him would be good since it is so long ago now.

I'll run some testing now.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25928#issuecomment-3615927849

From erfang at openjdk.org Fri Dec 5 09:14:41 2025
From: erfang at openjdk.org (Eric Fang)
Date: Fri, 5 Dec 2025 09:14:41 GMT
Subject: RFR: 8370863: VectorAPI: Optimize the VectorMaskCast chain in specific patterns [v4]
In-Reply-To:
References:
Message-ID:

> `VectorMaskCastNode` is used to cast a vector mask from one type to another type. The cast may be generated by calling the vector API `cast` or generated by the compiler. For example, some vector mask operations like `trueCount` require the input mask to be integer types, so for floating point type masks, the compiler will cast the mask to the corresponding integer type mask automatically before doing the mask operation. This kind of cast is very common.
>
> If the vector element size is not changed, the `VectorMaskCastNode` doesn't generate code; otherwise code will be generated to extend or narrow the mask. This IR node is not free whether or not it generates code, because it may block some optimizations. For example:
> 1. `(VectorStoreMask (VectorMaskCast (VectorLoadMask x)))`: the middle `VectorMaskCast` prevents the optimization `(VectorStoreMask (VectorLoadMask x)) => (x)`.
> 2. `(VectorMaskToLong (VectorMaskCast (VectorLongToMask x)))`, which blocks the optimization `(VectorMaskToLong (VectorLongToMask x)) => (x)`.
>
> In these IR patterns, the value of the input `x` is not changed, so we can safely do the optimization. But if the input value is changed, we can't eliminate the cast.
>
> The general idea of this PR is to introduce an `uncast_mask` helper function, which can be used to uncast a chain of `VectorMaskCastNode`, like the existing `Node::uncast(bool)` function. The function returns the first non-`VectorMaskCastNode` node.
>
> The intended use case is when the IR pattern to be optimized may contain one or more consecutive `VectorMaskCastNode` and this does not affect the correctness of the optimization. Then this function can be called to eliminate the `VectorMaskCastNode` chain.
>
> Current optimizations related to `VectorMaskCastNode` include:
> 1. `(VectorMaskCast (VectorMaskCast x)) => (x)`, see JDK-8356760.
> 2. `(XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))`, see JDK-8354242.
>
> This PR does the following optimizations:
> 1. Extends the optimization pattern `(VectorMaskCast (VectorMaskCast x)) => (x)` as `(VectorMaskCast (VectorMaskCast? ... (VectorMaskCast x))) => (x)`. This is correct as long as the types of the head and tail `VectorMaskCastNode` are consistent.
> 2. Supports a new optimization pattern `(VectorStoreMask (VectorMaskCast ... (VectorLoadMask x))) => (x)`. Since the value before and after the pattern is a boolean vector, it remains unchanged as long as th...

Eric Fang has updated the pull request incrementally with one additional commit since the last revision:

  Add MaxVectorSize IR test condition for VectorStoreMaskIdentityTest.java

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28313/files
  - new: https://git.openjdk.org/jdk/pull/28313/files/c04039ce..aa9a08a9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28313&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28313&range=02-03

  Stats: 18 lines in 1 file changed: 6 ins; 0 del; 12 mod
  Patch: https://git.openjdk.org/jdk/pull/28313.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28313/head:pull/28313

PR: https://git.openjdk.org/jdk/pull/28313

From xgong at openjdk.org Fri Dec 5 09:37:22 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Fri, 5 Dec 2025 09:37:22 GMT
Subject: RFR: 8371603: C2: Missing Ideal optimizations for load and store vectors on SVE [v2]
In-Reply-To:
References:
Message-ID:

> **Problem:**
>
> This issue occurs on a 256-bit SVE machine, caused by the following problematic pattern in `LoadVectorNode::Ideal()`:
>
>
> Node* LoadVectorNode::Ideal(PhaseGVN* phase, bool can_reshape) {
>   const TypeVect* vt = vect_type();
>   if (Matcher::vector_needs_partial_operations(this, vt)) {
>     return VectorNode::try_to_gen_masked_vector(phase, this, vt);
>   }
>   return LoadNode::Ideal(phase, can_reshape);
> }
>
>
> The condition `Matcher::vector_needs_partial_operations(this, vt)` returns true for `LoadVectorNode` with 256-bit vector size even when the vector size equals the maximum vector size on SVE. In such cases, when `VectorNode::try_to_gen_masked_vector()` returns `nullptr`, the method exits early without calling `LoadNode::Ideal()`. This results in missing crucial optimizations that would normally be applied by the superclass.
>
> This code was introduced by https://bugs.openjdk.org/browse/JDK-8286941 to generate vector masks for partial vector operations, but it failed to ensure that the superclass `Ideal()` method is always invoked when no transformation is applied.
>
> **Solution:**
>
> This patch addresses the issue through two changes:
>
> 1. Refine `Matcher::vector_needs_partial_operations()` to return true only when the vector node genuinely represents a partial vector operation that requires masking.
> 2. Modify `VectorNode::try_to_gen_masked_vector()` to never return `nullptr`, ensuring the superclass `Ideal()` method is always invoked when no transformation is applied.
>
> **Testing:**
>
> - Verified on different SVE platforms with different vector sizes (128|256|512 bits).
> - Verified on X86 platforms with different avx options (-XX:UseAVX=1|2|3).
> - Added two new IR tests to verify 1) previously missing optimizations for `LoadVector/StoreVector` are now applied, and 2) that mask and the correct IR patterns are generated for partial vector operations.
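
The control-flow problem described in the **Problem** section above is independent of HotSpot and can be shown in a few lines. The following is a minimal Java sketch under stated assumptions: `IdealFallThroughSketch` and all members are hypothetical stand-ins (strings replace IR nodes, `null` replaces `nullptr`), and `idealWithFallThrough` is only one possible shape of a fix, not necessarily the one this PR implements.

```java
final class IdealFallThroughSketch {
    // Stand-in for the superclass whose optimizations must not be skipped.
    static class LoadNode {
        String ideal() {
            return "LoadNode::Ideal applied";
        }
    }

    static final class LoadVectorNode extends LoadNode {
        private final boolean needsPartialOps;      // models vector_needs_partial_operations()
        private final boolean maskedFormAvailable;  // models whether a masked node gets generated

        LoadVectorNode(boolean needsPartialOps, boolean maskedFormAvailable) {
            this.needsPartialOps = needsPartialOps;
            this.maskedFormAvailable = maskedFormAvailable;
        }

        // Models try_to_gen_masked_vector(): null means "no transformation".
        private String tryToGenMaskedVector() {
            return maskedFormAvailable ? "masked vector generated" : null;
        }

        // Problematic shape: a null result is returned directly,
        // so the superclass optimizations never run on this path.
        String idealWithEarlyReturn() {
            if (needsPartialOps) {
                return tryToGenMaskedVector();
            }
            return super.ideal();
        }

        // One possible repair: return early only if a transformation happened.
        String idealWithFallThrough() {
            if (needsPartialOps) {
                String masked = tryToGenMaskedVector();
                if (masked != null) {
                    return masked;
                }
            }
            return super.ideal();
        }
    }
}
```

For a node where the partial-operation check fires but no masked form is produced, `idealWithEarlyReturn()` yields `null` while `idealWithFallThrough()` still reaches the superclass, which is the missing-optimization symptom the description reports.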

Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:

  Combine the condition check and IR transformation to a method

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28651/files
  - new: https://git.openjdk.org/jdk/pull/28651/files/ba7592cb..6206e8c0

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28651&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28651&range=00-01

  Stats: 36 lines in 2 files changed: 9 ins; 8 del; 19 mod
  Patch: https://git.openjdk.org/jdk/pull/28651.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28651/head:pull/28651

PR: https://git.openjdk.org/jdk/pull/28651

From xgong at openjdk.org Fri Dec 5 09:44:56 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Fri, 5 Dec 2025 09:44:56 GMT
Subject: RFR: 8371603: C2: Missing Ideal optimizations for load and store vectors on SVE [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 4 Dec 2025 08:23:31 GMT, Emanuel Peter wrote:

>> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Combine the condition check and IR transformation to a method
>
> @XiaohongGong Thanks for taking this over from me and fixing this so quickly, much appreciated!
>
> Thanks for adding my regression tests and for the attribution :)
>
> I only have a minor comment below. Otherwise, the code looks good to me. But since I'm not an SVE specialist, it would be good if someone with deeper knowledge would do a deeper review of the specific SVE parts.
>
> Once an SVE specialist gives the approval for the PR, I'll run some internal testing and approve from my side :)
>
> Ah, and one more thing: you should change the PR title to be more descriptive of the issue. The assert that was hit is only a far removed symptom. I would suggest:
>
> `C2 SVE: missing Ideal optimizations for load and store vectors`

Hi @eme64, I've updated the patch based on your suggestion.
Would you mind taking another look? Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28651#issuecomment-3616043996

From xgong at openjdk.org Fri Dec 5 09:44:58 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Fri, 5 Dec 2025 09:44:58 GMT
Subject: RFR: 8371603: C2: Missing Ideal optimizations for load and store vectors on SVE [v2]
In-Reply-To:
References:
Message-ID: <6ZWIQIawQ21qf2vHUGMeT2WENuzq5S0zbR0Mj3UehCU=.8a458bfa-6388-4690-aa97-d18e3a9303a5@github.com>

On Fri, 5 Dec 2025 09:37:22 GMT, Xiaohong Gong wrote:

>> **Problem:**
>>
>> This issue occurs on a 256-bit SVE machine, caused by the following problematic pattern in `LoadVectorNode::Ideal()`:
>>
>>
>> Node* LoadVectorNode::Ideal(PhaseGVN* phase, bool can_reshape) {
>>   const TypeVect* vt = vect_type();
>>   if (Matcher::vector_needs_partial_operations(this, vt)) {
>>     return VectorNode::try_to_gen_masked_vector(phase, this, vt);
>>   }
>>   return LoadNode::Ideal(phase, can_reshape);
>> }
>>
>>
>> The condition `Matcher::vector_needs_partial_operations(this, vt)` returns true for `LoadVectorNode` with 256-bit vector size even when the vector size equals the maximum vector size on SVE. In such cases, when `VectorNode::try_to_gen_masked_vector()` returns `nullptr`, the method exits early without calling `LoadNode::Ideal()`. This results in missing crucial optimizations that would normally be applied by the superclass.
>>
>> This code was introduced by https://bugs.openjdk.org/browse/JDK-8286941 to generate vector masks for partial vector operations, but it failed to ensure that the superclass `Ideal()` method is always invoked when no transformation is applied.
>>
>> **Solution:**
>>
>> This patch addresses the issue through two changes:
>>
>> 1. Refine `Matcher::vector_needs_partial_operations()` to return true only when the vector node genuinely represents a partial vector operation that requires masking.
>> 2. Modify `VectorNode::try_to_gen_masked_vector()` to never return `nullptr`, ensuring the superclass `Ideal()` method is always invoked when no transformation is applied.
>>
>> **Testing:**
>>
>> - Verified on different SVE platforms with different vector sizes (128|256|512 bits).
>> - Verified on X86 platforms with different avx options (-XX:UseAVX=1|2|3).
>> - Added two new IR tests to verify 1) previously missing optimizations for `LoadVector/StoreVector` are now applied, and 2) that mask and the correct IR patterns are generated for partial vector operations.
>
> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:
>
> Combine the condition check and IR transformation to a method

Hi @theRealAph, would you mind taking a look at this patch, especially the AArch64 part? Thanks a lot!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28651#issuecomment-3616050133

From aseoane at openjdk.org Fri Dec 5 10:08:33 2025
From: aseoane at openjdk.org (Anton Seoane Ampudia)
Date: Fri, 5 Dec 2025 10:08:33 GMT
Subject: RFR: 8280283: Dead compiler code found during the JDK-8272058 code review [v2]
In-Reply-To:
References:
Message-ID:

> This PR removes some dead code that was found during review for [JDK-8272058](https://bugs.openjdk.org/browse/JDK-8272058).
>
> `target_addr_for_insn_or_null` is never run with a `ldrw` to `zr` (i.e. a safepoint poll). This is just a remnant from global safepointing, before we moved to using thread-local handshakes. No safepoint polling code reaches this function. More information can be read in the [original code review](https://github.com/openjdk/jdk18/pull/51#discussion_r774922087). Additionally, I have run tiers 1-6 to make sure this path is not exercised.
>
> This changeset also cleans up the unused `is_nop` function, following the comments in the issue. Other dead code mentioned there has long since disappeared.
>
> **Testing:** passes tiers 1-4

Anton Seoane Ampudia has updated the pull request incrementally with one additional commit since the last revision:

  Delete more unused code

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/28473/files
  - new: https://git.openjdk.org/jdk/pull/28473/files/749eda78..696cdd01

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=28473&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28473&range=00-01

  Stats: 10 lines in 2 files changed: 0 ins; 7 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/28473.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/28473/head:pull/28473

PR: https://git.openjdk.org/jdk/pull/28473

From aseoane at openjdk.org Fri Dec 5 10:08:34 2025
From: aseoane at openjdk.org (Anton Seoane Ampudia)
Date: Fri, 5 Dec 2025 10:08:34 GMT
Subject: RFR: 8280283: Dead compiler code found during the JDK-8272058 code review
In-Reply-To: <2DyhWZxKPAXQbCsjHhoSUQZ80Em0931LE2LRjLNRdHA=.cc61d9bd-fc90-40ea-88e9-ac76c21b5756@github.com>
References: <2DyhWZxKPAXQbCsjHhoSUQZ80Em0931LE2LRjLNRdHA=.cc61d9bd-fc90-40ea-88e9-ac76c21b5756@github.com>
Message-ID: <6W3ic0sK8eq9-M4mqtG7IWMV4aMiVBn3bHcPFBstgko=.2eb68b79-c03d-4779-b9a9-93f9c5ddfd21@github.com>

On Tue, 2 Dec 2025 14:59:50 GMT, Boris Ulasevich wrote:

>> This PR removes some dead code that was found during review for [JDK-8272058](https://bugs.openjdk.org/browse/JDK-8272058).
>>
>> `target_addr_for_insn_or_null` is never run with a `ldrw` to `zr` (i.e. a safepoint poll). This is just a remnant from global safepointing, before we moved to using thread-local handshakes. No safepoint polling code reaches this function. More information can be read in the [original code review](https://github.com/openjdk/jdk18/pull/51#discussion_r774922087). Additionally, I have run tiers 1-6 to make sure this path is not exercised.
>>
>> This changeset also cleans up the unused `is_nop` function, following the comments in the issue. Other dead code mentioned there has long since disappeared.
>>
>> **Testing:** passes tiers 1-4
>
> Nice cleanup. Cleaning up dead code always helps reduce technical debt.
> Are you sure there isn't more to clean up? Have you tried building with GCC's -Wunused options to catch additional unused symbols?

Thanks @bulasevich! I built the relevant files with `-Wunused` and a few more dead lines came up. I've addressed them with my last commit.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28473#issuecomment-3616158434

From bkilambi at openjdk.org Fri Dec 5 10:55:01 2025
From: bkilambi at openjdk.org (Bhavana Kilambi)
Date: Fri, 5 Dec 2025 10:55:01 GMT
Subject: RFR: 8370691: Add new Float16Vector type and enable intrinsification of vector operations supported by auto-vectorizer [v5]
In-Reply-To:
References:
Message-ID:

On Wed, 26 Nov 2025 11:34:11 GMT, Jatin Bhateja wrote:

>> Add a new Float16Vector type and corresponding concrete vector classes, in addition to existing primitive vector types, maintaining operation parity with the FloatVector type.
>> - Add necessary inline expander support.
>> - Enable intrinsification for a few vector operations, namely ADD/SUB/MUL/DIV/MAX/MIN/FMA.
>> - Use existing Float16 vector IR and backend support.
>> - Extended the existing VectorAPI JTREG test suite for the newly added Float16Vector operations.
>>
>> The idea here is to first be at par with Float16 auto-vectorization support before intrinsifying new operations (conversions, reduction, etc).
>>
>> The following are the performance numbers for some of the selected Float16Vector benchmarking kernels compared to equivalent auto-vectorized Float16OperationsBenchmark kernels.
>>
>> image
>>
>> Initial RFP[1] was floated on the panama-dev mailing list.
>>
>> Kindly review the draft PR and share your feedback.
>> >> Best Regards, >> Jatin >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2025-August/021100.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Cleanups src/hotspot/share/opto/vectorIntrinsics.cpp line 341: > 339: laneType == nullptr || !laneType->is_con() || > 340: vector_klass == nullptr || vector_klass->const_oop() == nullptr || > 341: laneType == nullptr || !laneType->is_con() || is this repeating the same condition on line 339? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28002#discussion_r2592254926 From jvernee at openjdk.org Fri Dec 5 11:10:27 2025 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 5 Dec 2025 11:10:27 GMT Subject: RFR: 8160821: VarHandle accesses are penalized when argument conversion is required [v7] In-Reply-To: <7ayMTZ4nXMyB1SXNRcYGjdxidNHDcAUNv_8fQZDUaPI=.a558d3a2-1d3e-4b45-8ba7-393c55a52785@github.com> References: <7ayMTZ4nXMyB1SXNRcYGjdxidNHDcAUNv_8fQZDUaPI=.a558d3a2-1d3e-4b45-8ba7-393c55a52785@github.com> Message-ID: On Thu, 4 Dec 2025 01:48:31 GMT, Chen Liang wrote: >> Since access descriptor is created for each VH operation site, we can optimistically cache the adapted method handle in a site if the site operates on a constant VH. Used a C2 IR test to verify such a setup through an inexact VarHandle invocation can be constant folded through (previously, it was blocked by `asType`) > > Chen Liang has updated the pull request incrementally with one additional commit since the last revision: > > Revert void special case removal due to C2 shortage causing TestZGCBarrierElision::testAtomicThenAtomicAnotherField failure Latest version looks much better to me (as mentioned offline). What was the issue with the failing test around the removal of the _V guard template? 
Also, looks like the new IR test is failing in GHA test/hotspot/jtreg/compiler/c2/irTests/constantFold/VarHandleMismatchedTypeFold.java line 48: > 46: public static void main(String[] args) { > 47: TestFramework.runWithFlags( > 48: "-XX:+UnlockExperimentalVMOptions" Why is this flag needed? ------------- PR Review: https://git.openjdk.org/jdk/pull/28585#pullrequestreview-3544230655 PR Review Comment: https://git.openjdk.org/jdk/pull/28585#discussion_r2592287594 From roland at openjdk.org Fri Dec 5 11:55:34 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Dec 2025 11:55:34 GMT Subject: RFR: 8370200: Crash: assert(outer->outcnt() >= phis + 2 - be_loads && outer->outcnt() <= phis + 2 + stores + 1) failed: only phis Message-ID: The crash occurs because verification code expects the inner and outer loop of a loop strip mining nest to have the same number of phis but, in this case, the inner loop has one more memory phis than the outer loop. 1) After `OuterStripMinedLoopNode::adjust_strip_mined_loop`, inner and outer loops have the same number of phis, as expected. 
309 MergeMem === _ 1 306 1 1 284 [[ 429 ]] { - - N284:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=205 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 ]] 429 Phi === 248 309 205 [[ 93 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) 249 CountedLoop === 249 248 197 [[ 249 119 96 93 94 ]] inner stride: 1 strip mined !orig=[223],[91] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) 93 Phi === 249 429 205 [[ 117 97 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) 94 Phi === 249 430 121 [[ 97 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) 2) Then `PhiNode::Ideal` runs for 429 and pushed the `MergeMem` 309 through the outer loop phi: 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 429 430 444 446 ]] 430 Phi === 248 306 121 [[ 94 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !orig=94 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) 444 Phi === 248 306 121 [[ 445 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) 446 Phi === 248 284 170 [[ 445 ]] #memory Memory: @instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow], name=detailMessage, idx=5; !orig=444,429,93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) 445 MergeMem === _ 1 444 1 1 446 [[ 93 ]] { - - N446:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: 
@ptr:BotPTR+bot, idx=Bot; !orig=[429],93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) 249 CountedLoop === 249 248 197 [[ 249 119 96 93 94 ]] inner stride: 1 strip mined !orig=[223],[91] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) 93 Phi === 249 445 205 [[ 117 97 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) 94 Phi === 249 430 121 [[ 97 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) 3) `PhiNode::Identity` runs for 430 and finds that it can be replace by 429: the non bottom memory phi 430 can be replaced by the bottom memory 429 that has the same inputs. 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 446 444 ]] 446 Phi === 248 284 170 [[ 445 ]] #memory Memory: @instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow], name=detailMessage, idx=5; !orig=444,[429],93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) 444 Phi === 248 306 121 [[ 445 94 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=[429],93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) 445 MergeMem === _ 1 444 1 1 446 [[ 93 ]] { - - N446:instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow] } Memory: @ptr:BotPTR+bot, idx=Bot; !orig=[429],93 !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) 249 CountedLoop === 249 248 197 [[ 249 119 96 93 94 ]] inner stride: 1 strip mined !orig=[223],[91] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) 93 Phi === 249 445 205 [[ 117 97 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) 94 Phi === 249 444 121 [[ 97 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) 4) `PhiNode::Ideal` runs for 93 and pushed the `MergeMem` through 
that `Phi`: 248 OuterStripMinedLoop === 248 321 247 [[ 248 249 428 446 444 ]] 446 Phi === 248 284 170 [[ 453 ]] #memory Memory: @instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow], name=detailMessage, idx=5; !orig=444,[429],[93] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) 444 Phi === 248 306 121 [[ 451 94 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=[429],[93] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) 249 CountedLoop === 249 248 197 [[ 249 119 96 453 94 451 ]] inner stride: 1 strip mined !orig=[223],[91] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) 453 Phi === 249 446 170 [[ 452 ]] #memory Memory: @instptr:java/lang/Throwable (java/io/Serializable):BotPTR+20,iid=bot [narrow], name=detailMessage, idx=5; !orig=451,[93] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) 94 Phi === 249 444 121 [[ 97 ]] #memory Memory: @instptr:TestMismatchedMemoryPhis:BotPTR+16,iid=bot, name=l, idx=4; !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) 451 Phi === 249 444 121 [[ 452 ]] #memory Memory: @ptr:BotPTR+bot, idx=Bot; !orig=[93] !jvms: TestMismatchedMemoryPhis::mainTest @ bci:37 (line 49) Now, `PhiNode::Identity` for 94 could replace it with the bottom memory phi with same inputs 451. But it doesn't run. It last ran between 3) and 4) and there's no reason for igvn to execute it again because 4) doesn't cause 94 to change in any way. The fix I propose is to mirror the transformation from `PhiNode::Identity` in `PhiNode::Ideal` so the end result doesn't depend on what phi is modified and processed by igvn last. 
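The order dependence described above can be illustrated with a toy model. This is only a sketch, not HotSpot code: the class `MirroredPhiRuleSketch`, its `Phi` type and the `survivors` method are hypothetical names, and the model assumes each phi is processed exactly once with no later re-notification. It shows that a rule applied only from the non-bottom phi's side misses the replacement when the bottom phi appears afterwards, while mirroring the rule on the bottom phi's side gives the same result in either order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: names and structure are hypothetical, not C2 data structures.
final class MirroredPhiRuleSketch {
    static final class Phi {
        final String name;
        final boolean bottom;        // true for a bottom-memory phi
        final List<Integer> inputs;  // node ids of the data inputs

        Phi(String name, boolean bottom, Integer... inputs) {
            this.name = name;
            this.bottom = bottom;
            this.inputs = Arrays.asList(inputs);
        }
    }

    // Identity-style check: is there already a bottom phi with the same inputs?
    static boolean hasEquivalentBottom(Phi phi, List<Phi> region) {
        for (Phi other : region) {
            if (other.bottom && other.inputs.equals(phi.inputs)) {
                return true;
            }
        }
        return false;
    }

    // Processes each phi once, in the given order, and returns how many phis
    // remain on the region. With mirror == true, a bottom phi also removes
    // non-bottom phis that have the same inputs.
    static int survivors(List<Phi> order, boolean mirror) {
        List<Phi> region = new ArrayList<>();
        for (Phi phi : order) {
            if (!phi.bottom && hasEquivalentBottom(phi, region)) {
                continue; // replaced by the existing bottom phi
            }
            if (mirror && phi.bottom) {
                region.removeIf(other -> !other.bottom && other.inputs.equals(phi.inputs));
            }
            region.add(phi);
        }
        return region.size();
    }
}
```

With phis modelled on 94 and 451 from the dump (both with inputs 444 and 121), processing the bottom phi last leaves two phis without the mirrored rule and one with it.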
------------- Commit messages: - more - test - more - fix Changes: https://git.openjdk.org/jdk/pull/28677/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28677&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8370200 Stats: 124 lines in 5 files changed: 102 ins; 16 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/28677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28677/head:pull/28677 PR: https://git.openjdk.org/jdk/pull/28677 From galder at openjdk.org Fri Dec 5 13:38:24 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Fri, 5 Dec 2025 13:38:24 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v10] In-Reply-To: <1C0ByMoDpDlOmbDQVgBTQg7yKI0UaLtX92Xmf0bta4E=.0c060c5a-d60d-4cbe-84c5-03884116ef34@github.com> References: <1C0ByMoDpDlOmbDQVgBTQg7yKI0UaLtX92Xmf0bta4E=.0c060c5a-d60d-4cbe-84c5-03884116ef34@github.com> Message-ID: On Thu, 12 Jun 2025 09:12:03 GMT, Emanuel Peter wrote: >> **Past Work** >> With https://github.com/openjdk/jdk/pull/11775 / [JDK-8298952](https://bugs.openjdk.org/browse/JDK-8298952) we added `Node::Value` verification. >> >> **This PR** >> I'm now adding verification for `Ideal` and `Identity`. I'm adding two bits to the flag `VerifyIterativeGVN`. >> >> I found many many node types that hit my verification assert, i.e. that could still be optimized after IGVN is over, just because these nodes were not put on the worklist any more. >> >> My approach was to aggressively bail-out for all nodes that had an issue. This way, we can address one by one in follow-up RFEs. For many, I did some initial assessment, and left some comments about what issues I encountered. >> >> **Future Work:** >> In many cases, the issue is just a missing notification when inputs of inputs are changed. These would be good starter tasks. But there are probably also more complicated cases. 
And there are surely cases where verification will be impossible, because it is possible that the Ideal / Identity optimizations traverse longer paths, and we cannot expect that notification makes it down that path. For those cases, we will have to leave the exception and document it well. >> >> I filed: >> [JDK-8359103](https://bugs.openjdk.org/browse/JDK-8359103) C2 VerifyIterativeGVN: Umbrella for extending Ideal and Identity verification (JDK-8347273) >> (We can file subtasks for the nodes we want to fix. I don't want to file them all now, but we should file them as we are investigating, so that there is no duplicate work.) >> >> Testing passed tier1-3, with extra timeout factor 20. > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 79 additional commits since the last revision: > > - Merge branch 'master' into JDK-8347273-verify-IGVN-Ideal-Identity > - update comments for Christian > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn > - reorder flags for Christian > - max_modes > - use stringStream instead of ttyLocker > - assert(false) for Christian > - rename for Christian > - Update src/hotspot/share/opto/phaseX.cpp > > Co-authored-by: Manuel Hässig > - review suggestions, and handled a few more edge cases > - ... and 69 more: https://git.openjdk.org/jdk/compare/44e9f72c...d9546d87 src/hotspot/share/opto/phaseX.cpp line 1966: > 1964: // > 1965: // Found with: > 1966: // compiler/codegen/TestBooleanVect.java @eme64 Did you really encounter issues for these min/max codes with `TestBooleanVect`? Or is the test name incorrect here?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2592718199 From roland at openjdk.org Fri Dec 5 13:48:50 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Dec 2025 13:48:50 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v5] In-Reply-To: References: Message-ID: > The test case has an out of loop `Store` with an `AddP` address > expression that has other uses and is in the loop body. Schematically, > only showing the address subgraph and the bases for the `AddP`s: > > > Store#195 -> AddP#133 -> AddP#134 -> CastPP#110 > -> CastPP#110 > > > Both `AddP`s have the same base, a `CastPP` that's also in the loop > body. > > That loop is a counted loop and only has 3 iterations so is fully > unrolled. First, one iteration is peeled: > > > /-> CastPP#110 > Store#195 -> Phi#360 -> AddP#133 -> AddP#134 -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > The `AddP`s and `CastPP` are cloned (because in the loop body). As > part of peeling, `PhaseIdealLoop::peeled_dom_test_elim()` is > called. It finds the test that guards `CastPP#283` in the peeled > iteration dominates and replaces the test that guards `CastPP#110` > (the test in the peeled iteration is the clone of the test in the > loop). That causes `CastPP#110`'s control to be updated to that of the > test in the peeled iteration and to be yanked from the loop. So now > `CastPP#283` and `CastPP#110` have the same inputs. > > Next unrolling happens: > > > /-> CastPP#110 > /-> AddP#400 -> AddP#401 -> CastPP#110 > Store#195 -> Phi#360 -> Phi#477 -> AddP#133 -> AddP#134 -> CastPP#110 > \ -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > `AddP`s are cloned once more but not the `CastPP`s because they are > both in the peeled iteration now. A new `Phi` is added. > > Next igvn runs. It's going to push the `AddP`s through the `Phi`s. 
> > Through `Phi#477`: > > > > /-> CastPP#110 > Store#195 -> Phi#360 -> AddP#510 -> Phi#509 -> AddP#401 -> CastPP#110 > \ -> AddP#134 -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > Through `Phi#360`: > > > /-> AddP#134 -> CastPP#110 > /-> Phi#509 -> AddP#401 -> CastPP#110 > Store#195 -> AddP#516 -> Phi#515 -> AddP#278 -> CastPP#283 > -> Phi#514 -> CastPP#283 > -> CastP#110 > > > Then `Phi#514` which has 2 `CastPP`s as input with identical inputs is > transformed into anot... Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25386/files - new: https://git.openjdk.org/jdk/pull/25386/files/15c17bb1..20154a12 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25386&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25386&range=03-04 Stats: 34 lines in 3 files changed: 17 ins; 6 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/25386.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25386/head:pull/25386 PR: https://git.openjdk.org/jdk/pull/25386 From roland at openjdk.org Fri Dec 5 13:48:51 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Dec 2025 13:48:51 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v4] In-Reply-To: <9mnRXpB16Y6Mw0TSGFJz-69m24lzCNPMC_B1_YseD4M=.be94bbba-88ce-4958-a8bd-89862d7ec2e7@github.com> References: <9mnRXpB16Y6Mw0TSGFJz-69m24lzCNPMC_B1_YseD4M=.be94bbba-88ce-4958-a8bd-89862d7ec2e7@github.com> Message-ID: On Wed, 3 Dec 2025 05:43:30 GMT, Emanuel Peter wrote: > I think I'm on board with the solution now. It is probably best to do it during IGVN. I have a few more suggestions below :) Updated change should address your comments. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25386#issuecomment-3616982320 From roland at openjdk.org Fri Dec 5 13:52:12 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Dec 2025 13:52:12 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v9] In-Reply-To: References: Message-ID: > This is a variant of 8332827. In 8332827, an array access becomes > dependent on a range check `CastII` for another array access. When, > after loop opts are over, that RC `CastII` was removed, the array > access could float and an out of bound access happened. With the fix > for 8332827, RC `CastII`s are no longer removed. > > With this one what happens is that some transformations applied after > loop opts are over widen the type of the RC `CastII`. As a result, the > type of the RC `CastII` is no longer narrower than that of its input, > the `CastII` is removed and the dependency is lost. > > There are 2 transformations that cause this to happen: > > - after loop opts are over, the type of the `CastII` nodes is widened > so nodes that have the same inputs but a slightly different type can > common. > > - When pushing a `CastII` through an `Add`, if the types of both inputs > of the `Add` are non-constant, then we end up widening the type > (the resulting `Add` has a type that's wider than that of the > initial `CastII`). > > There are already 3 types of `Cast` nodes depending on the > optimizations that are allowed. Either the `Cast` is floating > (`depends_only_on_test()` returns `true`) or pinned. Either the `Cast` > can be removed if it no longer narrows the type of its input or > not. We already have variants of the `CastII`: > > - if the Cast can float and can be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can be removed when it doesn't narrow the type > of its input.
> > - if the Cast is pinned and can't be removed when it doesn't narrow > the type of its input. > > What we need here, I think, is the 4th combination: > > - if the Cast can float and can't be removed when it doesn't narrow > the type of its input. > > Anyway, things are becoming confusing with all these different > variants named in ways that don't always help figure out what > constraints each of them operates under. So I refactored this and that's > the biggest part of this change. The fix consists in marking `Cast` > nodes when their type is widened in a way that prevents them from being > optimized out. > > Tobias ran performance testing with a slightly different version of > this change and there was no regression. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/castnode.hpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24575/files - new: https://git.openjdk.org/jdk/pull/24575/files/93b8b0c5..cab44429 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24575&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24575&range=07-08 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24575.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24575/head:pull/24575 PR: https://git.openjdk.org/jdk/pull/24575 From galder at openjdk.org Fri Dec 5 14:02:06 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Fri, 5 Dec 2025 14:02:06 GMT Subject: RFR: 8347273: C2: VerifyIterativeGVN for Ideal and Identity [v10] In-Reply-To: References: <1C0ByMoDpDlOmbDQVgBTQg7yKI0UaLtX92Xmf0bta4E=.0c060c5a-d60d-4cbe-84c5-03884116ef34@github.com> Message-ID: On Fri, 5 Dec 2025 13:35:22 GMT, Galder Zamarreño wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 79 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8347273-verify-IGVN-Ideal-Identity >> - update comments for Christian >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn >> - reorder flags for Christian >> - max_modes >> - use stringStream instead of ttyLocker >> - assert(false) for Christian >> - rename for Christian >> - Update src/hotspot/share/opto/phaseX.cpp >> >> Co-authored-by: Manuel Hässig >> - review suggestions, and handled a few more edge cases >> - ... and 69 more: https://git.openjdk.org/jdk/compare/968ce906...d9546d87 > > src/hotspot/share/opto/phaseX.cpp line 1966: > >> 1964: // >> 1965: // Found with: >> 1966: // compiler/codegen/TestBooleanVect.java > > @eme64 Did you really encounter issues for these min/max codes with `TestBooleanVect`? Or is the test name incorrect here? Seems correct. I removed all the cases and indeed `TestBooleanVect` fails. All good :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22970#discussion_r2592791745 From roland at openjdk.org Fri Dec 5 14:05:06 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Dec 2025 14:05:06 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v10] In-Reply-To: References: Message-ID: > This is a variant of 8332827. In 8332827, an array access becomes > dependent on a range check `CastII` for another array access. When, > after loop opts are over, that RC `CastII` was removed, the array > access could float and an out of bound access happened. With the fix > for 8332827, RC `CastII`s are no longer removed. > > With this one what happens is that some transformations applied after > loop opts are over widen the type of the RC `CastII`.
As a result, the > type of the RC `CastII` is no longer narrower than that of its input, > the `CastII` is removed and the dependency is lost. > > There are 2 transformations that cause this to happen: > > - after loop opts are over, the type of the `CastII` nodes is widened > so nodes that have the same inputs but a slightly different type can > common. > > - When pushing a `CastII` through an `Add`, if the types of both inputs > of the `Add` are non-constant, then we end up widening the type > (the resulting `Add` has a type that's wider than that of the > initial `CastII`). > > There are already 3 types of `Cast` nodes depending on the > optimizations that are allowed. Either the `Cast` is floating > (`depends_only_on_test()` returns `true`) or pinned. Either the `Cast` > can be removed if it no longer narrows the type of its input or > not. We already have variants of the `CastII`: > > - if the Cast can float and can be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can't be removed when it doesn't narrow > the type of its input. > > What we need here, I think, is the 4th combination: > > - if the Cast can float and can't be removed when it doesn't narrow > the type of its input. > > Anyway, things are becoming confusing with all these different > variants named in ways that don't always help figure out what > constraints each of them operates under. So I refactored this and that's > the biggest part of this change. The fix consists in marking `Cast` > nodes when their type is widened in a way that prevents them from being > optimized out. > > Tobias ran performance testing with a slightly different version of > this change and there was no regression.
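The four combinations listed in the description above can be written down as a small two-flag enum. This is a sketch for illustration only: the name `CastDependencySketch`, its constants and its methods are hypothetical and are not the identifiers used in `castnode.hpp`. It assumes, as the description says, that a cast whose type has been widened must switch to a variant that is never removed for failing to narrow its input, while keeping its floating or pinned property.

```java
// Hypothetical model of the floating/pinned x narrowing/non-narrowing
// combinations; not the HotSpot implementation.
enum CastDependencySketch {
    FLOATING_NARROWING(true, true),        // can float, removable when not narrowing
    PINNED_NARROWING(false, true),         // pinned, removable when not narrowing
    PINNED_NON_NARROWING(false, false),    // pinned, never removed for that reason
    FLOATING_NON_NARROWING(true, false);   // the 4th combination asked for above

    final boolean floating;
    final boolean narrowing;

    CastDependencySketch(boolean floating, boolean narrowing) {
        this.floating = floating;
        this.narrowing = narrowing;
    }

    // May the cast be dropped once its type no longer narrows its input?
    boolean removableWhenNotNarrowing() {
        return narrowing;
    }

    // Variant to use after a transformation has widened the cast's type:
    // keep the floating/pinned property, but stop treating the cast as removable.
    CastDependencySketch afterTypeWidening() {
        return floating ? FLOATING_NON_NARROWING : PINNED_NON_NARROWING;
    }
}
```

In this model, a floating narrowing cast becomes `FLOATING_NON_NARROWING` after widening, so the dependency it carries is not lost.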
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24575/files - new: https://git.openjdk.org/jdk/pull/24575/files/cab44429..4a877c43 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24575&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24575&range=08-09 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24575.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24575/head:pull/24575 PR: https://git.openjdk.org/jdk/pull/24575 From roland at openjdk.org Fri Dec 5 14:05:09 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Dec 2025 14:05:09 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: Message-ID: <5DHx3WmMb1UtSeyiEiYCiisVgRFggPFfxBggpgtuD6M=.d72a9c07-9624-47ea-9398-a0d1dee69755@github.com> On Tue, 2 Dec 2025 17:32:09 GMT, Quan Anh Mai wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: >> >> - Merge branch 'master' into JDK-8354282 >> - whitespace >> - review >> - review >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update test/hotspot/jtreg/compiler/c2/irTests/TestPushAddThruCast.java >> >> Co-authored-by: Christian Hagedorn >> - review >> - review >> - ... 
and 7 more: https://git.openjdk.org/jdk/compare/ef5e744a...93b8b0c5 > > src/hotspot/share/opto/castnode.hpp line 105: > >> 103: // All the possible combinations of floating/narrowing with example use cases: >> 104: >> 105: // Use case example: Range Check CastII > > I believe this is incorrect, a range check should be floating non-narrowing. It is only narrowing if the length of the array is a constant. It is because this cast encodes the dependency on the condition `index u< length`. This condition cannot be expressed in terms of `Type` unless `length` is a constant. Range check `CastII` were added to protect the `ConvI2L` in the address expression on 64 bits. The problem there was, in some cases, that the `ConvI2L` would float above the range check (because `ConvI2L` has no control input) and could end up with an out of range input (which in turn would cause the `ConvI2L` to become `top` in places where it wasn't expected). So `CastII` doesn't carry the control dependency of an array access on its range check. That dependency is carried by the `MemNode` which has its control input set to the range check. What you're saying, if I understand it correctly, would be true if the `CastII` was required to prevent an array `Load` from floating. But that's not the case. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2592801401 From roland at openjdk.org Fri Dec 5 14:05:10 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Dec 2025 14:05:10 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs [v8] In-Reply-To: References: <0An6wz0QZZxtVg-lP4IyqWTekcYkSmvosrVWkI7cH70=.86c07374-2127-4892-a369-ceefa82dd0b7@github.com> <_rBmTvf064PXyVEAX4zqk43DNgVr0gQDPzPcdQ4XI1A=.660e7e89-0a49-47e0-9639-972cbfbac5f0@github.com> <4qc5jJ1KA09yko5rWioBGstpuuRNxOiNWXRdRdh9h_E=.17c8ace8-c672-4451-bd15-247d66d92cef@github.com> Message-ID: On Tue, 2 Dec 2025 17:41:37 GMT, Quan Anh Mai wrote: >> Ok, I now read the PR from the top, and not just recent changes. If one were to start reading from the top, it would be clear without my suggestions here. But I think it could still be good to apply something about letting the Cast float to where we would hoist the RC. > > Naming is hard, but it is worth pointing out in the comment that floating here refers to `depends_only_on_test`. In other words, a cast is considered floating if it is legal to change the control input of a cast from an `IfTrue` or `IfFalse` to an `IfTrue` and `IfFalse` that dominates the current control input, and the corresponding conditions of the `If`s are the same. In contrast, we cannot do that for a pinned cast, and if the control is folded away, the control input of the pinned cast is changed to the control predecessor of the folded node. > > It is also worth noting that we have `Node::pinned` which means the node is pinned AT the control input while pinned here means that it is pinned UNDER the control input. Very confusing! I added a mention of `depends_only_on_test`. Is that good enough? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24575#discussion_r2592784214 From epeter at openjdk.org Fri Dec 5 14:28:38 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Dec 2025 14:28:38 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v5] In-Reply-To: References: Message-ID: <2o1eUSi-ngISF33nhD0ie40H3PUeyT0rrV1DYjd7Ud4=.b05b3657-c32d-477a-8834-8747a8e98ed0@github.com> On Fri, 5 Dec 2025 13:48:50 GMT, Roland Westrelin wrote: >> The test case has an out of loop `Store` with an `AddP` address >> expression that has other uses and is in the loop body. Schematically, >> only showing the address subgraph and the bases for the `AddP`s: >> >> >> Store#195 -> AddP#133 -> AddP#134 -> CastPP#110 >> -> CastPP#110 >> >> >> Both `AddP`s have the same base, a `CastPP` that's also in the loop >> body. >> >> That loop is a counted loop and only has 3 iterations so is fully >> unrolled. First, one iteration is peeled: >> >> >> /-> CastPP#110 >> Store#195 -> Phi#360 -> AddP#133 -> AddP#134 -> CastPP#110 >> -> AddP#277 -> AddP#278 -> CastPP#283 >> -> CastPP#283 >> >> >> >> The `AddP`s and `CastPP` are cloned (because in the loop body). As >> part of peeling, `PhaseIdealLoop::peeled_dom_test_elim()` is >> called. It finds the test that guards `CastPP#283` in the peeled >> iteration dominates and replaces the test that guards `CastPP#110` >> (the test in the peeled iteration is the clone of the test in the >> loop). That causes `CastPP#110`'s control to be updated to that of the >> test in the peeled iteration and to be yanked from the loop. So now >> `CastPP#283` and `CastPP#110` have the same inputs. 
>> >> Next unrolling happens: >> >> >> /-> CastPP#110 >> /-> AddP#400 -> AddP#401 -> CastPP#110 >> Store#195 -> Phi#360 -> Phi#477 -> AddP#133 -> AddP#134 -> CastPP#110 >> \ -> CastPP#110 >> -> AddP#277 -> AddP#278 -> CastPP#283 >> -> CastPP#283 >> >> >> >> `AddP`s are cloned once more but not the `CastPP`s because they are >> both in the peeled iteration now. A new `Phi` is added. >> >> Next igvn runs. It's going to push the `AddP`s through the `Phi`s. >> >> Through `Phi#477`: >> >> >> >> /-> CastPP#110 >> Store#195 -> Phi#360 -> AddP#510 -> Phi#509 -> AddP#401 -> CastPP#110 >> \ -> AddP#134 -> CastPP#110 >> -> AddP#277 -> AddP#278 -> CastPP#283 >> -> CastPP#283 >> >> >> >> Through `Phi#360`: >> >> >> /-> AddP#134 -> CastPP#110 >> /-> Phi#509 -> AddP#401 -> CastPP#110 >> Store#195 -> AddP#516 -> Phi#515 -> AddP#278 -> CastPP#283 >> -> Phi#514 -> CastPP#283 >> ... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Excellent, this looks great! Thanks for the updates @rwestrel ! I did not run testing again now. I think we can do that when a second reviewer gives the approval :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25386#pullrequestreview-3544999922 From epeter at openjdk.org Fri Dec 5 14:28:44 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Dec 2025 14:28:44 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) [v2] In-Reply-To: References: Message-ID: On Thu, 2 Oct 2025 04:03:58 GMT, Dean Long wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: >> >> - test seed >> - more >> - Merge branch 'master' into JDK-8351889 >> - Merge branch 'master' into JDK-8351889 >> - more >> - test >> - fix > > What if we just relax the assert? I failed to figure out what this assert is protecting us from by looking at the code. So what happens in a product build or when this assert is commented out? @dean-long @galderz Do you want to do the second review? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25386#issuecomment-3617140509 From epeter at openjdk.org Fri Dec 5 14:32:43 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Dec 2025 14:32:43 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v18] In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 08:57:06 GMT, Qizheng Xing wrote: >> The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. >> >> This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch: >> >> >> public static int numberOfNibbles(int i) { >> int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); >> return Math.max((mag + 3) / 4, 1); >> } >> >> >> Testing: tier1, IR test > > Qizheng Xing has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 24 commits: > > - Merge branch 'master' into enhance-clz-type > - Make code more compact > - Fix include order > - Merge branch 'master' into enhance-clz-type > - Merge branch 'master' into enhance-clz-type > - Fix constant fold > - Remove redundant import > - Add random range tests > - Add more comments to IR test > - Add more constant folding tests for CLZ/CTZ > - ... and 14 more: https://git.openjdk.org/jdk/compare/674cc3ee...f0687754 Testing passed. @merykitty @jatin-bhateja Do either of you want to give a second approval? ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25928#pullrequestreview-3545020267 From roland at openjdk.org Fri Dec 5 14:52:51 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Dec 2025 14:52:51 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v13] In-Reply-To: References: <2oDqUvcW_3hJRPRri4uttpkgfeCovL4ZZkcI0R1bB1A=.173b3a58-d0f1-4b29-94d1-77b0a350c790@github.com> <2wAnS7drj_r3dqsy5CEF9vBG40KizHsQDOxMeNymwhw=.9bc29879-eead-401c-b750-814592feff63@github.com> <-1wiWF_UEvCO6xPuYvIsElBzPPQDejGahm9Xd5YszPU=.cfb41cb1-f681-4e75-8c29-2d928468f53b@github.com> Message-ID: <42lOFbyCuQt4xj-pK-ME6ScceXqTnGOY0HrWnJMK56k=.87b29936-511f-4ba4-a429-e8b9faed83a2@github.com> On Sun, 30 Nov 2025 08:03:32 GMT, Zihao Lin wrote: >> I had a closer look and I think you ran into an inconsistency. Let me see if I can get it fixed as a separate change. > > Sure, it's better to separate to another change. I am not familiar this part, please pin me if you have better solution. Thanks! I filed https://bugs.openjdk.org/browse/JDK-8373143 for this but I keep finding new issues. So this one will take some time. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24258#discussion_r2592955645 From qamai at openjdk.org Fri Dec 5 15:05:53 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 5 Dec 2025 15:05:53 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v18] In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 08:57:06 GMT, Qizheng Xing wrote: >> The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. >> >> This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch: >> >> >> public static int numberOfNibbles(int i) { >> int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); >> return Math.max((mag + 3) / 4, 1); >> } >> >> >> Testing: tier1, IR test > > Qizheng Xing has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 24 commits: > > - Merge branch 'master' into enhance-clz-type > - Make code more compact > - Fix include order > - Merge branch 'master' into enhance-clz-type > - Merge branch 'master' into enhance-clz-type > - Fix constant fold > - Remove redundant import > - Add random range tests > - Add more comments to IR test > - Add more constant folding tests for CLZ/CTZ > - ... and 14 more: https://git.openjdk.org/jdk/compare/674cc3ee...f0687754 LGTM ------------- Marked as reviewed by qamai (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/25928#pullrequestreview-3545147861 From qamai at openjdk.org Fri Dec 5 15:05:55 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 5 Dec 2025 15:05:55 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v3] In-Reply-To: References: <2vGPKe7ESZqYjemMvDjFxb4QTk3VjybE0lk59Vqj1Ts=.e6a555a5-407b-4389-8db5-aa02a7de9960@github.com> Message-ID: On Tue, 19 Aug 2025 13:43:37 GMT, Emanuel Peter wrote: >>> Can you explain why you need this? Why is `count_trailing_zeros` and `count_leading_zeros` not enough, when you cast at the use-site? >> >> @eme64 The explanation of @merykitty is right, the implementation of `count_leading_zeros` and `count_trailing_zeros` reject zero as the input. >> >> Perhaps we could open another PR to add zero support for these functions, since it's less relevant to this node type change and might require other changes to the code that calls them. > > In `src/hotspot/share/utilities/count_leading_zeros.hpp`, it says that 0 behavior is undefined. Ok... but why do we do that? Is that a performance optimization ? If yes, is it really worth it? If there is no good reason not to handle 0, we should just handle it. > > We have some tests in `test/hotspot/gtest/utilities/test_count_leading_zeros.cpp`. > > It would be interesting to quickly check if any use of these methods could ever encounter zero, and then hit the assert. I would not be surprised if we found a bug here. > > I think this would be a worth while cleanup task. I would prefer if we clean things up now, and don't just let more special handling code get integrated. @eme64 It is because the intrinsics we use give unspecified results for 0, so it just propagated upward. I think it is definitely preferable to fix this. 
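At the Java level, by contrast, a zero input is well defined: `Integer.numberOfLeadingZeros(0)` and `Integer.numberOfTrailingZeros(0)` are both documented to return 32. The following is a minimal sketch of the result range [0, `Integer.SIZE`] that the patch description refers to, using the `numberOfNibbles` example from that description. The class name `ClzRangeSketch` and the helper `inRange` are hypothetical and only serve the illustration.

```java
public class ClzRangeSketch {
    // Example method taken from the PR description.
    public static int numberOfNibbles(int i) {
        int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i);
        return Math.max((mag + 3) / 4, 1);
    }

    // Hypothetical helper: true when both counts lie in [0, Integer.SIZE].
    public static boolean inRange(int i) {
        int clz = Integer.numberOfLeadingZeros(i);
        int ctz = Integer.numberOfTrailingZeros(i);
        return 0 <= clz && clz <= Integer.SIZE
            && 0 <= ctz && ctz <= Integer.SIZE;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0x10, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int s : samples) {
            if (!inRange(s)) {
                throw new AssertionError("count out of range for " + s);
            }
        }
        // Zero is a defined input for the Java methods (both return 32).
        System.out.println(Integer.numberOfLeadingZeros(0));
        System.out.println(Integer.numberOfTrailingZeros(0));
        System.out.println(numberOfNibbles(0x10));
    }
}
```

Because `mag` can only be in [0, 32], `(mag + 3) / 4` can only be in [0, 8], which is the kind of bound a more precise node type lets the compiler derive.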
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25928#discussion_r2592995218 From chagedorn at openjdk.org Fri Dec 5 16:29:35 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 5 Dec 2025 16:29:35 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v23] In-Reply-To: <8kNvPKU3I3PdOKtInEoHzV-i8T6-IETIBup-bxcr7_c=.91cc1d46-6d49-4fb7-9302-55597b7ae428@github.com> References: <6Yo2VYqBk_iaUpAGdPvyCjOyn_XW2nVPN5_w8XbXvkU=.91138210-54e3-4c28-b1d8-eb706583348e@github.com> <8kNvPKU3I3PdOKtInEoHzV-i8T6-IETIBup-bxcr7_c=.91cc1d46-6d49-4fb7-9302-55597b7ae428@github.com> Message-ID: <3vcLDogQ7FM6ga5oz_UchKRew9uy9WqknrFfJgJHxw0=.73b17591-fa84-4152-a3d2-b1685dba0fdf@github.com> On Thu, 27 Nov 2025 21:06:18 GMT, Kangcheng Xu wrote: >> Was too busy this week, will try to come back to this next week! > > @chhagedorn Thank you for reviewing. I'm glad to hear I'm making progress. Please see [my previous comment](https://github.com/openjdk/jdk/pull/24458#discussion_r2569790528) regarding iteratively uncasting `xphi()`. > >> [...] give your patch a spin in our standard testing [...] > > Yes please. I've addressed the last few suggestions and merged in master. > >> [...] run some more extended testing with your old vs. new counted loop transformation state [...] > > Good idea. I've updated the old vs. new code based on the latest patch on this PR. Please find it on the [`counted-loop-refactor-old-vs-new` branch](https://github.com/tabjy/jdk/commits/counted-loop-refactor-old-vs-new/). > > Please let me know how the testing goes. Thank you very much once again! Thanks @tabjy for the update.
Was too busy this week with the mainline fork but I'm happy to take another look next week :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24458#issuecomment-3617590462 From rsunderbabu at openjdk.org Fri Dec 5 17:11:13 2025 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Fri, 5 Dec 2025 17:11:13 GMT Subject: RFR: 8372941: Rework compiler/intrinsics/sha tests to use intrinsic availability [v2] In-Reply-To: References: Message-ID: > Predicate probes of the following algos are changed to rely on intrinsics availability in the platform as opposed to hardware support availability. > MD5 > SHA1 > SHA256 > SHA3 > > Testing: > All flag combinations from CI > hotspot tiers 1 to 5 > PS: only for tier testings, mac-aarch was skipped due to resource constraints Ramkumar Sunderbabu has updated the pull request incrementally with one additional commit since the last revision: remove requires condition ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28634/files - new: https://git.openjdk.org/jdk/pull/28634/files/e5d1497c..654604b9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28634&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28634&range=00-01 Stats: 6 lines in 2 files changed: 0 ins; 4 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28634.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28634/head:pull/28634 PR: https://git.openjdk.org/jdk/pull/28634 From rsunderbabu at openjdk.org Fri Dec 5 17:15:43 2025 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Fri, 5 Dec 2025 17:15:43 GMT Subject: RFR: 8372941: Rework compiler/intrinsics/sha tests to use intrinsic availability [v2] In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 17:11:13 GMT, Ramkumar Sunderbabu wrote: >> Predicate probes of the following algos are changed to rely on intrinsics availability in the platform as opposed to hardware support availability. 
>> MD5 >> SHA1 >> SHA256 >> SHA3 >> >> Testing: >> All flag combinations from CI >> hotspot tiers 1 to 5 >> PS: only for tier testings, mac-aarch was skipped due to resource constraints > > Ramkumar Sunderbabu has updated the pull request incrementally with one additional commit since the last revision: > > remove requires condition I didn't modify TestUseSHA3IntrinsicsOptionOnSupportedCPU.java since it was failing once I removed the condition completely. `@requires os.arch == "aarch64" & os.family == "mac"` `----------System.err:(32/2790)---------- stdout: []; stderr: [Java HotSpot(TM) 64-Bit Server VM warning: Option NeverActAsServerClassMachine was deprecated in version 26.0 and will likely be removed in a future release. java version "26-ea" 2026-03-17 Java(TM) SE Runtime Environment (build 26-ea+26-2610) Java HotSpot(TM) 64-Bit Server VM (build 26-ea+26-2610, mixed mode, sharing) ] exitValue = 0 java.lang.AssertionError: Expected message not found: 'Intrinsics for SHA3-224, SHA3-256, SHA3-384 and SHA3-512 crypto hash functions not available on this CPU.'. 
Enabling option 'UseSHA3Intrinsics' should not be possible and should result in a warning if -XX:-UseSHA was passed to JVM at jdk.test.lib.cli.CommandLineOptionTest.verifyOutput(CommandLineOptionTest.java:159) at jdk.test.lib.cli.CommandLineOptionTest.verifyJVMStartup(CommandLineOptionTest.java:130) at jdk.test.lib.cli.CommandLineOptionTest.verifySameJVMStartup(CommandLineOptionTest.java:211) at compiler.intrinsics.sha.cli.testcases.GenericTestCaseForSupportedCPU.verifyWarnings(GenericTestCaseForSupportedCPU.java:82) at compiler.intrinsics.sha.cli.DigestOptionsBase$TestCase.test(DigestOptionsBase.java:162) at compiler.intrinsics.sha.cli.DigestOptionsBase.runTestCases(DigestOptionsBase.java:139) at jdk.test.lib.cli.CommandLineOptionTest.test(CommandLineOptionTest.java:544) at compiler.intrinsics.sha.cli.TestUseSHA3IntrinsicsOptionOnSupportedCPU.main(TestUseSHA3IntrinsicsOptionOnSupportedCPU.java:47) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) at java.base/java.lang.reflect.Method.invoke(Method.java:565) at com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138) at java.base/java.lang.Thread.run(Thread.java:1516) Caused by: java.lang.RuntimeException: 'Intrinsics for SHA3-224, SHA3-256, SHA3-384 and SHA3-512 crypto hash functions not available on this CPU.' missing from stdout/stderr at jdk.test.lib.process.OutputAnalyzer.shouldMatch(OutputAnalyzer.java:407) at jdk.test.lib.cli.CommandLineOptionTest.verifyOutput(CommandLineOptionTest.java:154) ... 11 more JavaTest Message: Test threw exception: java.lang.AssertionError: Expected message not found: 'Intrinsics for SHA3-224, SHA3-256, SHA3-384 and SHA3-512 crypto hash functions not available on this CPU.'. 
Enabling option 'UseSHA3Intrinsics' should not be possible and should result in a warning if -XX:-UseSHA was passed to JVM JavaTest Message: shutting down test STATUS:Failed.`main' threw exception: java.lang.AssertionError: Expected message not found: 'Intrinsics for SHA3-224, SHA3-256, SHA3-384 and SHA3-512 crypto hash functions not available on this CPU.'. Enabling option 'UseSHA3Intrinsics' should not be possible and should result in a warning if -XX:-UseSHA was passed to JVM ` ------------- PR Comment: https://git.openjdk.org/jdk/pull/28634#issuecomment-3617772383 From duke at openjdk.org Fri Dec 5 17:24:57 2025 From: duke at openjdk.org (ExE Boss) Date: Fri, 5 Dec 2025 17:24:57 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v4] In-Reply-To: References: <82Ddhg3yXemMeyKmZUCWZIPUVOTkdCbXiOcl8LO_Su0=.47680bc7-526d-4c15-9b84-dd9c7d27728d@github.com> Message-ID: On Mon, 1 Dec 2025 22:46:10 GMT, Vladimir Ivanov wrote: >> What I meant was where the `instanceof` is in the called method, the `testInstanceOfCondPre` all have the `instanceof` checks as part of the `if` statement.
>> >> -------------------------------------------------------------------------------- >> >> Something like: >> >> static void testInstanceOfCondDefaultInlinePre(A a, boolean cond) { >> if (defaultInlineInstanceOfCondPre(a, cond)) { >> a.m(); >> } >> } >> static void testInstanceOfCondDefaultInlinePost(A a, boolean cond) { >> if (defaultInlineInstanceOfCondPost(a, cond)) { >> a.m(); >> } >> } >> >> static void testIsInstanceCondDefaultInlinePre(A a, boolean cond) { >> if (defaultInlineIsInstanceCondPre(a, cond)) { >> a.m(); >> } >> } >> static void testIsInstanceCondDefaultInlinePost(A a, boolean cond) { >> if (defaultInlineIsInstanceCondPost(a, cond)) { >> a.m(); >> } >> } >> >> >> -------------------------------------------------------------------------------- >> >> I suggest adding such a test because of real world code which use different internal implementation classes but expose their public API as only a single common supertype, like `java.lang.constant.ClassDesc` and its `isPrimitive()`/`isArray()`/`isClassOrInterface()` methods (which currently don't do the `instanceof` check, but they probably should so that they can be reliably inlined). > > The test is intended as a white-box test. It focuses on bytecode shapes which result in different IR representations and exercise different optimizations. From compiler perspective, there's no difference between `if (defaultInlineInstanceOfCond(a)) { ... }` and `if (a instanceof B) {...}` when inlining happens during parsing. Both test cases produce the very same IR after parsing is over. It might be useful to have these tests in case the default inlining IR changes in the future.
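The suggested tests call helper methods whose bodies are not shown in the thread. A minimal sketch of the shape being discussed, where the type check lives in a small callee rather than in the `if` itself, might look as follows. Everything here is an assumption for illustration: the class name `InstanceOfHelperSketch`, the toy classes `A` and `B`, and in particular the helper bodies (including the guess that "Pre" means the flag is tested before the type check) are hypothetical and are not taken from the PR's test.

```java
public class InstanceOfHelperSketch {
    // Toy class hierarchy; the real test's A and B are not shown in the thread.
    public static class A {
        public int calls;
        public void m() { calls++; }
    }
    public static class B extends A { }

    // Hypothetical helper: the instanceof check sits in the callee.
    public static boolean defaultInlineInstanceOfCondPre(A a, boolean cond) {
        return cond && a instanceof B;
    }

    // Hypothetical helper: same check expressed through Class.isInstance.
    public static boolean defaultInlineIsInstanceCondPre(A a, boolean cond) {
        return cond && B.class.isInstance(a);
    }

    // Caller shapes as quoted in the thread.
    public static void testInstanceOfCondDefaultInlinePre(A a, boolean cond) {
        if (defaultInlineInstanceOfCondPre(a, cond)) {
            a.m();
        }
    }

    public static void testIsInstanceCondDefaultInlinePre(A a, boolean cond) {
        if (defaultInlineIsInstanceCondPre(a, cond)) {
            a.m();
        }
    }

    public static void main(String[] args) {
        B b = new B();
        testInstanceOfCondDefaultInlinePre(b, true);
        testIsInstanceCondDefaultInlinePre(b, true);
        // A null argument fails both checks, so m() is never reached.
        testInstanceOfCondDefaultInlinePre(null, true);
        System.out.println(b.calls);
    }
}
```

Once such a helper is inlined during parsing, the caller contains the same check as a direct `if (a instanceof B)`, which is the point made in the reply quoted above.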
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28517#discussion_r2593410927 From rsunderbabu at openjdk.org Fri Dec 5 17:35:15 2025 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Fri, 5 Dec 2025 17:35:15 GMT Subject: RFR: 8372941: Rework compiler/intrinsics/sha tests to use intrinsic availability [v3] In-Reply-To: References: Message-ID: > Predicate probes of the following algos are changed to rely on intrinsics availability in the platform as opposed to hardware support availability. > MD5 > SHA1 > SHA256 > SHA3 > > Testing: > All flag combinations from CI > hotspot tiers 1 to 5 > PS: only for tier testings, mac-aarch was skipped due to resource constraints Ramkumar Sunderbabu has updated the pull request incrementally with one additional commit since the last revision: Fix TestUseSHA3IntrinsicsOptionOnSupportedCPU ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28634/files - new: https://git.openjdk.org/jdk/pull/28634/files/654604b9..8982a058 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28634&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28634&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/28634.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28634/head:pull/28634 PR: https://git.openjdk.org/jdk/pull/28634 From rsunderbabu at openjdk.org Fri Dec 5 17:35:16 2025 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Fri, 5 Dec 2025 17:35:16 GMT Subject: RFR: 8372941: Rework compiler/intrinsics/sha tests to use intrinsic availability [v2] In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 17:11:13 GMT, Ramkumar Sunderbabu wrote: >> Predicate probes of the following algos are changed to rely on intrinsics availability in the platform as opposed to hardware support availability. 
>> MD5 >> SHA1 >> SHA256 >> SHA3 >> >> Testing: >> All flag combinations from CI >> hotspot tiers 1 to 5 >> PS: only for tier testings, mac-aarch was skipped due to resource constraints > > Ramkumar Sunderbabu has updated the pull request incrementally with one additional commit since the last revision: > > remove requires condition With `@requires os.arch == "aarch64"`, TestUseSHA3IntrinsicsOptionOnSupportedCPU is working. However, I don't understand why IntrinsicPredicates.isSHA3IntrinsicAvailable() is not enough in some cases. ------------- PR Comment: https://git.openjdk.org/jdk/pull/28634#issuecomment-3617830847 From cjplummer at openjdk.org Fri Dec 5 20:38:58 2025 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 5 Dec 2025 20:38:58 GMT Subject: RFR: 8370846: Support execution of mlvm testing with test thread factory [v2] In-Reply-To: References: Message-ID: On Tue, 28 Oct 2025 22:35:34 GMT, Leonid Mesnik wrote: >> The MainWrapper used test thread factory has generated lambda method. So the AbsentInformationException is expected. The actual source path is not checked. >> >> Tested by run mlvm tests with and without test thread factory. >> >> Also >> jdk/test/lib/thread/TestThreadFactory.java >> updated to provide TestThreadFactory. isTestThreadFactorySet() >> that could be used by tests instead of checking property "test.thread.factory" directly. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > improved comment Looks good. ------------- Marked as reviewed by cjplummer (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/28028#pullrequestreview-3546371263 From lmesnik at openjdk.org Fri Dec 5 21:23:08 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 5 Dec 2025 21:23:08 GMT Subject: Integrated: 8370846: Support execution of mlvm testing with test thread factory In-Reply-To: References: Message-ID: On Tue, 28 Oct 2025 21:06:29 GMT, Leonid Mesnik wrote: > The MainWrapper used test thread factory has generated lambda method. So the AbsentInformationException is expected. The actual source path is not checked. > > Tested by run mlvm tests with and without test thread factory. > > Also > jdk/test/lib/thread/TestThreadFactory.java > updated to provide TestThreadFactory. isTestThreadFactorySet() > that could be used by tests instead of checking property "test.thread.factory" directly. This pull request has now been integrated. Changeset: 2596608b Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/2596608ba1bb1b271dfa062bf732a5095e22fffd Stats: 37 lines in 2 files changed: 32 ins; 0 del; 5 mod 8370846: Support execution of mlvm testing with test thread factory Reviewed-by: cjplummer ------------- PR: https://git.openjdk.org/jdk/pull/28028 From vlivanov at openjdk.org Sat Dec 6 01:10:57 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 6 Dec 2025 01:10:57 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v2] In-Reply-To: References: <5eysoU9a44W7_cWds1pgbO9cpxQpBbtd54cUglfEW8c=.d0307e92-d9b3-405c-b488-872243af83b1@github.com> Message-ID: <1SgxHBiy8F3cswUdwmWr_gtjxdSiZ3K-JQUZvCcT4hY=.d35cdbdb-ec84-47a7-8302-f3759c2b020f@github.com> On Wed, 3 Dec 2025 02:19:46 GMT, Dean Long wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> Test fix > > src/hotspot/share/opto/parse2.cpp line 1737: > >> 1735: (*cast_type) = tcon->isa_klassptr()->as_instance_type(); >> 1736: return true; // found >> 1737: } > > The old code 
checked klass_is_exact() for this case, but the new code does not, so was it redundant, given we have a constant? Yes, the check is redundant. Moreover, I tested the patch having the check turned into an assert. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28517#discussion_r2594314864 From vlivanov at openjdk.org Sat Dec 6 01:15:55 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 6 Dec 2025 01:15:55 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v4] In-Reply-To: <4abJMXdHzqKGqU58EXHaXO7849B0a64NoShEvU110I4=.87a93e5c-b73e-4c6e-b85b-8797eea8814d@github.com> References: <4abJMXdHzqKGqU58EXHaXO7849B0a64NoShEvU110I4=.87a93e5c-b73e-4c6e-b85b-8797eea8814d@github.com> Message-ID: <7tncm6HgyrCXyN7VAYAoo4e0igls2GofazYW-4PyzMg=.ce6e21b1-b2ac-4482-b661-b69cb3aa22f7@github.com> On Wed, 3 Dec 2025 10:40:52 GMT, Quan Anh Mai wrote: >> Yes, it's a good idea and the right direction to move. While experimenting with a different enhancement, I noticed that a subtype check leaves a null check behind irrespective of whether the check goes away or not. >> >> Unfortunately, there are some engineering considerations which complicates the change. `SubTypeCheck` is shared across all the places where subtype checks are performed, but `checkcast` and `instanceof` differ in the way `null` is handled. So, the proper way to fix it is to introduce a higher-level representation which implicitly handles nulls and then eventually lower it to `SubTypeCheck` and materialize null check if needed. > > There are multiple ways without having to have yet another higher-level representation. The first one is that since `SubTypeCheck` does not accept `null` now, we can just choose one result for `null`. Choosing the `instanceof` approach may be a little more desirable, as it removes the need to perform this complicated match, and for `checkcast` we can manually insert a `CheckCastPP` anyway. 
Another solution is to have another input to `SubTypeCheck` which gives the result when the `obj` is `null`. On a whim, I kind of like this, as we can match both the `checkcast` and the `instanceof` pattern here, it also simplifies `GraphKit::gen_checkcast`, as we do not have to worry about "the cast that always succeeds will leave behind a null check". > > Just a suggestion, though. This PR is fine as it is to me. I agree it can be implemented without introducing new fancy IR nodes. The open question to me though is whether we can live without materializing the null check until `SubTypeCheck` nodes are macro expanded. Otherwise, it'll turn into a gradual lowering through different representations. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28517#discussion_r2594317538 From vlivanov at openjdk.org Sat Dec 6 01:47:56 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 6 Dec 2025 01:47:56 GMT Subject: RFR: 8372634: C2: Materialize type information from instanceof checks [v4] In-Reply-To: References: <82Ddhg3yXemMeyKmZUCWZIPUVOTkdCbXiOcl8LO_Su0=.47680bc7-526d-4c15-9b84-dd9c7d27728d@github.com> Message-ID: On Fri, 5 Dec 2025 17:22:19 GMT, ExE Boss wrote: >> The test is intended as a white-box test. It focuses on bytecode shapes which result in different IR representations and exercise different optimizations. From compiler perspective, there's no difference between `if (defaultInlineInstanceOfCond(a)) { ... }` and `if (a instanceof B) {...}` when inlining happens during parsing. Both test cases produce the very same IR after parsing is over. > > It might be useful to have these tests in case the default inlining IR changes in the future. It's intended as a unit test. It's better to catch inlining issues with targeted tests on inlining. From compiler perspective, there's no reason to cover other cases here. There are so many different scenarios how a subtype check can show up in IR.
And different scenarios can theoretically fail due to different reasons. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28517#discussion_r2594354870 From duke at openjdk.org Sat Dec 6 09:46:14 2025 From: duke at openjdk.org (duke) Date: Sat, 6 Dec 2025 09:46:14 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise [v18] In-Reply-To: References: Message-ID: On Fri, 5 Dec 2025 08:57:06 GMT, Qizheng Xing wrote: >> The result of count leading/trailing zeros is always non-negative, and the maximum value is integer type's size in bits. In previous versions, when C2 can not know the operand value of a CLZ/CTZ node at compile time, it will generate a full-width integer type for its result. This can significantly affect the efficiency of code in some cases. >> >> This patch makes the type of CLZ/CTZ nodes more precise, to make C2 generate better code. For example, the following implementation runs ~115% faster on x86-64 with this patch: >> >> >> public static int numberOfNibbles(int i) { >> int mag = Integer.SIZE - Integer.numberOfLeadingZeros(i); >> return Math.max((mag + 3) / 4, 1); >> } >> >> >> Testing: tier1, IR test > > Qizheng Xing has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 24 commits: > > - Merge branch 'master' into enhance-clz-type > - Make code more compact > - Fix include order > - Merge branch 'master' into enhance-clz-type > - Merge branch 'master' into enhance-clz-type > - Fix constant fold > - Remove redundant import > - Add random range tests > - Add more comments to IR test > - Add more constant folding tests for CLZ/CTZ > - ... and 14 more: https://git.openjdk.org/jdk/compare/674cc3ee...f0687754 @MaxXSoft Your change (at version f0687754fca4ce08f650bb49c6e96ebb0d5b99bf) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25928#issuecomment-3619817305 From qxing at openjdk.org Sat Dec 6 09:46:05 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Sat, 6 Dec 2025 09:46:05 GMT Subject: RFR: 8360192: C2: Make the type of count leading/trailing zero nodes more precise In-Reply-To: References: <1yjEG7xjZcmvAECD2ovS0pW8IwA30p9BzCr0Krgy4ks=.3224b13e-a32f-468d-a6e3-3fb5a1c35c04@github.com> Message-ID: <3TUVz1ty0yJCxN2OsLPpH4znP6QtR3hQRyd9r8-TA08=.1cfc007e-bfce-491a-8107-577ee3e7af15@github.com> On Fri, 5 Dec 2025 09:10:37 GMT, Emanuel Peter wrote: >> @MaxXSoft Yes you are right, my mistake > > You do need 2 reviewers. I see that @merykitty has reviewed this a while ago, but a re-approval from him would be good since it is so long ago now. > > I'll run some testing now. @eme64 @merykitty Thanks for the re-review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25928#issuecomment-3619814845 From zlin at openjdk.org Sat Dec 6 12:07:04 2025 From: zlin at openjdk.org (Zihao Lin) Date: Sat, 6 Dec 2025 12:07:04 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v15] In-Reply-To: References: Message-ID: > This patch removes the slice parameter from LoadNode::make > > I have done more work which removes the slice parameter from StoreNode::make. > > Mentioned in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 > > Hi team, I am new, I'd appreciate any guidance. Thanks a lot! Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: - Merge branch 'master' into 8344116 - Merge branch 'master' into 8344116 - remove adr_type from graphKit - Fix test failed - Merge branch 'openjdk:master' into 8344116 - Merge branch 'openjdk:master' into 8344116 - fix conflict - Merge master - remove C2AccessValuePtr - fix assert - ...
and 8 more: https://git.openjdk.org/jdk/compare/b0f59f60...c526f021 ------------- Changes: https://git.openjdk.org/jdk/pull/24258/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=14 Stats: 316 lines in 22 files changed: 47 ins; 89 del; 180 mod Patch: https://git.openjdk.org/jdk/pull/24258.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24258/head:pull/24258 PR: https://git.openjdk.org/jdk/pull/24258 From jkarthikeyan at openjdk.org Sat Dec 6 20:27:04 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Sat, 6 Dec 2025 20:27:04 GMT Subject: RFR: 8365570: C2 fails assert(false) failed: Unexpected node in SuperWord truncation: CastII [v2] In-Reply-To: <8jnY6pqofieRIfV5fCqFxvHZMF3nAZbh7yAD7C_G5FU=.a12c98f6-c715-43f5-9528-62fcfdfc6e59@github.com> References: <1z430wmE_HRTJqmLIC15VMUktLyUEE7qjkppr1GniAI=.e560a4e9-59f0-4013-ad65-5d7261cdbf0e@github.com> <8jnY6pqofieRIfV5fCqFxvHZMF3nAZbh7yAD7C_G5FU=.a12c98f6-c715-43f5-9528-62fcfdfc6e59@github.com> Message-ID: On Fri, 5 Dec 2025 08:23:02 GMT, Emanuel Peter wrote: >> After taking a closer look, I think you're correct- I can reproduce the crash using just `@Warmup(0)` and `@Test`. I think I used both while debugging and didn't test whether it worked without `CompLevel.C2`. I've removed it in the latest commit. >> However, I noticed that after that I merged from master neither the test nor the reproducer failed compilation before the fix is added. I think another commit must have changed the generated graph so that it no longer tries to vectorize the `CastII`, leading to the crash not being triggered. I looked at the JBS entry and saw that there wasn't another reproducer for this, so I was a bit unsure on what to do. Should this patch be merged with the current test? > > @jaskarth Thanks for looking into it! > > I would still add the fix, just in case. And I think the test as well, even if it does not reproduce any more. 
> > I was wondering: before the merge, when the test still reproduced: > If you removed the `@Warmup(0)` and `CompLevel.C2`, and instead just do `framework.addFlags` with `-Xcomp`, would that reproduce too? If so, you could have a framework run with and one without Xcomp, the one with Xcomp also should have a compileonly. What do you think? > > Or we just push the patch as is, to be sure this is done and integrated. What do you think @chhagedorn ? Yep, I can replicate the crash on the old commit with `TestFramework.runWithFlags("-Xcomp", "-XX:CompileCommand=compileonly,*TestSubwordTruncation::*");` instead of `@Warmup(0)`. I think this would also be a good option, as it would let you get coverage with Xcomp on the other tests as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/26827#discussion_r2595422874 From qamai at openjdk.org Sun Dec 7 12:08:18 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 7 Dec 2025 12:08:18 GMT Subject: RFR: 8367341: C2: apply KnownBits and unsigned bounds to And / Or operations [v7] In-Reply-To: References: Message-ID: > Hi, > > This PR improves the implementation of `AndNode/OrNode/XorNode::Value` by taking advantages of the additional information in `TypeInt`. The implementation is pretty straightforward. A clever trick is that by analyzing the negative and positive ranges of a `TypeInt` separately, we have better info for the leading bits. I also implement gtest unit tests to verify the correctness and monotonicity of the inference functions. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains eight commits: - Merge branch 'master' into andorxor - Merge branch 'master' into andorxor - Merge branch 'master' into andorxor - Add assertion for the helper in CTPComparator Co-authored-by: Emanuel Peter - remove std::hash - remove unordered_map, add some comments for all_instances_size - Emanuel's reviews - Improve Value inferences of And, Or, Xor and implement gtest for general Value inferences ------------- Changes: https://git.openjdk.org/jdk/pull/27618/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=27618&range=06 Stats: 964 lines in 9 files changed: 630 ins; 313 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/27618.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/27618/head:pull/27618 PR: https://git.openjdk.org/jdk/pull/27618 From qamai at openjdk.org Sun Dec 7 12:12:10 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 7 Dec 2025 12:12:10 GMT Subject: RFR: 8370914: C2: Reimplement Type::join [v5] In-Reply-To: References: Message-ID: > Hi, > > Currently, `Type::join` is implemented using `Type::dual`. The idea seems to be that the dual of a join would be the meet of the duals of the operands. This helps us avoid the need to implement a separate join operation. The comments also discuss the symmetry of the join and the meet operations, which seems to refer to the various fundamental laws of set union and intersection. > > However, it requires us to find a representation of a `Type` class that is symmetric, which may not always be possible without outright dividing its value set into the normal set and the dual set, and effectively implementing join and meet separately (e.g. `TypeInt` and `TypeLong`). > > In other cases, the existence of dual types introduces additional values into the value set of a `Type` class. 
For example, a pointer can be a nullable pointer (`BotPTR`), a not-null pointer (`NotNull`), a not-null constant (`Constant`), a null constant (`Null`), an impossible value (`TopPTR`), and `AnyNull`? This is really hard to conceptualize even when we know that `AnyNull` is the dual of `NotNull`. It also does not really work, which leads to us sprinkling `above_centerline` checks all over the place. Additionally, the number of combinations in a meet increases quadratically with respect to the number of instances of a `Type`. This makes the already hard problem of meeting 2 complicated sets a nightmare to understand. > > This PR reimplements `Type::join` as a separate operation and removes most of the `dual` concept from the `Type` class hierarchy. This has several benefits: > > - It makes the operation much more intuitive and makes changes to `Type` classes easier. Instead of thinking about type lattices and how the representation places the `Type` objects on the lattices, it is much easier to conceptualize a join when we think of a `Type` as a set of possible values. > - It tightens the invariants of the classes in the hierarchy. Instead of having 5 possible `ptr()` values when a `TypeInstPtr` participates in a meet/join operation, there are only 3 left (`AnyNull` is nonsensical and `TopPTR` must be an `AnyPtr`). This, in turn, reduces the number of combinations in a meet/join from 25 to 9, making it much easier to reason about. > > This PR also tries to limit the interaction between unrelated types. For example, meeting and joining of a float and an int seem to happen only when we try to do those operations on the array types of those types. Limiting these peculiar operations to the pl... Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains six commits: - Merge branch 'master' into typejoin - Merge branch 'master' into typejoin - Move dual to ASSERT only - Keep old version for verification - whitespace - Reimplement Type::join ------------- Changes: https://git.openjdk.org/jdk/pull/28051/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=28051&range=04 Stats: 1885 lines in 7 files changed: 1013 ins; 479 del; 393 mod Patch: https://git.openjdk.org/jdk/pull/28051.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28051/head:pull/28051 PR: https://git.openjdk.org/jdk/pull/28051 From qamai at openjdk.org Sun Dec 7 19:23:12 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 7 Dec 2025 19:23:12 GMT Subject: RFR: 8370914: C2: Reimplement Type::join [v6] In-Reply-To: References: Message-ID: > Hi, > > Currently, `Type::join` is implemented using `Type::dual`. The idea seems to be that the dual of a join would be the meet of the duals of the operands. This helps us avoid the need to implement a separate join operation. The comments also discuss the symmetry of the join and the meet operations, which seems to refer to the various fundamental laws of set union and intersection. > > However, it requires us to find a representation of a `Type` class that is symmetric, which may not always be possible without outright dividing its value set into the normal set and the dual set, and effectively implementing join and meet separately (e.g. `TypeInt` and `TypeLong`). > > In other cases, the existence of dual types introduces additional values into the value set of a `Type` class. For example, a pointer can be a nullable pointer (`BotPTR`), a not-null pointer (`NotNull`), a not-null constant (`Constant`), a null constant (`Null`), an impossible value (`TopPTR`), and `AnyNull`? This is really hard to conceptualize even when we know that `AnyNull` is the dual of `NotNull`. 
It also does not really work, which leads to us sprinkling `above_centerline` checks all over the place. Additionally, the number of combinations in a meet increases quadratically with respect to the number of instances of a `Type`. This makes the already hard problem of meeting 2 complicated sets a nightmare to understand. > > This PR reimplements `Type::join` as a separate operation and removes most of the `dual` concept from the `Type` class hierarchy. This has several benefits: > > - It makes the operation much more intuitive and makes changes to `Type` classes easier. Instead of thinking about type lattices and how the representation places the `Type` objects on the lattices, it is much easier to conceptualize a join when we think of a `Type` as a set of possible values. > - It tightens the invariants of the classes in the hierarchy. Instead of having 5 possible `ptr()` values when a `TypeInstPtr` participates in a meet/join operation, there are only 3 left (`AnyNull` is nonsensical and `TopPTR` must be an `AnyPtr`). This, in turn, reduces the number of combinations in a meet/join from 25 to 9, making it much easier to reason about. > > This PR also tries to limit the interaction between unrelated types. For example, meeting and joining of a float and an int seem to happen only when we try to do those operations on the array types of those types. Limiting these peculiar operations to the pl...
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: sort order ------------- Changes: - all: https://git.openjdk.org/jdk/pull/28051/files - new: https://git.openjdk.org/jdk/pull/28051/files/c3b5d453..7d882903 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=28051&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=28051&range=04-05 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/28051.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/28051/head:pull/28051 PR: https://git.openjdk.org/jdk/pull/28051
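The "type as a set of possible values" view from the 8370914 description can be illustrated with a small toy model. The sketch below is not HotSpot code: the `Value` universe, the class name `JoinSketch`, and the use of set complement as a stand-in for `dual` are all assumptions made for illustration. It only shows that, in such a model, a join written directly as set intersection agrees with one derived as the dual of the meet of the duals.

```java
import java.util.EnumSet;
import java.util.List;

final class JoinSketch {
    // Hypothetical toy universe of pointer values; C2 does not represent types this way.
    enum Value { NULL, OBJ_A, OBJ_B }

    // Join computed directly: the values allowed by both types (set intersection).
    static EnumSet<Value> join(EnumSet<Value> a, EnumSet<Value> b) {
        EnumSet<Value> result = EnumSet.copyOf(a);
        result.retainAll(b);
        return result;
    }

    // Meet: the values allowed by either type (set union).
    static EnumSet<Value> meet(EnumSet<Value> a, EnumSet<Value> b) {
        EnumSet<Value> result = EnumSet.copyOf(a);
        result.addAll(b);
        return result;
    }

    // Toy stand-in for dual(): set complement. This is an assumption for
    // illustration only; C2's Type::dual is not a plain complement.
    static EnumSet<Value> dual(EnumSet<Value> a) {
        return EnumSet.complementOf(a);
    }

    // Join derived indirectly, as the dual of the meet of the duals.
    static EnumSet<Value> joinViaDual(EnumSet<Value> a, EnumSet<Value> b) {
        return dual(meet(dual(a), dual(b)));
    }

    public static void main(String[] args) {
        // Four illustrative "types", each described by the values it admits.
        EnumSet<Value> nullable = EnumSet.allOf(Value.class);
        EnumSet<Value> notNull = EnumSet.of(Value.OBJ_A, Value.OBJ_B);
        EnumSet<Value> constantA = EnumSet.of(Value.OBJ_A);
        EnumSet<Value> nullOnly = EnumSet.of(Value.NULL);

        List<EnumSet<Value>> kinds = List.of(nullable, notNull, constantA, nullOnly);
        // Every pair must be checked, so the case count grows quadratically
        // with the number of kinds (the 25 versus 9 combinations in the description).
        for (EnumSet<Value> a : kinds) {
            for (EnumSet<Value> b : kinds) {
                if (!join(a, b).equals(joinViaDual(a, b))) {
                    throw new AssertionError("mismatch for " + a + " and " + b);
                }
            }
        }
        System.out.println("join(notNull, nullOnly) = " + join(notNull, nullOnly));
        System.out.println("join(nullable, constantA) = " + join(nullable, constantA));
    }
}
```

In this model the direct `join` needs no extra "dual" values in the universe, which is the simplification the description argues for; whether the same holds for every `Type` subclass in C2 is a question for the PR itself.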