From github.com+2249648+johntortugo at openjdk.java.net Wed Sep 1 00:23:11 2021 From: github.com+2249648+johntortugo at openjdk.java.net (John Tortugo) Date: Wed, 1 Sep 2021 00:23:11 GMT Subject: RFR: 8267265: Use new IR Test Framework to create tests for C2 IGV transformations [v4] In-Reply-To: <1ZjEPMgcx8a8qIfAMhBlFfQ1S0PUjbbv9r-uyF8cORc=.ad8ee7f3-603e-4ffd-b4e0-a12df3cf8ff3@github.com> References: <2uRU0b0fCTLTdN6jsB9mNpM_3BEgFZz7q4xHWnNG79I=.16186f49-c220-4bf7-aee1-c18f820e92a5@github.com> <1ZjEPMgcx8a8qIfAMhBlFfQ1S0PUjbbv9r-uyF8cORc=.ad8ee7f3-603e-4ffd-b4e0-a12df3cf8ff3@github.com> Message-ID: On Fri, 27 Aug 2021 17:59:10 GMT, Vladimir Kozlov wrote: >> I'm opting for having these tests in subfolders of `irTests` separated by type of optimization. But should we go with `compiler/irTests/*` or `/compiler/c2/irTests/*`? > > I agree to do cleanup and **correctly** separate c1/c2/shared tests (as compiler tests cleanup RFE). > > If we all agree with that then the answer for last question is `/compiler/c2/irTests/*` I moved the tests to `/compiler/c2/irTests/`. Please let me know if the split into subfolders I did is reasonable. ------------- PR: https://git.openjdk.java.net/jdk/pull/5135 From github.com+2249648+johntortugo at openjdk.java.net Wed Sep 1 00:23:11 2021 From: github.com+2249648+johntortugo at openjdk.java.net (John Tortugo) Date: Wed, 1 Sep 2021 00:23:11 GMT Subject: RFR: 8267265: Use new IR Test Framework to create tests for C2 IGV transformations [v4] In-Reply-To: References: Message-ID: <8Ce6bZtHwGEw8_wXZz4ak3obprd1YmZDi4cItcXB4bA=.a7162709-7aad-4709-a585-d2391392f49b@github.com> > Hi, can I please get some reviews for this Pull Request? Here is a summary of the changes: > > - Add tests, using the new IR-based test framework, for several of the Ideal transformations on Add, Sub, Mul, Div, Loop nodes and some simple Scalar Replacement transformations. > - Add more default IR regex's to IR-based test framework. 
> - Changes to Sub, Div and Add Ideal nodes so that transformations on Int and Long types are the same whenever possible. > - Changes to Sub*Node, Div*Node and Add*Node Ideal methods to fix some bugs and include new transformations. > - New JTREG "ir_transformations" test group under test/hotspot/jtreg. John Tortugo has updated the pull request incrementally with 146 additional commits since the last revision: - Fix merge mistake. - Merge branch 'jdk-8267265' of https://github.com/JohnTortugo/jdk into jdk-8267265 - Addressing PR feedback: move tests to other directory, add custom tests, add tests for other optimizations, rename some tests. - 8273197: ProblemList 2 jtools tests due to JDK-8273187 8273198: ProblemList java/lang/instrument/BootClassPath/BootClassPathTest.sh due to JDK-8273188 Reviewed-by: naoto - 8262186: Call X509KeyManager.chooseClientAlias once for all key types Reviewed-by: xuelei - 8273186: Remove leftover comment about sparse remembered set in G1 HeapRegionRemSet Reviewed-by: ayang - 8273169: java/util/regex/NegativeArraySize.java failed after JDK-8271302 Reviewed-by: jiefu, serb - 8273092: Sort classlist in JDK image Reviewed-by: redestad, ihse, dfuchs - 8273144: Remove unused top level "Sample Collection Set Candidates" logging Reviewed-by: iwalulya, ayang - 8262095: NPE in Flow$FlowAnalyzer.visitApply: Cannot invoke getThrownTypes because tree.meth.type is null Co-authored-by: Jan Lahoda Co-authored-by: Vicente Romero Reviewed-by: jlahoda - ... 
and 136 more: https://git.openjdk.java.net/jdk/compare/ac430bf7...463102e2 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5135/files - new: https://git.openjdk.java.net/jdk/pull/5135/files/ac430bf7..463102e2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5135&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5135&range=02-03 Stats: 2570 lines in 18 files changed: 1410 ins; 1146 del; 14 mod Patch: https://git.openjdk.java.net/jdk/pull/5135.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5135/head:pull/5135 PR: https://git.openjdk.java.net/jdk/pull/5135 From github.com+2249648+johntortugo at openjdk.java.net Wed Sep 1 00:23:11 2021 From: github.com+2249648+johntortugo at openjdk.java.net (John Tortugo) Date: Wed, 1 Sep 2021 00:23:11 GMT Subject: RFR: 8267265: Use new IR Test Framework to create tests for C2 IGV transformations [v4] In-Reply-To: References: Message-ID: On Fri, 20 Aug 2021 08:59:12 GMT, Christian Hagedorn wrote: >> Thank you @chhagedorn , I think this is a good idea. I'll follow your suggestion and transform some tests into `custom run tests`. > > Great, thanks! Btw, you can merge and now use `RunInfo.getRandom().XX()` for a handy access to random values (if needed) as the PR for JDK-8272567 was integrated in the meantime. Hi, again @chhagedorn. I added some `custom run tests` to tests that seemed more "complex". Please let me know if there are others that you think I should add. 
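The `custom run tests` idea discussed above can be sketched in plain Java: one method holds the expression of interest, and a separate run method feeds it chosen and random inputs and checks the result. This is only an illustration under stated assumptions. The names `CustomRunSketch`, `test` and `run` are hypothetical, `java.util.Random` stands in for `RunInfo.getRandom()`, and the sketch does not use the IR framework's annotations or check any IR. The identity `(a + b) - b == a` is picked because it holds under Java's wrap-around `int` arithmetic; it is not necessarily one of the transformations covered by the PR.

```java
import java.util.Random;

public class CustomRunSketch {
    // Method under test: an optimizing compiler may reduce (a + b) - b to a.
    static int test(int a, int b) {
        return (a + b) - b;
    }

    // Hypothetical "run" method: chooses inputs, calls test(), verifies results.
    // Returns the number of random cases that were checked.
    static int run(Random random, int iterations) {
        // Fixed corner cases where the intermediate sum overflows.
        check(Integer.MAX_VALUE, 1);
        check(Integer.MIN_VALUE, -1);
        check(0, Integer.MIN_VALUE);

        int checked = 0;
        for (int i = 0; i < iterations; i++) {
            check(random.nextInt(), random.nextInt());
            checked++;
        }
        return checked;
    }

    // The expected value is simply the first argument.
    private static void check(int a, int b) {
        int result = test(a, b);
        if (result != a) {
            throw new AssertionError("test(" + a + ", " + b + ") returned " + result);
        }
    }

    public static void main(String[] args) {
        int checked = run(new Random(42), 10_000);
        System.out.println("checked " + checked + " random cases");
    }
}
```

Separating the input selection from the method under test keeps the tested expression small, which is the property an IR check would rely on.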
------------- PR: https://git.openjdk.java.net/jdk/pull/5135 From github.com+2249648+johntortugo at openjdk.java.net Wed Sep 1 00:26:50 2021 From: github.com+2249648+johntortugo at openjdk.java.net (John Tortugo) Date: Wed, 1 Sep 2021 00:26:50 GMT Subject: RFR: 8267265: Use new IR Test Framework to create tests for C2 IGV transformations [v4] In-Reply-To: <8Ce6bZtHwGEw8_wXZz4ak3obprd1YmZDi4cItcXB4bA=.a7162709-7aad-4709-a585-d2391392f49b@github.com> References: <8Ce6bZtHwGEw8_wXZz4ak3obprd1YmZDi4cItcXB4bA=.a7162709-7aad-4709-a585-d2391392f49b@github.com> Message-ID: On Wed, 1 Sep 2021 00:23:11 GMT, John Tortugo wrote: >> Hi, can I please get some reviews for this Pull Request? Here is a summary of the changes: >> >> - Add tests, using the new IR-based test framework, for several of the Ideal transformations on Add, Sub, Mul, Div, Loop nodes and some simple Scalar Replacement transformations. >> - Add more default IR regex's to IR-based test framework. >> - Changes to Sub, Div and Add Ideal nodes so that transformations on Int and Long types are the same whenever possible. >> - Changes to Sub*Node, Div*Node and Add*Node Ideal methods to fix some bugs and include new transformations. >> - New JTREG "ir_transformations" test group under test/hotspot/jtreg. > > John Tortugo has updated the pull request incrementally with 146 additional commits since the last revision: > > - Fix merge mistake. > - Merge branch 'jdk-8267265' of https://github.com/JohnTortugo/jdk into jdk-8267265 > - Addressing PR feedback: move tests to other directory, add custom tests, add tests for other optimizations, rename some tests. 
> - 8273197: ProblemList 2 jtools tests due to JDK-8273187 > 8273198: ProblemList java/lang/instrument/BootClassPath/BootClassPathTest.sh due to JDK-8273188 > > Reviewed-by: naoto > - 8262186: Call X509KeyManager.chooseClientAlias once for all key types > > Reviewed-by: xuelei > - 8273186: Remove leftover comment about sparse remembered set in G1 HeapRegionRemSet > > Reviewed-by: ayang > - 8273169: java/util/regex/NegativeArraySize.java failed after JDK-8271302 > > Reviewed-by: jiefu, serb > - 8273092: Sort classlist in JDK image > > Reviewed-by: redestad, ihse, dfuchs > - 8273144: Remove unused top level "Sample Collection Set Candidates" logging > > Reviewed-by: iwalulya, ayang > - 8262095: NPE in Flow$FlowAnalyzer.visitApply: Cannot invoke getThrownTypes because tree.meth.type is null > > Co-authored-by: Jan Lahoda > Co-authored-by: Vicente Romero > Reviewed-by: jlahoda > - ... and 136 more: https://git.openjdk.java.net/jdk/compare/ac430bf7...463102e2 Hi folks, I'd appreciate it if you could take another look. I addressed the comments so far and here is a summary of the latest changes: - Moved all tests to directory `/compiler/c2/irTests` under hotspot JTREG group. - Added a few `custom run tests` for testing some corner cases. - Renamed some test files and some test methods. - Added new tests for testing the changes introduced by [JDK-8270823](https://bugs.openjdk.java.net/browse/JDK-8270823) ------------- PR: https://git.openjdk.java.net/jdk/pull/5135 From ddong at openjdk.java.net Wed Sep 1 01:50:51 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Wed, 1 Sep 2021 01:50:51 GMT Subject: RFR: 8273148: Shouldn't increment _ifop_count when IfOp is created In-Reply-To: References: Message-ID: On Tue, 31 Aug 2021 00:12:52 GMT, Dean Long wrote: > The comment for int _ifop_count says "the number of IfOps successfully simplified", not "eliminated". If you want to distinguish between simplified and eliminated, I think we need another counter. Thanks. 
I think you are right: _ifop_count represents the number of IfOp simplifications. Although an IfOp is still created here, some operations are simplified, so it is reasonable to increment _ifop_count. I will close this PR. ------------- PR: https://git.openjdk.java.net/jdk/pull/5305 From ddong at openjdk.java.net Wed Sep 1 01:50:52 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Wed, 1 Sep 2021 01:50:52 GMT Subject: Withdrawn: 8273148: Shouldn't increment _ifop_count when IfOp is created In-Reply-To: References: Message-ID: On Mon, 30 Aug 2021 16:16:10 GMT, Denghui Dong wrote: > Hi, > > Please help review this small fix. > > IIUC, _ifop_count is incremented when there is no need to create an IfOp, but there is a wrong increment in the current implementation (CE_Eliminator::make_ifop). > > Thanks, > Denghui This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/5305 From yyang at openjdk.java.net Wed Sep 1 02:38:47 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Wed, 1 Sep 2021 02:38:47 GMT Subject: RFR: 8272377: assert preconditions that are ensured when created in add_final_edges [v2] In-Reply-To: References: Message-ID: On Thu, 19 Aug 2021 07:03:50 GMT, Yi Yang wrote: >> I think that in add_final_edges, where nodes are handled with a delay, we can assert the preconditions that add_node_to_connection_graph already ensured when the nodes were first added to the CG, and then skip checking these preconditions at runtime (using assertions instead). This PR also does some code refactoring. > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > use Unexpected node type message Thanks for the review, Vladimir K.! 
------------- PR: https://git.openjdk.java.net/jdk/pull/5101 From fmatte at openjdk.java.net Wed Sep 1 05:56:10 2021 From: fmatte at openjdk.java.net (Fairoz Matte) Date: Wed, 1 Sep 2021 05:56:10 GMT Subject: RFR: 8272563: Possible assertion failure in CardTableBarrierSetC1 [v2] In-Reply-To: References: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> Message-ID: <3s5tEfAOoagk3KUNawQSUcD8XC0KFHaJliJ9coABlTs=.40ec1088-4aaa-4808-943e-3f69ea0ac82a@github.com> On Tue, 31 Aug 2021 14:54:56 GMT, Fairoz Matte wrote: >> This patch is proposed by the submitter of the bug - ugawa at ci.i.u-tokyo.ac.jp >> >> The method CardTableBarrierSetC1::post_barrier generates a move LIR when TwoOperandLIRForm flag is true to move the address to be marked in the card table to a temporary register. >>> __ move(addr, tmp); >> However, this code only guarantees that `addr` is a valid register for LIR, which can be a virtual register. If the virtual register for `addr` is spilled to the stack by chance, the `move(addr, tmp)` is compiled to a memory-to-register which causes an assertion failure because a memory-to-register move requires their arguments to have the same size. >> The fix is to check if it is is_oop() and call the mov appropriately. >> >> No issues found in local testing and Mach5 tier1-3 > > Fairoz Matte has updated the pull request incrementally with one additional commit since the last revision: > > 8272563: Possible assertion failure in CardTableBarrierSetC1 Thanks, updated the patch. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5164 From fmatte at openjdk.java.net Wed Sep 1 05:56:09 2021 From: fmatte at openjdk.java.net (Fairoz Matte) Date: Wed, 1 Sep 2021 05:56:09 GMT Subject: RFR: 8272563: Possible assertion failure in CardTableBarrierSetC1 [v3] In-Reply-To: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> References: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> Message-ID: > This patch is proposed by the submitter of the bug - ugawa at ci.i.u-tokyo.ac.jp > > The method CardTableBarrierSetC1::post_barrier generates a move LIR when TwoOperandLIRForm flag is true to move the address to be marked in the card table to a temporary register. >> __ move(addr, tmp); > However, this code only guarantees that `addr` is a valid register for LIR, which can be a virtual register. If the virtual register for `addr` is spilled to the stack by chance, the `move(addr, tmp)` is compiled to a memory-to-register which causes an assertion failure because a memory-to-register move requires their arguments to have the same size. > The fix is to check if it is is_oop() and call the mov appropriately. 
> > No issues found in local testing and Mach5 tier1-3 Fairoz Matte has updated the pull request incrementally with one additional commit since the last revision: 8272563: Possible assertion failure in CardTableBarrierSetC1 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5164/files - new: https://git.openjdk.java.net/jdk/pull/5164/files/c023f4bc..5c1e1d49 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5164&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5164&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/5164.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5164/head:pull/5164 PR: https://git.openjdk.java.net/jdk/pull/5164 From iveresov at openjdk.java.net Wed Sep 1 06:55:45 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Wed, 1 Sep 2021 06:55:45 GMT Subject: RFR: 8272563: Possible assertion failure in CardTableBarrierSetC1 [v3] In-Reply-To: References: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> Message-ID: On Wed, 1 Sep 2021 05:56:09 GMT, Fairoz Matte wrote: >> This patch is proposed by the submitter of the bug - ugawa at ci.i.u-tokyo.ac.jp >> >> The method CardTableBarrierSetC1::post_barrier generates a move LIR when TwoOperandLIRForm flag is true to move the address to be marked in the card table to a temporary register. >>> __ move(addr, tmp); >> However, this code only guarantees that `addr` is a valid register for LIR, which can be a virtual register. If the virtual register for `addr` is spilled to the stack by chance, the `move(addr, tmp)` is compiled to a memory-to-register which causes an assertion failure because a memory-to-register move requires their arguments to have the same size. >> The fix is to check if it is is_oop() and call the mov appropriately. 
>> >> No issues found in local testing and Mach5 tier1-3 > > Fairoz Matte has updated the pull request incrementally with one additional commit since the last revision: > > 8272563: Possible assertion failure in CardTableBarrierSetC1 src/hotspot/share/gc/shared/c1/cardTableBarrierSetC1.cpp line 72: > 70: LIR_Opr addr_opr = LIR_OprFact::address(new LIR_Address(addr, addr->type())); > 71: __ leal(addr_opr, tmp); > 72: __ move(addr, tmp); You don't need the move anymore. ------------- PR: https://git.openjdk.java.net/jdk/pull/5164 From fmatte at openjdk.java.net Wed Sep 1 06:55:45 2021 From: fmatte at openjdk.java.net (Fairoz Matte) Date: Wed, 1 Sep 2021 06:55:45 GMT Subject: RFR: 8272563: Possible assertion failure in CardTableBarrierSetC1 [v3] In-Reply-To: References: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> Message-ID: On Wed, 1 Sep 2021 06:49:47 GMT, Igor Veresov wrote: >> Fairoz Matte has updated the pull request incrementally with one additional commit since the last revision: >> >> 8272563: Possible assertion failure in CardTableBarrierSetC1 > > src/hotspot/share/gc/shared/c1/cardTableBarrierSetC1.cpp line 72: > >> 70: LIR_Opr addr_opr = LIR_OprFact::address(new LIR_Address(addr, addr->type())); >> 71: __ leal(addr_opr, tmp); >> 72: __ move(addr, tmp); > > You don't need the move anymore. ok, will remove that. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5164 From iveresov at openjdk.java.net Wed Sep 1 07:10:07 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Wed, 1 Sep 2021 07:10:07 GMT Subject: RFR: 8272563: assert(is_double_stack() && !is_virtual()) failed: type check [v4] In-Reply-To: References: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> Message-ID: On Wed, 1 Sep 2021 07:06:21 GMT, Fairoz Matte wrote: >> This patch is proposed by the submitter of the bug - ugawa at ci.i.u-tokyo.ac.jp >> >> The method CardTableBarrierSetC1::post_barrier generates a move LIR when TwoOperandLIRForm flag is true to move the address to be marked in the card table to a temporary register. >>> __ move(addr, tmp); >> However, this code only guarantees that `addr` is a valid register for LIR, which can be a virtual register. If the virtual register for `addr` is spilled to the stack by chance, the `move(addr, tmp)` is compiled to a memory-to-register which causes an assertion failure because a memory-to-register move requires their arguments to have the same size. >> The fix is to check if it is is_oop() and call the mov appropriately. >> >> No issues found in local testing and Mach5 tier1-3 > > Fairoz Matte has updated the pull request incrementally with one additional commit since the last revision: > > 8272563: Possible assertion failure in CardTableBarrierSetC1 Marked as reviewed by iveresov (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/5164 From fmatte at openjdk.java.net Wed Sep 1 07:10:07 2021 From: fmatte at openjdk.java.net (Fairoz Matte) Date: Wed, 1 Sep 2021 07:10:07 GMT Subject: RFR: 8272563: assert(is_double_stack() && !is_virtual()) failed: type check [v4] In-Reply-To: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> References: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> Message-ID: > This patch is proposed by the submitter of the bug - ugawa at ci.i.u-tokyo.ac.jp > > The method CardTableBarrierSetC1::post_barrier generates a move LIR when TwoOperandLIRForm flag is true to move the address to be marked in the card table to a temporary register. >> __ move(addr, tmp); > However, this code only guarantees that `addr` is a valid register for LIR, which can be a virtual register. If the virtual register for `addr` is spilled to the stack by chance, the `move(addr, tmp)` is compiled to a memory-to-register which causes an assertion failure because a memory-to-register move requires their arguments to have the same size. > The fix is to check if it is is_oop() and call the mov appropriately. 
> > No issues found in local testing and Mach5 tier1-3 Fairoz Matte has updated the pull request incrementally with one additional commit since the last revision: 8272563: Possible assertion failure in CardTableBarrierSetC1 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5164/files - new: https://git.openjdk.java.net/jdk/pull/5164/files/5c1e1d49..359cbf3d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5164&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5164&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5164.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5164/head:pull/5164 PR: https://git.openjdk.java.net/jdk/pull/5164 From iveresov at openjdk.java.net Wed Sep 1 07:10:08 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Wed, 1 Sep 2021 07:10:08 GMT Subject: RFR: 8272563: assert(is_double_stack() && !is_virtual()) failed: type check [v3] In-Reply-To: References: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> Message-ID: On Wed, 1 Sep 2021 05:56:09 GMT, Fairoz Matte wrote: >> This patch is proposed by the submitter of the bug - ugawa at ci.i.u-tokyo.ac.jp >> >> The method CardTableBarrierSetC1::post_barrier generates a move LIR when TwoOperandLIRForm flag is true to move the address to be marked in the card table to a temporary register. >>> __ move(addr, tmp); >> However, this code only guarantees that `addr` is a valid register for LIR, which can be a virtual register. If the virtual register for `addr` is spilled to the stack by chance, the `move(addr, tmp)` is compiled to a memory-to-register which causes an assertion failure because a memory-to-register move requires their arguments to have the same size. >> The fix is to check if it is is_oop() and call the mov appropriately. 
>> >> No issues found in local testing and Mach5 tier1-3 > > Fairoz Matte has updated the pull request incrementally with one additional commit since the last revision: > > 8272563: Possible assertion failure in CardTableBarrierSetC1 Looks good! ------------- PR: https://git.openjdk.java.net/jdk/pull/5164 From thartmann at openjdk.java.net Wed Sep 1 10:12:50 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 1 Sep 2021 10:12:50 GMT Subject: RFR: 8272563: assert(is_double_stack() && !is_virtual()) failed: type check [v4] In-Reply-To: References: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> Message-ID: On Wed, 1 Sep 2021 07:10:07 GMT, Fairoz Matte wrote: >> This patch is proposed by the submitter of the bug - ugawa at ci.i.u-tokyo.ac.jp >> >> The method CardTableBarrierSetC1::post_barrier generates a move LIR when TwoOperandLIRForm flag is true to move the address to be marked in the card table to a temporary register. >>> __ move(addr, tmp); >> However, this code only guarantees that `addr` is a valid register for LIR, which can be a virtual register. If the virtual register for `addr` is spilled to the stack by chance, the `move(addr, tmp)` is compiled to a memory-to-register which causes an assertion failure because a memory-to-register move requires their arguments to have the same size. >> The fix is to check if it is is_oop() and call the mov appropriately. >> >> No issues found in local testing and Mach5 tier1-3 > > Fairoz Matte has updated the pull request incrementally with one additional commit since the last revision: > > 8272563: Possible assertion failure in CardTableBarrierSetC1 Looks good to me too! ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5164 From fmatte at openjdk.java.net Wed Sep 1 10:15:52 2021 From: fmatte at openjdk.java.net (Fairoz Matte) Date: Wed, 1 Sep 2021 10:15:52 GMT Subject: Integrated: 8272563: assert(is_double_stack() && !is_virtual()) failed: type check In-Reply-To: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> References: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> Message-ID: On Wed, 18 Aug 2021 12:37:00 GMT, Fairoz Matte wrote: > This patch is proposed by the submitter of the bug - ugawa at ci.i.u-tokyo.ac.jp > > The method CardTableBarrierSetC1::post_barrier generates a move LIR when TwoOperandLIRForm flag is true to move the address to be marked in the card table to a temporary register. >> __ move(addr, tmp); > However, this code only guarantees that `addr` is a valid register for LIR, which can be a virtual register. If the virtual register for `addr` is spilled to the stack by chance, the `move(addr, tmp)` is compiled to a memory-to-register which causes an assertion failure because a memory-to-register move requires their arguments to have the same size. > The fix is to check if it is is_oop() and call the mov appropriately. > > No issues found in local testing and Mach5 tier1-3 This pull request has now been integrated. 
Changeset: a58cf165 Author: Fairoz Matte Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/a58cf16509f3120d69fc18bd4c2c49e9ad590f73 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8272563: assert(is_double_stack() && !is_virtual()) failed: type check Reviewed-by: thartmann, iveresov ------------- PR: https://git.openjdk.java.net/jdk/pull/5164 From yyang at openjdk.java.net Wed Sep 1 10:46:50 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Wed, 1 Sep 2021 10:46:50 GMT Subject: Integrated: 8272377: assert preconditions that are ensured when created in add_final_edges In-Reply-To: References: Message-ID: On Thu, 12 Aug 2021 15:07:19 GMT, Yi Yang wrote: > I think that in add_final_edges, where nodes are handled with a delay, we can assert the preconditions that add_node_to_connection_graph already ensured when the nodes were first added to the CG, and then skip checking these preconditions at runtime (using assertions instead). This PR also does some code refactoring. This pull request has now been integrated. 
Changeset: 02822e13 Author: Yi Yang URL: https://git.openjdk.java.net/jdk/commit/02822e1398d6015f0ed26edd440db8e0d50bf152 Stats: 147 lines in 2 files changed: 36 ins; 46 del; 65 mod 8272377: assert preconditions that are ensured when created in add_final_edges Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/5101 From thartmann at openjdk.java.net Wed Sep 1 13:06:11 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 1 Sep 2021 13:06:11 GMT Subject: RFR: 8273165: GraphKit::combine_exception_states fails with "matching stack sizes" assert In-Reply-To: <44saSltaOH9gYoqPDhCrAqoF-f41jbVZpaL_5DPgChs=.ecf62e56-67a7-441c-bde3-5027ced9b214@github.com> References: <44saSltaOH9gYoqPDhCrAqoF-f41jbVZpaL_5DPgChs=.ecf62e56-67a7-441c-bde3-5027ced9b214@github.com> Message-ID: On Wed, 1 Sep 2021 12:56:19 GMT, Vladimir Ivanov wrote: > The fix for JDK-8271276 uncovered another problem with incremental inlining through > virtual call sites: receiver null check breaks exception state combining in `GraphKit::replace_call()` > because the associated exception path has the JVM state representing the point > right before the call (arguments are on stack). > > I propose a conservative fix which bails out the inlining attempt when receiver is not provably non-null. > > IMO the proper fix is to always add explicit receiver null check and teach > `Block::implicit_null_check()` about CallDynamicJava nodes. But that's for a > separate change. > > Testing: failing tests, hs-tier1 - hs-tier4 That looks good to me but I would suggest to use `recv_type->maybe_null()`. ------------- Changes requested by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5330 From vlivanov at openjdk.java.net Wed Sep 1 13:06:11 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 1 Sep 2021 13:06:11 GMT Subject: RFR: 8273165: GraphKit::combine_exception_states fails with "matching stack sizes" assert Message-ID: <44saSltaOH9gYoqPDhCrAqoF-f41jbVZpaL_5DPgChs=.ecf62e56-67a7-441c-bde3-5027ced9b214@github.com> The fix for JDK-8271276 uncovered another problem with incremental inlining through virtual call sites: receiver null check breaks exception state combining in `GraphKit::replace_call()` because the associated exception path has the JVM state representing the point right before the call (arguments are on stack). I propose a conservative fix which bails out the inlining attempt when receiver is not provably non-null. IMO the proper fix is to always add explicit receiver null check and teach `Block::implicit_null_check()` about CallDynamicJava nodes. But that's for a separate change. Testing: failing tests, hs-tier1 - hs-tier4 ------------- Commit messages: - Require non-null receiver Changes: https://git.openjdk.java.net/jdk/pull/5330/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5330&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273165 Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5330.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5330/head:pull/5330 PR: https://git.openjdk.java.net/jdk/pull/5330 From vlivanov at openjdk.java.net Wed Sep 1 13:08:45 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 1 Sep 2021 13:08:45 GMT Subject: RFR: 8271911: replay compilations of methods which use JSR292 (easy cases) [v2] In-Reply-To: References: Message-ID: On Mon, 30 Aug 2021 22:55:56 GMT, Dean Long wrote: >> There is a subset of the general problem that we should be able to solve by looking at invokedynamic/invokehandle call sites and MethodHandle constant pool entries. 
If a replay references a hidden class that is discoverable in one of those locations, then we can use the location as a replacement for the transient VM name. >> >> Examples of references to hidden class locations: >> >> @bci compiler/ciReplay/CiReplayBase$TestMain test (I)V 1 argL0 ; >> @bci compiler/ciReplay/CiReplayBase$TestMain main ([Ljava/lang/String;)V 0 form vmentry ; >> @cpi compiler/ciReplay/CiReplayBase$TestMain 56 form vmentry ; > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > rename LocPusher --> RecordLocation, add comments Very nice work, Dean! src/hotspot/share/ci/ciEnv.cpp line 1537: > 1535: > 1536: // Iterate over the class hierarchy > 1537: for (ClassHierarchyIterator iter(vmClasses::Object_klass()); !iter.done(); iter.next()) { Why do you iterate over the whole class hierarchy instead of inspecting only those classes which are present in CI? ------------- PR: https://git.openjdk.java.net/jdk/pull/5270 From vlivanov at openjdk.java.net Wed Sep 1 14:17:16 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 1 Sep 2021 14:17:16 GMT Subject: RFR: 8269119: C2: Avoid redundant memory barriers in Unsafe.copyMemory0 intrinsic [v2] In-Reply-To: References: Message-ID: > `Unsafe::copyMemory0` intrinsic unconditionally inserts barriers around the call to `unsafe_arraycopy` stub. > It is a conservative approach and barriers can be avoided in most common cases (similar to what is done for scalar unsafe accesses). > > `Unsafe::copyMemory()` performs argument validation which limits inputs either > to off-heap location (null + absolute address) or primitive on-heap array. > > The only cases when barriers are still needed are: > * mixed accesses (`Object+offset`); > * mismatched access due to lack of type info on base oop (`Object:NotNull+offset`). 
> > Testing: hs-tier1 - hs-tier6 Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: Remove memory barriers ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5259/files - new: https://git.openjdk.java.net/jdk/pull/5259/files/8ce16e39..aa9c083d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5259&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5259&range=00-01 Stats: 38 lines in 2 files changed: 3 ins; 15 del; 20 mod Patch: https://git.openjdk.java.net/jdk/pull/5259.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5259/head:pull/5259 PR: https://git.openjdk.java.net/jdk/pull/5259 From vlivanov at openjdk.java.net Wed Sep 1 14:17:18 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 1 Sep 2021 14:17:18 GMT Subject: RFR: 8269119: C2: Avoid redundant memory barriers in Unsafe.copyMemory0 intrinsic In-Reply-To: References: Message-ID: On Wed, 25 Aug 2021 22:17:05 GMT, Vladimir Ivanov wrote: > `Unsafe::copyMemory0` intrinsic unconditionally inserts barriers around the call to `unsafe_arraycopy` stub. > It is a conservative approach and barriers can be avoided in most common cases (similar to what is done for scalar unsafe accesses). > > `Unsafe::copyMemory()` performs argument validation which limits inputs either > to off-heap location (null + absolute address) or primitive on-heap array. > > The only cases when barriers are still needed are: > * mixed accesses (`Object+offset`); > * mismatched access due to lack of type info on base oop (`Object:NotNull+offset`). > > Testing: hs-tier1 - hs-tier6 Good point, Roland. I removed the barrier-related logic and now use `TypePtr::BOTTOM` instead. 
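The two operand forms named in the description above (a primitive on-heap array, or a null base plus an absolute off-heap address) can be illustrated from the Java side. This is a minimal sketch and not code from the PR. It assumes a JDK on which the `sun.misc.Unsafe` memory-access methods are still usable, and it uses `sun.misc.Unsafe` only as a publicly reachable stand-in for the internal `Unsafe` class the intrinsic belongs to. The class and method names (`CopyMemorySketch`, `copyOnHeap`, `copyViaOffHeap`) are hypothetical.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class CopyMemorySketch {
    private static final Unsafe UNSAFE = lookupUnsafe();
    private static final long BYTE_BASE = Unsafe.ARRAY_BYTE_BASE_OFFSET;

    // Obtain the singleton through reflection; fails loudly if unavailable.
    private static Unsafe lookupUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe not accessible", e);
        }
    }

    // Both operands are primitive on-heap arrays (array base + array offset).
    static byte[] copyOnHeap(byte[] src) {
        byte[] dst = new byte[src.length];
        UNSAFE.copyMemory(src, BYTE_BASE, dst, BYTE_BASE, src.length);
        return dst;
    }

    // Round trip through native memory: off-heap operands use a null base
    // and an absolute address as the offset.
    static byte[] copyViaOffHeap(byte[] src) {
        long size = src.length;
        long first = UNSAFE.allocateMemory(size);
        long second = UNSAFE.allocateMemory(size);
        try {
            UNSAFE.copyMemory(src, BYTE_BASE, null, first, size);   // array -> off-heap
            UNSAFE.copyMemory(null, first, null, second, size);     // off-heap -> off-heap
            byte[] dst = new byte[src.length];
            UNSAFE.copyMemory(null, second, dst, BYTE_BASE, size);  // off-heap -> array
            return dst;
        } finally {
            UNSAFE.freeMemory(first);
            UNSAFE.freeMemory(second);
        }
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 4};
        System.out.println(java.util.Arrays.equals(data, copyOnHeap(data)));
        System.out.println(java.util.Arrays.equals(data, copyViaOffHeap(data)));
    }
}
```

In each call the kind of every operand is statically evident (a `byte[]` or a literal `null` base), which corresponds to the common cases the description refers to.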
------------- PR: https://git.openjdk.java.net/jdk/pull/5259 From vlivanov at openjdk.java.net Wed Sep 1 14:23:10 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 1 Sep 2021 14:23:10 GMT Subject: RFR: 8273165: GraphKit::combine_exception_states fails with "matching stack sizes" assert [v2] In-Reply-To: <44saSltaOH9gYoqPDhCrAqoF-f41jbVZpaL_5DPgChs=.ecf62e56-67a7-441c-bde3-5027ced9b214@github.com> References: <44saSltaOH9gYoqPDhCrAqoF-f41jbVZpaL_5DPgChs=.ecf62e56-67a7-441c-bde3-5027ced9b214@github.com> Message-ID: > The fix for JDK-8271276 uncovered another problem with incremental inlining through > virtual call sites: receiver null check breaks exception state combining in `GraphKit::replace_call()` > because the associtated exception path has the JVM state representing the point > right before the call (arguments are on stack). > > I propose a conservative fix which bails out the inlining attempt when receiver is not provably non-null. > > IMO the proper fix is to always add explicit receiver null check and teach > `Block::implicit_null_check()` about CallDynamicJava nodes. But that's for a > separate change. 
> > Testing: failing tests, hs-tier1 - hs-tier4 Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: Migrate to Type::maybe_null() ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5330/files - new: https://git.openjdk.java.net/jdk/pull/5330/files/5b3132be..3cb08341 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5330&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5330&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5330.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5330/head:pull/5330 PR: https://git.openjdk.java.net/jdk/pull/5330 From vlivanov at openjdk.java.net Wed Sep 1 14:23:11 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 1 Sep 2021 14:23:11 GMT Subject: RFR: 8273165: GraphKit::combine_exception_states fails with "matching stack sizes" assert [v2] In-Reply-To: References: <44saSltaOH9gYoqPDhCrAqoF-f41jbVZpaL_5DPgChs=.ecf62e56-67a7-441c-bde3-5027ced9b214@github.com> Message-ID: On Wed, 1 Sep 2021 13:02:55 GMT, Tobias Hartmann wrote: > I would suggest to use recv_type->maybe_null(). Good point, fixed. Thanks for the review. 
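A minimal sketch of the conservative bail-out described above, assuming a much simplified model of receiver nullness. The names `SketchNullness`, `SketchReceiverType` and `sketch_can_late_inline` are hypothetical and only illustrate the idea; this is not the code from the PR, and the real `maybe_null()` query on C2 types is only mirrored in spirit.

```cpp
#include <cassert>

// Hypothetical, simplified model of pointer nullness; this is not C2's type system.
enum class SketchNullness { NotNull, MaybeNull, AlwaysNull };

struct SketchReceiverType {
  SketchNullness nullness;
  // Mirrors the idea of a maybe_null() query: true unless provably non-null.
  bool maybe_null() const { return nullness != SketchNullness::NotNull; }
};

// Returns true when the (hypothetical) late inlining attempt may proceed.
inline bool sketch_can_late_inline(const SketchReceiverType& recv_type) {
  if (recv_type.maybe_null()) {
    // Conservative bail-out: inlining would need a receiver null check,
    // which is the situation the proposed fix avoids.
    return false;
  }
  return true;
}
```

Under these assumptions, only a receiver that is provably non-null lets the inlining attempt continue; every other case is rejected up front.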
------------- PR: https://git.openjdk.java.net/jdk/pull/5330 From thartmann at openjdk.java.net Wed Sep 1 14:43:00 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 1 Sep 2021 14:43:00 GMT Subject: RFR: 8273165: GraphKit::combine_exception_states fails with "matching stack sizes" assert [v2] In-Reply-To: References: <44saSltaOH9gYoqPDhCrAqoF-f41jbVZpaL_5DPgChs=.ecf62e56-67a7-441c-bde3-5027ced9b214@github.com> Message-ID: On Wed, 1 Sep 2021 14:23:10 GMT, Vladimir Ivanov wrote: >> The fix for JDK-8271276 uncovered another problem with incremental inlining through >> virtual call sites: receiver null check breaks exception state combining in `GraphKit::replace_call()` >> because the associtated exception path has the JVM state representing the point >> right before the call (arguments are on stack). >> >> I propose a conservative fix which bails out the inlining attempt when receiver is not provably non-null. >> >> IMO the proper fix is to always add explicit receiver null check and teach >> `Block::implicit_null_check()` about CallDynamicJava nodes. But that's for a >> separate change. >> >> Testing: failing tests, hs-tier1 - hs-tier4 > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Migrate to Type::maybe_null() Looks good. ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5330 From kvn at openjdk.java.net Wed Sep 1 15:45:46 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 1 Sep 2021 15:45:46 GMT Subject: RFR: 8273165: GraphKit::combine_exception_states fails with "matching stack sizes" assert [v2] In-Reply-To: References: <44saSltaOH9gYoqPDhCrAqoF-f41jbVZpaL_5DPgChs=.ecf62e56-67a7-441c-bde3-5027ced9b214@github.com> Message-ID: On Wed, 1 Sep 2021 14:23:10 GMT, Vladimir Ivanov wrote: >> The fix for JDK-8271276 uncovered another problem with incremental inlining through >> virtual call sites: receiver null check breaks exception state combining in `GraphKit::replace_call()` >> because the associtated exception path has the JVM state representing the point >> right before the call (arguments are on stack). >> >> I propose a conservative fix which bails out the inlining attempt when receiver is not provably non-null. >> >> IMO the proper fix is to always add explicit receiver null check and teach >> `Block::implicit_null_check()` about CallDynamicJava nodes. But that's for a >> separate change. >> >> Testing: failing tests, hs-tier1 - hs-tier4 > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Migrate to Type::maybe_null() Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5330 From vlivanov at openjdk.java.net Wed Sep 1 16:45:18 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 1 Sep 2021 16:45:18 GMT Subject: RFR: 8244675: assert(IncrementalInline || (_late_inlines.length() == 0 && !has_mh_late_inlines())) [v2] In-Reply-To: References: Message-ID: > Avoid populating late inline candidates list when post-parse inlining is disabled. 
> > Testing: hs-tier1 - hs-tier4 Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: Assert is too strong ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5249/files - new: https://git.openjdk.java.net/jdk/pull/5249/files/b80d2464..70b65952 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5249&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5249&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5249.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5249/head:pull/5249 PR: https://git.openjdk.java.net/jdk/pull/5249 From vlivanov at openjdk.java.net Wed Sep 1 16:45:27 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 1 Sep 2021 16:45:27 GMT Subject: RFR: 8244675: assert(IncrementalInline || (_late_inlines.length() == 0 && !has_mh_late_inlines())) In-Reply-To: References: Message-ID: On Wed, 25 Aug 2021 10:57:50 GMT, Vladimir Ivanov wrote: > Avoid populating late inline candidates list when post-parse inlining is disabled. > > Testing: hs-tier1 - hs-tier4 I have to remove the assert in the `LateInlineCallGenerator` ctor because `LateInlineStringCallGenerator` and `LateInlineBoxingCallGenerator` don't depend on the `IncrementalInline` flag. ------------- PR: https://git.openjdk.java.net/jdk/pull/5249 From kvn at openjdk.java.net Wed Sep 1 16:57:43 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 1 Sep 2021 16:57:43 GMT Subject: RFR: JDK-8272574: C2: assert(false) failed: Bad graph detected in build_loop_late [v9] In-Reply-To: References: Message-ID: On Mon, 23 Aug 2021 12:40:53 GMT, 王超 wrote: >> Current loop predication will promote nodes(with a dependency on a later control node) to the insertion point which dominates the later control node.
>> >> In the following example, loopPrediction will promote node 434 to the outer loop(predicted insert point is right after node 424), and it depends on control node 207. But node 424 dominates node 207, which means after the promotion, the cloned nodes have a control dependency on a later control node, which leads to a bad graph. >> >> ![image](https://user-images.githubusercontent.com/25214855/129720970-ff65b8f4-8bef-401d-8590-54aca6de470e.png) >> >> ![image](https://user-images.githubusercontent.com/25214855/129721369-4c61222b-7305-4522-9a37-e3e6e2138aa9.png) > > ?? has updated the pull request incrementally with one additional commit since the last revision: > > Remove un-necessary dot Place the test into `compiler/loopopts`. src/hotspot/share/opto/loopPredicate.cpp line 624: > 622: // Note: this function is particularly designed for loop predication. We require load_range > 623: // and offset to be loop invariant computed on the fly by "invar" > 624: bool IdealLoopTree::is_range_check_if(IfNode *iff, PhaseIdealLoop *phase, Invariance& invar, ProjNode *predicate_proj) const { `predicate_proj` is used only for assert. Use `DEBUG_ONLY(, ProjNode *predicate_proj)`. Unless you want to do the check in PRODUCT. src/hotspot/share/opto/loopPredicate.cpp line 665: > 663: return false; > 664: } > 665: #ifndef PRODUCT Should be `#ifdef ASSERT` for debug build. src/hotspot/share/opto/loopPredicate.cpp line 675: > 673: // point. > 674: // This situation can occur when pinning nodes too conservatively - can we do better? > 675: assert(false, "cyclic dependency prevents range check elimination"); Print useful data before assert to help debugging issue without rerunning. Dump `offset` node, for example. src/hotspot/share/opto/loopPredicate.cpp line 1158: > 1156: } > 1157: #endif > 1158: } else if (cl != NULL && loop->is_range_check_if(iff, this, invar, predicate_proj)) { Use DEBUG_ONLY(, predicate_proj). 
src/hotspot/share/opto/loopnode.hpp line 740: > 738: > 739: // Return TRUE if "iff" is a range check. > 740: bool is_range_check_if(IfNode *iff, PhaseIdealLoop *phase, Invariance& invar, ProjNode *predicate_proj) const; Use DEBUG_ONLY(, ProjNode *predicate_proj). src/hotspot/share/opto/memnode.cpp line 1628: > 1626: // Alter data node to use pre-phi inputs > 1627: if (this->in(0) == region) { > 1628: if (mem->is_Phi() && (mem->in(0) == region) && mem->in(i)->in(0) != NULL) { I have concern about this because you can have load's address dependency on a control node below memory's control. I saw cases when Load's address and memory control were Region through which it splits. And all controls were set to Region's inputs. Also consider that we did not split memory node yet. We may end up with load's clone placed above its memory. ------------- PR: https://git.openjdk.java.net/jdk/pull/5142 From kvn at openjdk.java.net Wed Sep 1 16:57:51 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 1 Sep 2021 16:57:51 GMT Subject: RFR: JDK-8272574: C2: assert(false) failed: Bad graph detected in build_loop_late [v9] In-Reply-To: References: Message-ID: <0PImAeJNxb4j4UtIQkiD_uWYmwwRHNWegzcoFk8rloY=.fd069bb3-efb1-4c3a-8b89-3bc0a1d054ad@github.com> On Wed, 1 Sep 2021 16:28:19 GMT, Vladimir Kozlov wrote: >> 王超 has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove un-necessary dot > > src/hotspot/share/opto/loopPredicate.cpp line 665: > >> 663: return false; >> 664: } >> 665: #ifndef PRODUCT > > Should be `#ifdef ASSERT` for debug build. On the other hand, consider doing this check in PRODUCT too, to bail out (return `false`).
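The `DEBUG_ONLY(, ProjNode *predicate_proj)` suggestion above can be illustrated with a small standalone sketch of a parameter that exists only in debug builds. Everything here is an assumption made for illustration: `SKETCH_ASSERT`, `SKETCH_DEBUG_ONLY`, `SketchNode`, `sketch_is_range_check` and `sketch_caller` are hypothetical names, and the macro is not HotSpot's definition (it is written as a variadic macro so that its argument may begin with a comma).

```cpp
#include <cassert>

// Hypothetical stand-in for a debug-only macro; NOT HotSpot's definition.
// Variadic, so the argument list may start with a comma.
#ifdef SKETCH_ASSERT
#define SKETCH_DEBUG_ONLY(...) __VA_ARGS__
#else
#define SKETCH_DEBUG_ONLY(...)
#endif

struct SketchNode { int idx; };  // illustrative placeholder type

// The trailing parameter is declared only when SKETCH_ASSERT is defined.
inline bool sketch_is_range_check(const SketchNode* iff
                                  SKETCH_DEBUG_ONLY(, const SketchNode* predicate_proj)) {
#ifdef SKETCH_ASSERT
  // Debug-only verification that uses the extra parameter.
  assert(predicate_proj != nullptr && "debug builds pass the projection");
#endif
  return iff != nullptr && iff->idx >= 0;
}

// Call sites wrap the extra argument in the same macro.
inline bool sketch_caller(const SketchNode* iff, const SketchNode* proj) {
  (void)proj;  // unused when SKETCH_ASSERT is not defined
  return sketch_is_range_check(iff SKETCH_DEBUG_ONLY(, proj));
}
```

The design point is that declaration, definition and every call site must agree on the wrapped parameter, so a build without the debug flag carries no extra argument at all. If the check should also run in product builds, as the follow-up comment suggests, the parameter would simply become unconditional.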
------------- PR: https://git.openjdk.java.net/jdk/pull/5142 From kvn at openjdk.java.net Wed Sep 1 16:59:54 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 1 Sep 2021 16:59:54 GMT Subject: RFR: 8244675: assert(IncrementalInline || (_late_inlines.length() == 0 && !has_mh_late_inlines())) [v2] In-Reply-To: References: Message-ID: On Wed, 1 Sep 2021 16:45:18 GMT, Vladimir Ivanov wrote: >> Avoid populating late inline candidates list when post-parse inlining is disabled. >> >> Testing: hs-tier1 - hs-tier4 > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Assert is too strong Okay. I suggest to move test into `compiler/inlining` directory. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5249 From dlong at openjdk.java.net Wed Sep 1 21:50:31 2021 From: dlong at openjdk.java.net (Dean Long) Date: Wed, 1 Sep 2021 21:50:31 GMT Subject: RFR: 8271911: replay compilations of methods which use JSR292 (easy cases) [v2] In-Reply-To: References: Message-ID: On Wed, 1 Sep 2021 13:05:22 GMT, Vladimir Ivanov wrote: >> Dean Long has updated the pull request incrementally with one additional commit since the last revision: >> >> rename LocPusher --> RecordLocation, add comments > > Very nice work, Dean! @iwanowww Thanks for looking at this. > src/hotspot/share/ci/ciEnv.cpp line 1537: > >> 1535: >> 1536: // Iterate over the class hierarchy >> 1537: for (ClassHierarchyIterator iter(vmClasses::Object_klass()); !iter.done(); iter.next()) { > > Why do you iterate over the whole class hierarchy instead of inspecting only those classes which are present in CI? Good question. It is because of the section in ciInstanceKlass::dump_replay_data that dumps subclasses. If one of the CI classes is java.lang.Object, we can get a lot of hidden classes dumped there from startup that are unrelated to the current compile. 
I wanted to see how many I could find as a proof of concept / stress test. My plan is to see if we can completely do without subclass dumping there by dumping better CHA information (JDK-8261192). ------------- PR: https://git.openjdk.java.net/jdk/pull/5270 From wuyan at openjdk.java.net Thu Sep 2 02:23:33 2021 From: wuyan at openjdk.java.net (Wu Yan) Date: Thu, 2 Sep 2021 02:23:33 GMT Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v4] In-Reply-To: <_IKgIBUexpX3jnx8NyLS10-x3HKFMX6n2LftCPOeZSQ=.5e684393-6e9a-41ec-ab80-851f302b0904@github.com> References: <_IKgIBUexpX3jnx8NyLS10-x3HKFMX6n2LftCPOeZSQ=.5e684393-6e9a-41ec-ab80-851f302b0904@github.com> Message-ID: On Tue, 10 Aug 2021 09:16:51 GMT, Andrew Haley wrote: >>> Does your testing cover all that is added in this patch? If so, how did you ascertain it? >> >> Yes, These two test cases have covered the code in the patch. First, I unmark the unsupported opcodes in JDK-8268966. Then I test the cases in both of the above test files one by one, implementing the rules once the testcase failed until all the testcases pass. > >> > Does your testing cover all that is added in this patch? If so, how did you ascertain it? >> >> Yes, These two test cases have covered the code in the patch. First, I unmark the unsupported opcodes in JDK-8268966. Then I test the cases in both of the above test files one by one, implementing the rules once the testcase failed until all the testcases pass. > > OK. Could you do me a favor to review the patch? @theRealAph @nick-arm Thank you. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4839 From wuyan at openjdk.java.net Thu Sep 2 02:26:29 2021 From: wuyan at openjdk.java.net (Wu Yan) Date: Thu, 2 Sep 2021 02:26:29 GMT Subject: RFR: 8272413: Incorrect num of element count calculation for vector cast In-Reply-To: References: Message-ID: On Wed, 18 Aug 2021 08:31:47 GMT, Wang Huang wrote: > Dear all, > Closed JDK-8265244 has split into two issues : JDK-8268966 and this issue. During this issue, I will fix the mid-end comparsion. > This patch is easy to understand. It is split from https://github.com/openjdk/jdk/pull/3507. I only fix the mid-end problem because the back-end problem has fixed in JDK-8268966 by @theRealELiu . > Thank you for your review. > > Yours, > WANG Huang Could you do me a favor to review the patch? @theRealAph @nick-arm Thank you. ------------- PR: https://git.openjdk.java.net/jdk/pull/5160 From github.com+25214855+casparcwang at openjdk.java.net Thu Sep 2 03:33:59 2021 From: github.com+25214855+casparcwang at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Thu, 2 Sep 2021 03:33:59 GMT Subject: RFR: JDK-8272574: C2: assert(false) failed: Bad graph detected in build_loop_late [v10] In-Reply-To: References: Message-ID: > Current loop predication will promote nodes(with a dependency on a later control node) to the insertion point which dominates the later control node. > > In the following example, loopPrediction will promote node 434 to the outer loop(predicted insert point is right after node 424), and it depends on control node 207. But node 424 dominates node 207, which means after the promotion, the cloned nodes have a control dependency on a later control node, which leads to a bad graph. > > ![image](https://user-images.githubusercontent.com/25214855/129720970-ff65b8f4-8bef-401d-8590-54aca6de470e.png) > > ![image](https://user-images.githubusercontent.com/25214855/129721369-4c61222b-7305-4522-9a37-e3e6e2138aa9.png) ?? 
has updated the pull request incrementally with one additional commit since the last revision: guarrantee that address dependency on a control node above memory's control. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5142/files - new: https://git.openjdk.java.net/jdk/pull/5142/files/4ce8b2ba..967eecab Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5142&range=09 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5142&range=08-09 Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5142.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5142/head:pull/5142 PR: https://git.openjdk.java.net/jdk/pull/5142 From eliu at openjdk.java.net Thu Sep 2 03:34:29 2021 From: eliu at openjdk.java.net (Eric Liu) Date: Thu, 2 Sep 2021 03:34:29 GMT Subject: RFR: 8272413: Incorrect num of element count calculation for vector cast In-Reply-To: References: Message-ID: <40DfLOXil86LSbehPHD0UpxDCLTz5EdqpSXCBWqBJY4=.8f877f7b-3f3c-4b30-859a-63ed45dac10b@github.com> On Wed, 18 Aug 2021 08:31:47 GMT, Wang Huang wrote: > Dear all, > Closed JDK-8265244 has split into two issues : JDK-8268966 and this issue. During this issue, I will fix the mid-end comparsion. > This patch is easy to understand. It is split from https://github.com/openjdk/jdk/pull/3507. I only fix the mid-end problem because the back-end problem has fixed in JDK-8268966 by @theRealELiu . > Thank you for your review. > > Yours, > WANG Huang Marked as reviewed by eliu (Author). 
------------- PR: https://git.openjdk.java.net/jdk/pull/5160 From github.com+25214855+casparcwang at openjdk.java.net Thu Sep 2 04:16:03 2021 From: github.com+25214855+casparcwang at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Thu, 2 Sep 2021 04:16:03 GMT Subject: RFR: JDK-8272574: C2: assert(false) failed: Bad graph detected in build_loop_late [v11] In-Reply-To: References: Message-ID: > Current loop predication will promote nodes(with a dependency on a later control node) to the insertion point which dominates the later control node. > > In the following example, loopPrediction will promote node 434 to the outer loop(predicted insert point is right after node 424), and it depends on control node 207. But node 424 dominates node 207, which means after the promotion, the cloned nodes have a control dependency on a later control node, which leads to a bad graph. > > ![image](https://user-images.githubusercontent.com/25214855/129720970-ff65b8f4-8bef-401d-8590-54aca6de470e.png) > > ![image](https://user-images.githubusercontent.com/25214855/129721369-4c61222b-7305-4522-9a37-e3e6e2138aa9.png) ?? 
has updated the pull request incrementally with one additional commit since the last revision: Add more debug infomation ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5142/files - new: https://git.openjdk.java.net/jdk/pull/5142/files/967eecab..90ec4710 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5142&range=10 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5142&range=09-10 Stats: 6 lines in 2 files changed: 1 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/5142.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5142/head:pull/5142 PR: https://git.openjdk.java.net/jdk/pull/5142 From github.com+25214855+casparcwang at openjdk.java.net Thu Sep 2 04:20:27 2021 From: github.com+25214855+casparcwang at openjdk.java.net (王超) Date: Thu, 2 Sep 2021 04:20:27 GMT Subject: RFR: JDK-8272574: C2: assert(false) failed: Bad graph detected in build_loop_late [v9] In-Reply-To: References: Message-ID: On Wed, 1 Sep 2021 16:53:51 GMT, Vladimir Kozlov wrote: > Place the test into `compiler/loopopts`. Thank you for your review. The test has been moved to the `loopopts` directory. > Should be `#ifdef ASSERT` for debug build. The assertion has been changed to debug only, and more debug information is printed. > I have concern about this because you can have load's address dependency on a control node below memory's control. > I saw cases when Load's address and memory control were Region through which it splits. And all controls were set to Region's inputs. I have the same concern, so I added another guard, `MemNode::all_controls_dominate(address, region)`. > Also consider that we did not split memory node yet. We may end up with load's clone placed above its memory. When the control of the load is changed to the control of its memory, the memory dependency of the load is also set, in the following code: if (mem->is_Phi() && (mem->in(0) == region)) { x->set_req(Memory, mem->in(i)); // Use pre-Phi input for the clone.
} ------------- PR: https://git.openjdk.java.net/jdk/pull/5142 From njian at openjdk.java.net Thu Sep 2 07:38:43 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Thu, 2 Sep 2021 07:38:43 GMT Subject: RFR: 8267356: AArch64: Vector API SVE codegen support [v6] In-Reply-To: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> References: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> Message-ID: > This is the integration of current SVE work done in panama-vector/vectorIntrinscs, which includes: > > 1. Code generation for Vector API c2 IR nodes with SVE. > 2. Non-max vector size support with SVE, e.g. using *128Vector (and *64Vector) APIs on 256-bit SVE environment could also generate optimized SVE instructions with predicate feature. > 3. Some more SVE assemblers (and tests) used by the codegen part. > > Note: VectorMask is still represented in vector register, a further improvement to map mask to predicate register is under development at https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask > > > Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware with MaxVectorSize=16/32/64. Ningsheng Jian has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Merge with master - More comments from Andrew. - Add missing part - Address Andrew's comments - 8267356: AArch64: Vector API SVE codegen support This is the integration of current SVE work done in panama-vector/vectorIntrinscs, which includes: 1. Code generation for Vector API c2 IR nodes with SVE. 2. Non-max vector size support with SVE, e.g. using *128Vector APIs on 256-bit SVE environment could also generate optimized SVE instructions with predicate feature. 3. Some more SVE assemblers (and tests) used by the codegen part. 
Note: VectorMask is still represented in vector register, a further improvement to map mask to predicate register is under development at https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware with MaxVectorSize=16/32/64. ------------- Changes: https://git.openjdk.java.net/jdk/pull/4122/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4122&range=05 Stats: 5761 lines in 13 files changed: 4576 ins; 195 del; 990 mod Patch: https://git.openjdk.java.net/jdk/pull/4122.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4122/head:pull/4122 PR: https://git.openjdk.java.net/jdk/pull/4122 From aph at openjdk.java.net Thu Sep 2 08:09:31 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 2 Sep 2021 08:09:31 GMT Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v4] In-Reply-To: References: Message-ID: On Thu, 19 Aug 2021 08:57:47 GMT, Wang Huang wrote: >> * In this issue, we plan to complete all missing implementation for aarch64 neon backend. For example, cast from Byte to Long, cast from Long to Byte, and so on. >> * It may be a solver of JDK-8269866, or part of it. > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > fix codes Marked as reviewed by aph (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/4839 From vlivanov at openjdk.java.net Thu Sep 2 11:49:37 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 2 Sep 2021 11:49:37 GMT Subject: Integrated: 8273165: GraphKit::combine_exception_states fails with "matching stack sizes" assert In-Reply-To: <44saSltaOH9gYoqPDhCrAqoF-f41jbVZpaL_5DPgChs=.ecf62e56-67a7-441c-bde3-5027ced9b214@github.com> References: <44saSltaOH9gYoqPDhCrAqoF-f41jbVZpaL_5DPgChs=.ecf62e56-67a7-441c-bde3-5027ced9b214@github.com> Message-ID: On Wed, 1 Sep 2021 12:56:19 GMT, Vladimir Ivanov wrote: > The fix for JDK-8271276 uncovered another problem with incremental inlining through > virtual call sites: receiver null check breaks exception state combining in `GraphKit::replace_call()` > because the associtated exception path has the JVM state representing the point > right before the call (arguments are on stack). > > I propose a conservative fix which bails out the inlining attempt when receiver is not provably non-null. > > IMO the proper fix is to always add explicit receiver null check and teach > `Block::implicit_null_check()` about CallDynamicJava nodes. But that's for a > separate change. > > Testing: failing tests, hs-tier1 - hs-tier4 This pull request has now been integrated. 
Changeset: 632a7e08 Author: Vladimir Ivanov URL: https://git.openjdk.java.net/jdk/commit/632a7e0885596b70d34be319bd09d4df8e151d12 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod 8273165: GraphKit::combine_exception_states fails with "matching stack sizes" assert Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/5330 From vlivanov at openjdk.java.net Thu Sep 2 11:49:36 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 2 Sep 2021 11:49:36 GMT Subject: RFR: 8273165: GraphKit::combine_exception_states fails with "matching stack sizes" assert [v2] In-Reply-To: References: <44saSltaOH9gYoqPDhCrAqoF-f41jbVZpaL_5DPgChs=.ecf62e56-67a7-441c-bde3-5027ced9b214@github.com> Message-ID: On Wed, 1 Sep 2021 14:23:10 GMT, Vladimir Ivanov wrote: >> The fix for JDK-8271276 uncovered another problem with incremental inlining through >> virtual call sites: receiver null check breaks exception state combining in `GraphKit::replace_call()` >> because the associtated exception path has the JVM state representing the point >> right before the call (arguments are on stack). >> >> I propose a conservative fix which bails out the inlining attempt when receiver is not provably non-null. >> >> IMO the proper fix is to always add explicit receiver null check and teach >> `Block::implicit_null_check()` about CallDynamicJava nodes. But that's for a >> separate change. >> >> Testing: failing tests, hs-tier1 - hs-tier4 > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Migrate to Type::maybe_null() Thanks for the reviews, Tobias and Vladimir. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5330 From vlivanov at openjdk.java.net Thu Sep 2 11:50:28 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 2 Sep 2021 11:50:28 GMT Subject: RFR: 8244675: assert(IncrementalInline || (_late_inlines.length() == 0 && !has_mh_late_inlines())) [v2] In-Reply-To: References: Message-ID: On Wed, 1 Sep 2021 16:56:46 GMT, Vladimir Kozlov wrote: > I suggest to move test into compiler/inlining directory. The bug is in Vector API-specific code, so putting the test under compiler/vectorapi looks well-justified. ------------- PR: https://git.openjdk.java.net/jdk/pull/5249 From thartmann at openjdk.java.net Thu Sep 2 12:40:31 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 2 Sep 2021 12:40:31 GMT Subject: RFR: 8269119: C2: Avoid redundant memory barriers in Unsafe.copyMemory0 intrinsic [v2] In-Reply-To: References: Message-ID: On Wed, 1 Sep 2021 14:17:16 GMT, Vladimir Ivanov wrote: >> `Unsafe::copyMemory0` intrinsic unconditionally inserts barriers arounds the call to `unsafe_arraycopy` stub. >> It is a conservative approach and barriers can be avoided in most common cases (similar to what is done for scalar unsafe accesses). >> >> `Unsafe::copyMemory()` performs argument validation which limits inputs either >> to off-heap location (null + absolute address) or primitive on-heap array. >> >> The only cases when barriers are still needed are: >> * mixed accesses (`Object+offset`); >> * mismatched access due to lack of type info on base oop (`Object:NotNull+offset`). >> >> Testing: hs-tier1 - hs-tier6 > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Remove memory barriers Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5259 From vlivanov at openjdk.java.net Thu Sep 2 12:42:28 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 2 Sep 2021 12:42:28 GMT Subject: RFR: 8271911: replay compilations of methods which use JSR292 (easy cases) [v2] In-Reply-To: References: Message-ID: On Wed, 1 Sep 2021 21:45:26 GMT, Dean Long wrote: >> src/hotspot/share/ci/ciEnv.cpp line 1537: >> >>> 1535: >>> 1536: // Iterate over the class hierarchy >>> 1537: for (ClassHierarchyIterator iter(vmClasses::Object_klass()); !iter.done(); iter.next()) { >> >> Why do you iterate over the whole class hierarchy instead of inspecting only those classes which are present in CI? > > Good question. It is because of the section in ciInstanceKlass::dump_replay_data that dumps subclasses. If one of the CI classes is java.lang.Object, we can get a lot of hidden classes dumped there from startup that are unrelated to the current compile. I wanted to see how many I could find as a proof of concept / stress test. My plan is to see if we can completely do without subclass dumping there by dumping better CHA information (JDK-8261192). I still miss the connection between `ciInstanceKlass::dump_replay_data()` and `ciEnv::find_dynamic_call_sites()` cases. `ciEnv::find_dynamic_call_sites()` dumps all invokedynamic and invokehandle (MH.invoke*()) call sites and MethodHandle CP Constants across the class hierarchy. Any particular benefit compared to just dumping that info on per ciInstanceKlass granularity? ------------- PR: https://git.openjdk.java.net/jdk/pull/5270 From vlivanov at openjdk.java.net Thu Sep 2 12:42:28 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 2 Sep 2021 12:42:28 GMT Subject: RFR: 8271911: replay compilations of methods which use JSR292 (easy cases) [v2] In-Reply-To: References: Message-ID: On Thu, 2 Sep 2021 12:35:41 GMT, Vladimir Ivanov wrote: >> Good question. 
It is because of the section in ciInstanceKlass::dump_replay_data that dumps subclasses. If one of the CI classes is java.lang.Object, we can get a lot of hidden classes dumped there from startup that are unrelated to the current compile. I wanted to see how many I could find as a proof of concept / stress test. My plan is to see if we can completely do without subclass dumping there by dumping better CHA information (JDK-8261192). > > I still miss the connection between `ciInstanceKlass::dump_replay_data()` and `ciEnv::find_dynamic_call_sites()` cases. > > `ciEnv::find_dynamic_call_sites()` dumps all invokedynamic and invokehandle (MH.invoke*()) call sites and MethodHandle CP Constants across the class hierarchy. Any particular benefit compared to just dumping that info on per ciInstanceKlass granularity? Also, one more question: why do you dump MethodHandle CP constants and invokehandle call sites? Is it to record the connection between MethodHandle instances and hidden classes behind LambdaForms they are implemented with? ------------- PR: https://git.openjdk.java.net/jdk/pull/5270 From thartmann at openjdk.java.net Thu Sep 2 12:53:33 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 2 Sep 2021 12:53:33 GMT Subject: RFR: 8271340: Crash PhaseIdealLoop::clone_outer_loop In-Reply-To: References: Message-ID: On Mon, 23 Aug 2021 14:46:15 GMT, Roland Westrelin wrote: > but that doesn't happen because do_transform() starts from Root by following inputs Would the following code I added to Valhalla help? https://github.com/openjdk/valhalla/commit/f43fafc7d81f3d34f6e971765dbadae41ae5c393#diff-f4a11a2c3f7d7342641c878277f778856ad329e4e09026e39d08afb03efd10a2R1958 test/hotspot/jtreg/compiler/c2/TestInfiniteLoopCCP.java line 72: > 70: thread.setDaemon(true); > 71: thread.start(); > 72: Thread.sleep(/*Utils.adjustTimeout(4000)*/4000); Why is the `Utils.adjustTimeout` uncommented? 
------------- PR: https://git.openjdk.java.net/jdk/pull/5220 From roland at openjdk.java.net Thu Sep 2 13:03:31 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 2 Sep 2021 13:03:31 GMT Subject: RFR: 8269119: C2: Avoid redundant memory barriers in Unsafe.copyMemory0 intrinsic [v2] In-Reply-To: References: Message-ID: On Wed, 1 Sep 2021 14:17:16 GMT, Vladimir Ivanov wrote: >> `Unsafe::copyMemory0` intrinsic unconditionally inserts barriers arounds the call to `unsafe_arraycopy` stub. >> It is a conservative approach and barriers can be avoided in most common cases (similar to what is done for scalar unsafe accesses). >> >> `Unsafe::copyMemory()` performs argument validation which limits inputs either >> to off-heap location (null + absolute address) or primitive on-heap array. >> >> The only cases when barriers are still needed are: >> * mixed accesses (`Object+offset`); >> * mismatched access due to lack of type info on base oop (`Object:NotNull+offset`). >> >> Testing: hs-tier1 - hs-tier6 > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Remove memory barriers src/hotspot/share/opto/library_call.cpp line 4082: > 4080: > 4081: bool is_array = _gvn.type(addr)->isa_aryptr(); > 4082: return is_mixed || (in_heap && !is_array); Why is the is_array check needed? ------------- PR: https://git.openjdk.java.net/jdk/pull/5259 From thartmann at openjdk.java.net Thu Sep 2 13:08:32 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 2 Sep 2021 13:08:32 GMT Subject: RFR: 8271341: Opcode() != Op_If && Opcode() != Op_RangeCheck) || outcnt() == 2 assert failure with Test7179138_1.java In-Reply-To: References: Message-ID: On Fri, 30 Jul 2021 08:27:47 GMT, Roland Westrelin wrote: > The root cause of this bug is in PhaseStringOpts. In the middle of the > chain of calls that are optimized out, there's a diamond Region/If. 
On > most executions this diamond is optimized out by IGVN because once > PhaseStringOpts is over, all the Region's Phis are removed. But > because one input of the If/Bool/Cmp is set to top by PhaseStringOpts > when calls are removed, it sometimes happens that top propagates to the > If before the Region is optimized out. That causes control flow below > the If to become dead while it should still be reachable. > > The fix I propose is to have PhaseStringOpts remove the Region/If in > that case. Looks good. Did you verify that the test found by JavaFuzzer, which Christian attached to the bug, triggers the same issue? src/hotspot/share/opto/stringopts.cpp line 283: > 281: Node* iff = n->in(1)->in(0); > 282: assert(iff->is_If(), "no if for the diamond"); > 283: Node* bol = iff->in(1);; Double `;` ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4944 From roland at openjdk.java.net Thu Sep 2 13:19:29 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 2 Sep 2021 13:19:29 GMT Subject: RFR: 8269119: C2: Avoid redundant memory barriers in Unsafe.copyMemory0 intrinsic [v2] In-Reply-To: References: Message-ID: On Thu, 2 Sep 2021 12:59:41 GMT, Roland Westrelin wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove memory barriers > > src/hotspot/share/opto/library_call.cpp line 4082: > >> 4080: >> 4081: bool is_array = _gvn.type(addr)->isa_aryptr(); >> 4082: return is_mixed || (in_heap && !is_array); > > Why is the is_array check needed?
Tobias explained that one to me privately ------------- PR: https://git.openjdk.java.net/jdk/pull/5259 From roland at openjdk.java.net Thu Sep 2 13:25:39 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 2 Sep 2021 13:25:39 GMT Subject: RFR: 8269119: C2: Avoid redundant memory barriers in Unsafe.copyMemory0 intrinsic [v2] In-Reply-To: References: Message-ID: On Wed, 1 Sep 2021 14:17:16 GMT, Vladimir Ivanov wrote: >> `Unsafe::copyMemory0` intrinsic unconditionally inserts barriers arounds the call to `unsafe_arraycopy` stub. >> It is a conservative approach and barriers can be avoided in most common cases (similar to what is done for scalar unsafe accesses). >> >> `Unsafe::copyMemory()` performs argument validation which limits inputs either >> to off-heap location (null + absolute address) or primitive on-heap array. >> >> The only cases when barriers are still needed are: >> * mixed accesses (`Object+offset`); >> * mismatched access due to lack of type info on base oop (`Object:NotNull+offset`). >> >> Testing: hs-tier1 - hs-tier6 > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Remove memory barriers src/hotspot/share/opto/library_call.cpp line 4074: > 4072: > 4073: //----------------------has_wide_mem------------------------- > 4074: bool LibraryCallKit::has_wide_mem(Node* addr, Node* base) { Why do you need both arguments? Isn't addr sufficient? 
------------- PR: https://git.openjdk.java.net/jdk/pull/5259 From vlivanov at openjdk.java.net Thu Sep 2 14:24:13 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 2 Sep 2021 14:24:13 GMT Subject: RFR: 8269119: C2: Avoid redundant memory barriers in Unsafe.copyMemory0 intrinsic [v2] In-Reply-To: References: Message-ID: On Thu, 2 Sep 2021 13:22:12 GMT, Roland Westrelin wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove memory barriers > > src/hotspot/share/opto/library_call.cpp line 4074: > >> 4072: >> 4073: //----------------------has_wide_mem------------------------- >> 4074: bool LibraryCallKit::has_wide_mem(Node* addr, Node* base) { > > Why do you need both arguments? Isn't addr sufficient? It mimics similar checks from `LibraryCallKit::inline_unsafe_access()`: if (_gvn.type(base)->isa_ptr() == TypePtr::NULL_PTR) { if (type != T_OBJECT) { decorators |= IN_NATIVE; // off-heap primitive access } else { set_map(old_map); set_sp(old_sp); return false; // off-heap oop accesses are not supported } } else { heap_base_oop = base; // on-heap or mixed access } // Can base be NULL? Otherwise, always on-heap access. bool can_access_non_heap = TypePtr::NULL_PTR->higher_equal(_gvn.type(base)); if (!can_access_non_heap) { decorators |= IN_HEAP; } ------------- PR: https://git.openjdk.java.net/jdk/pull/5259 From thartmann at openjdk.java.net Thu Sep 2 14:24:18 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 2 Sep 2021 14:24:18 GMT Subject: RFR: 8273021: C2: Improve Add and Xor ideal optimizations In-Reply-To: <6ETrh2hQ4xodDeDjGT3NCjENHbOj0fp2EJsfCDAVINE=.94f15263-2919-4762-a8a9-1d6339f7be96@github.com> References: <6ETrh2hQ4xodDeDjGT3NCjENHbOj0fp2EJsfCDAVINE=.94f15263-2919-4762-a8a9-1d6339f7be96@github.com> Message-ID: On Thu, 26 Aug 2021 09:19:41 GMT, Yi Yang wrote: > Greetings. 
This patch adds the following identical equations for Add and Xor node, respectively, which probably drives further optimizations. > > > ~(x-1) => -x > ~x + 1 => -x > > > > Verified by generated opto assembly, maybe an IR verification test can be added later. > > ============================= C2-compiled nmethod ============================== > ----------------------- MetaData before Compile_id = 1 ------------------------ > {method} > - this oop: 0x00007fe29f003518 > - method holder: 'compiler/c2/TestAddXorIdeal' > - constants: 0x00007fe29f003248 constant pool [38] {0x00007fe29f003248} for 'compiler/c2/TestAddXorIdeal' cache=0x00007fe29f0038f0 > - access: 0x81000009 public static > - name: 'test1' > - signature: '(I)I' > - max stack: 3 > - max locals: 1 > - size of params: 1 > - method size: 13 > - vtable index: -2 > - i2i entry: 0x00007fe2fd23fc00 > - adapters: AHE at 0x00007fe3082160d0: 0xa i2c: 0x00007fe2fd34c660 c2i: 0x00007fe2fd34c719 c2iUV: 0x00007fe2fd34c6e3 c2iNCI: 0x00007fe2fd34c756 > - compiled entry 0x00007fe2fd34c719 > - code size: 6 > - code start: 0x00007fe29f003508 > - code end (excl): 0x00007fe29f00350e > - method data: 0x00007fe29f0070f8 > - checked ex length: 0 > - linenumber start: 0x00007fe29f00350e > - localvar length: 0 > > ------------------------ OptoAssembly for Compile_id = 1 ----------------------- > # > # int ( int ) > # > #r018 rsi : parm 0: int > # -- Old rsp -- Framesize: 32 -- > #r591 rsp+28: in_preserve > #r590 rsp+24: return address > #r589 rsp+20: in_preserve > #r588 rsp+16: saved fp register > #r587 rsp+12: pad2, stack alignment > #r586 rsp+ 8: pad2, stack alignment > #r585 rsp+ 4: Fixed slot 1 > #r584 rsp+ 0: Fixed slot 0 > # > 000 N1: # out( B1 ) <- in( B1 ) Freq: 1 > > 000 B1: # out( N1 ) <- BLOCK HEAD IS JUNK Freq: 1 > 000 # stack bang (96 bytes) > pushq rbp # Save rbp > subq rsp, #16 # Create frame > > 00c movl RAX, RSI # spill > 00e negl RAX # int > 010 addq rsp, 16 # Destroy frame > popq rbp > cmpq rsp, 
poll_offset[r15_thread] > ja #safepoint_stub # Safepoint: poll for GC > > 022 ret > > -------------------------------------------------------------------------------- > > ============================= C2-compiled nmethod ============================== > ----------------------- MetaData before Compile_id = 3 ------------------------ > {method} > - this oop: 0x00007fe29f003668 > - method holder: 'compiler/c2/TestAddXorIdeal' > - constants: 0x00007fe29f003248 constant pool [38] {0x00007fe29f003248} for 'compiler/c2/TestAddXorIdeal' cache=0x00007fe29f0038f0 > - access: 0x81000009 public static > - name: 'test3' > - signature: '(J)J' > - max stack: 5 > - max locals: 2 > - size of params: 2 > - method size: 13 > - vtable index: -2 > - i2i entry: 0x00007fe2fd23fc00 > - adapters: AHE at 0x00007fe30829c8f0: 0xbe i2c: 0x00007fe2fd2d88e0 c2i: 0x00007fe2fd2d89c0 c2iUV: 0x00007fe2fd2d898a c2iNCI: 0x00007fe2fd2d89fd > - compiled entry 0x00007fe2fd2d89c0 > - code size: 8 > - code start: 0x00007fe29f003658 > - code end (excl): 0x00007fe29f003660 > - method data: 0x00007fe29f007408 > - checked ex length: 0 > - linenumber start: 0x00007fe29f003660 > - localvar length: 0 > > ------------------------ OptoAssembly for Compile_id = 3 ----------------------- > # > # long/half ( long, half ) > # > #r018 rsi:rsi : parm 0: long > # -- Old rsp -- Framesize: 32 -- > #r591 rsp+28: in_preserve > #r590 rsp+24: return address > #r589 rsp+20: in_preserve > #r588 rsp+16: saved fp register > #r587 rsp+12: pad2, stack alignment > #r586 rsp+ 8: pad2, stack alignment > #r585 rsp+ 4: Fixed slot 1 > #r584 rsp+ 0: Fixed slot 0 > # > 000 N1: # out( B1 ) <- in( B1 ) Freq: 1 > > 000 B1: # out( N1 ) <- BLOCK HEAD IS JUNK Freq: 1 > 000 # stack bang (96 bytes) > pushq rbp # Save rbp > subq rsp, #16 # Create frame > > 00c movq RAX, RSI # spill > 00f negq RAX # long > 012 addq rsp, 16 # Destroy frame > popq rbp > cmpq rsp, poll_offset[r15_thread] > ja #safepoint_stub # Safepoint: poll for GC > > 024 ret > > 
-------------------------------------------------------------------------------- > > ============================= C2-compiled nmethod ============================== > ----------------------- MetaData before Compile_id = 2 ------------------------ > {method} > - this oop: 0x00007fe29f0035c0 > - method holder: 'compiler/c2/TestAddXorIdeal' > - constants: 0x00007fe29f003248 constant pool [38] {0x00007fe29f003248} for 'compiler/c2/TestAddXorIdeal' cache=0x00007fe29f0038f0 > - access: 0x81000009 public static > - name: 'test2' > - signature: '(I)I' > - max stack: 3 > - max locals: 1 > - size of params: 1 > - method size: 13 > - vtable index: -2 > - i2i entry: 0x00007fe2fd23fc00 > - adapters: AHE at 0x00007fe3082160d0: 0xa i2c: 0x00007fe2fd34c660 c2i: 0x00007fe2fd34c719 c2iUV: 0x00007fe2fd34c6e3 c2iNCI: 0x00007fe2fd34c756 > - compiled entry 0x00007fe2fd34c719 > - code size: 6 > - code start: 0x00007fe29f0035b0 > - code end (excl): 0x00007fe29f0035b6 > - method data: 0x00007fe29f007280 > - checked ex length: 0 > - linenumber start: 0x00007fe29f0035b6 > - localvar length: 0 > > ------------------------ OptoAssembly for Compile_id = 2 ----------------------- > # > # int ( int ) > # > #r018 rsi : parm 0: int > # -- Old rsp -- Framesize: 32 -- > #r591 rsp+28: in_preserve > #r590 rsp+24: return address > #r589 rsp+20: in_preserve > #r588 rsp+16: saved fp register > #r587 rsp+12: pad2, stack alignment > #r586 rsp+ 8: pad2, stack alignment > #r585 rsp+ 4: Fixed slot 1 > #r584 rsp+ 0: Fixed slot 0 > # > 000 N1: # out( B1 ) <- in( B1 ) Freq: 1 > > 000 B1: # out( N1 ) <- BLOCK HEAD IS JUNK Freq: 1 > 000 # stack bang (96 bytes) > pushq rbp # Save rbp > subq rsp, #16 # Create frame > > 00c movl RAX, RSI # spill > 00e negl RAX # int > 010 addq rsp, 16 # Destroy frame > popq rbp > cmpq rsp, poll_offset[r15_thread] > ja #safepoint_stub # Safepoint: poll for GC > > 022 ret > > -------------------------------------------------------------------------------- > > 
============================= C2-compiled nmethod ============================== > ----------------------- MetaData before Compile_id = 4 ------------------------ > {method} > - this oop: 0x00007fe29f003710 > - method holder: 'compiler/c2/TestAddXorIdeal' > - constants: 0x00007fe29f003248 constant pool [38] {0x00007fe29f003248} for 'compiler/c2/TestAddXorIdeal' cache=0x00007fe29f0038f0 > - access: 0x81000009 public static > - name: 'test4' > - signature: '(J)J' > - max stack: 5 > - max locals: 2 > - size of params: 2 > - method size: 13 > - vtable index: -2 > - i2i entry: 0x00007fe2fd23fc00 > - adapters: AHE at 0x00007fe30829c8f0: 0xbe i2c: 0x00007fe2fd2d88e0 c2i: 0x00007fe2fd2d89c0 c2iUV: 0x00007fe2fd2d898a c2iNCI: 0x00007fe2fd2d89fd > - compiled entry 0x00007fe2fd2d89c0 > - code size: 8 > - code start: 0x00007fe29f003700 > - code end (excl): 0x00007fe29f003708 > - method data: 0x00007fe29f0075a0 > - checked ex length: 0 > - linenumber start: 0x00007fe29f003708 > - localvar length: 0 > > ------------------------ OptoAssembly for Compile_id = 4 ----------------------- > # > # long/half ( long, half ) > # > #r018 rsi:rsi : parm 0: long > # -- Old rsp -- Framesize: 32 -- > #r591 rsp+28: in_preserve > #r590 rsp+24: return address > #r589 rsp+20: in_preserve > #r588 rsp+16: saved fp register > #r587 rsp+12: pad2, stack alignment > #r586 rsp+ 8: pad2, stack alignment > #r585 rsp+ 4: Fixed slot 1 > #r584 rsp+ 0: Fixed slot 0 > # > 000 N1: # out( B1 ) <- in( B1 ) Freq: 1 > > 000 B1: # out( N1 ) <- BLOCK HEAD IS JUNK Freq: 1 > 000 # stack bang (96 bytes) > pushq rbp # Save rbp > subq rsp, #16 # Create frame > > 00c movq RAX, RSI # spill > 00f negq RAX # long > 012 addq rsp, 16 # Destroy frame > popq rbp > cmpq rsp, poll_offset[r15_thread] > ja #safepoint_stub # Safepoint: poll for GC > > 024 ret src/hotspot/share/opto/addnode.cpp line 1014: > 1012: return new SubINode(phase->makecon(TypeInt::ZERO), in1->in(1)); > 1013: } > 1014: } else if (op2 == Op_AddI && 
phase->type(in1) == TypeInt::MINUS_1) { Why do you need to check both inputs for constant -1? Shouldn't `AddNode::Ideal` canonicalize the inputs and ensure that constants are moved to the second input? https://github.com/openjdk/jdk/blob/599d07c0db9c85e4dae35d1c54a63407d32eaedd/src/hotspot/share/opto/addnode.hpp#L52-L54 test/hotspot/jtreg/compiler/c2/TestAddXorIdeal.java line 30: > 28: * @summary C2: Improve Add and Xor ideal optimizations > 29: * @library /test/lib > 30: * @run main/othervm -XX:-Inline -XX:-TieredCompilation -XX:TieredStopAtLevel=4 -XX:CompileCommand=compileonly,compiler.c2.TestAddXorIdeal::* compiler.c2.TestAddXorIdeal What about `-XX:CompileCommand=dontinline,compiler.c2.TestAddXorIdeal::test*` Instead of disabling all inlining and limiting compilation? test/hotspot/jtreg/compiler/c2/TestAddXorIdeal.java line 59: > 57: Asserts.assertTrue(test2(i - 7) == -(i - 7)); > 58: Asserts.assertTrue(test3(i + 100) == -(i + 100)); > 59: Asserts.assertTrue(test4(i - 1024) == -(i - 1024)); What about using random numbers for better coverage? ------------- PR: https://git.openjdk.java.net/jdk/pull/5266 From roland at openjdk.java.net Thu Sep 2 15:32:06 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 2 Sep 2021 15:32:06 GMT Subject: RFR: 8269119: C2: Avoid redundant memory barriers in Unsafe.copyMemory0 intrinsic [v2] In-Reply-To: References: Message-ID: <3ABqso504SRlO7IS3lMoxJrKRZB3jxRmAfIEZxMFBeA=.b7aab560-62fa-434a-9649-e2d859e95cbb@github.com> On Wed, 1 Sep 2021 14:17:16 GMT, Vladimir Ivanov wrote: >> `Unsafe::copyMemory0` intrinsic unconditionally inserts barriers arounds the call to `unsafe_arraycopy` stub. >> It is a conservative approach and barriers can be avoided in most common cases (similar to what is done for scalar unsafe accesses). >> >> `Unsafe::copyMemory()` performs argument validation which limits inputs either >> to off-heap location (null + absolute address) or primitive on-heap array. 
>> >> The only cases when barriers are still needed are: >> * mixed accesses (`Object+offset`); >> * mismatched access due to lack of type info on base oop (`Object:NotNull+offset`). >> >> Testing: hs-tier1 - hs-tier6 > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Remove memory barriers Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5259 From kvn at openjdk.java.net Thu Sep 2 17:23:13 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 2 Sep 2021 17:23:13 GMT Subject: RFR: 8244675: assert(IncrementalInline || (_late_inlines.length() == 0 && !has_mh_late_inlines())) [v2] In-Reply-To: References: Message-ID: On Thu, 2 Sep 2021 11:47:07 GMT, Vladimir Ivanov wrote: > > I suggest to move test into compiler/inlining directory. > > > > The bug is in Vector API-specific code, so putting the test under compiler/vectorapi looks well-justified. Okay. ------------- PR: https://git.openjdk.java.net/jdk/pull/5249 From kvn at openjdk.java.net Thu Sep 2 17:30:04 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 2 Sep 2021 17:30:04 GMT Subject: RFR: 8271340: Crash PhaseIdealLoop::clone_outer_loop In-Reply-To: References: Message-ID: <0G1jMP-iSxz4nSuvJV2He_JDajheQmds-pgqeyPmOB4=.c5719b1d-5fa2-483c-b623-e15138a0ac12@github.com> On Mon, 23 Aug 2021 14:46:15 GMT, Roland Westrelin wrote: > The test has a counted loop and an infinite loop. The infinite loop is > reachable from multiple paths and its loop head is a Region with more > than 3 inputs. One of these paths is from the counted loop. When loop > opts run, a NeverBranch is added to the infinite loop that's removed > by NeverBranchNode::Ideal() next because in(0) of the NeverBranch is a > Region and not a Loop. > > When CCP runs, it finds the counted loop exit is never reached because > a test in the loop body that depends on a loop phi is never taken. 
As > a consequence nodes along the path from the counted loop end to the > infinite loop Region have type top. One of these nodes is a Call > node. PhaseCCP::do_transform() should then cause the path between the > counted loop and the infinite loop to optimize out but that doesn't > happen because do_transform() starts from Root by following inputs. > The dead path is only reachable from the infinite loop but there's > no edge between Root and the infinite loop. > > IGVN next runs, processes the Call Node, finds it's dead, kills > everything around it which causes the OuterStripMinedLoopEnd to lose > a projection. That later triggers the crash. > > The fix I propose is to be more conservative in > NeverBranchNode::Ideal() and to check for an in(0) that's a Region. As > a consequence, at CCP time, the infinite loop is reachable from Root. > > This change also requires some adjustments to Shenandoah-specific code > that makes assumptions about the shape of infinite loops. Good. But, please, address Tobias's question about the test. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5220 From kvn at openjdk.java.net Thu Sep 2 17:30:07 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 2 Sep 2021 17:30:07 GMT Subject: RFR: 8271340: Crash PhaseIdealLoop::clone_outer_loop In-Reply-To: References: Message-ID: On Tue, 31 Aug 2021 15:40:36 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/cfgnode.cpp line 2720: >> >>> 2718: // Check for no longer being part of a loop >>> 2719: Node *NeverBranchNode::Ideal(PhaseGVN *phase, bool can_reshape) { >>> 2720: if (can_reshape && !in(0)->is_Region()) { >> >> I assume this is the actual fix. > > Yes, it is. The rest is Shenandoah-specific.
okay ------------- PR: https://git.openjdk.java.net/jdk/pull/5220 From kvn at openjdk.java.net Thu Sep 2 17:37:14 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 2 Sep 2021 17:37:14 GMT Subject: RFR: JDK-8272574: C2: assert(false) failed: Bad graph detected in build_loop_late [v11] In-Reply-To: References: Message-ID: <_PaWq9nRQLNksRPRLAS7Ar7Z1wlfTZqWy23x4d-qwMI=.fed0fd2a-a289-4ae2-8d7a-cd2439a074f8@github.com> On Thu, 2 Sep 2021 04:16:03 GMT, 王超 wrote: >> Current loop predication will promote nodes (with a dependency on a later control node) to the insertion point which dominates the later control node. >> >> In the following example, loopPrediction will promote node 434 to the outer loop (predicted insert point is right after node 424), and it depends on control node 207. But node 424 dominates node 207, which means after the promotion, the cloned nodes have a control dependency on a later control node, which leads to a bad graph. >> >> ![image](https://user-images.githubusercontent.com/25214855/129720970-ff65b8f4-8bef-401d-8590-54aca6de470e.png) >> >> ![image](https://user-images.githubusercontent.com/25214855/129721369-4c61222b-7305-4522-9a37-e3e6e2138aa9.png) > > 王超 has updated the pull request incrementally with one additional commit since the last revision: > > Add more debug infomation Okay. Let me test the latest version. ------------- PR: https://git.openjdk.java.net/jdk/pull/5142 From kvn at openjdk.java.net Thu Sep 2 20:41:28 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 2 Sep 2021 20:41:28 GMT Subject: RFR: 8272413: Incorrect num of element count calculation for vector cast In-Reply-To: References: Message-ID: <5SfVlZCoJ9VP_HQ6fxBB6ASXSoYz_6k4BOd62bF91bc=.6e378fb2-8607-4828-af91-3f04ccb8ef8d@github.com> On Wed, 18 Aug 2021 08:31:47 GMT, Wang Huang wrote: > Dear all, > Closed JDK-8265244 has been split into two issues: JDK-8268966 and this issue. In this issue, I will fix the mid-end comparison.
> This patch is easy to understand. It is split from https://github.com/openjdk/jdk/pull/3507. I only fix the mid-end problem because the back-end problem was fixed in JDK-8268966 by @theRealELiu. > Thank you for your review. > > Yours, > WANG Huang I will run testing and let you know the result. ------------- PR: https://git.openjdk.java.net/jdk/pull/5160 From kvn at openjdk.java.net Thu Sep 2 22:34:30 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 2 Sep 2021 22:34:30 GMT Subject: RFR: JDK-8272574: C2: assert(false) failed: Bad graph detected in build_loop_late [v11] In-Reply-To: References: Message-ID: On Thu, 2 Sep 2021 04:16:03 GMT, 王超 wrote: >> Current loop predication will promote nodes (with a dependency on a later control node) to the insertion point which dominates the later control node. >> >> In the following example, loopPrediction will promote node 434 to the outer loop (predicted insert point is right after node 424), and it depends on control node 207. But node 424 dominates node 207, which means after the promotion, the cloned nodes have a control dependency on a later control node, which leads to a bad graph. >> >> ![image](https://user-images.githubusercontent.com/25214855/129720970-ff65b8f4-8bef-401d-8590-54aca6de470e.png) >> >> ![image](https://user-images.githubusercontent.com/25214855/129721369-4c61222b-7305-4522-9a37-e3e6e2138aa9.png) > > 王超 has updated the pull request incrementally with one additional commit since the last revision: > > Add more debug infomation Testing passed clean. ------------- Marked as reviewed by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/5142 From dlong at openjdk.java.net Thu Sep 2 22:43:29 2021 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 2 Sep 2021 22:43:29 GMT Subject: RFR: 8271911: replay compilations of methods which use JSR292 (easy cases) [v2] In-Reply-To: References: Message-ID: On Thu, 2 Sep 2021 12:38:42 GMT, Vladimir Ivanov wrote: >> I still miss the connection between `ciInstanceKlass::dump_replay_data()` and `ciEnv::find_dynamic_call_sites()` cases. >> >> `ciEnv::find_dynamic_call_sites()` dumps all invokedynamic and invokehandle (MH.invoke*()) call sites and MethodHandle CP Constants across the class hierarchy. Any particular benefit compared to just dumping that info on per ciInstanceKlass granularity? > > Also, one more question: why do you dump MethodHandle CP constants and invokehandle call sites? Is it to record the connection between MethodHandle instances and hidden classes behind LambdaForms they are implemented with? > I still miss the connection between `ciInstanceKlass::dump_replay_data()` and `ciEnv::find_dynamic_call_sites()` cases. > > `ciEnv::find_dynamic_call_sites()` dumps all invokedynamic and invokehandle (MH.invoke*()) call sites and MethodHandle CP Constants across the class hierarchy. Any particular benefit compared to just dumping that info on per ciInstanceKlass granularity? Actually, find_dynamic_call_sites doesn't do the dumping; it just builds a map, so we only dump call sites for hidden classes that are referenced in the replay data, preserving existing behavior. The dumping of subclasses in ciInstanceKlass::dump_replay_data confused me too, so I asked Tom Rodriguez about it. It's a substitute for CHA info in the replay file. If we load subclasses of all the ciInstanceKlass's, then hopefully CHA queries will give the same answer at replay time. Note that not all subclasses have a ciInstanceKlass in the metadata.
Subclasses of java.lang.Object include lots of hidden classes not referenced directly in the ci metadata. ------------- PR: https://git.openjdk.java.net/jdk/pull/5270 From dlong at openjdk.java.net Thu Sep 2 22:49:27 2021 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 2 Sep 2021 22:49:27 GMT Subject: RFR: 8271911: replay compilations of methods which use JSR292 (easy cases) [v2] In-Reply-To: References: Message-ID: On Thu, 2 Sep 2021 22:40:04 GMT, Dean Long wrote: >> Also, one more question: why do you dump MethodHandle CP constants and invokehandle call sites? Is it to record the connection between MethodHandle instances and hidden classes behind LambdaForms they are implemented with? > >> I still miss the connection between `ciInstanceKlass::dump_replay_data()` and `ciEnv::find_dynamic_call_sites()` cases. >> >> `ciEnv::find_dynamic_call_sites()` dumps all invokedynamic and invokehandle (MH.invoke*()) call sites and MethodHandle CP Constants across the class hierarchy. Any particular benefit compared to just dumping that info on per ciInstanceKlass granularity? > > Actually, find_dynamic_call_sites doesn't do the dumping; it just builds a map, so we only dump call sites for hidden classes that are referenced in the replay data, preserving existing behavior. The dumping of subclasses in ciInstanceKlass::dump_replay_data confused me too, so I asked Tom Rodriguez about it. It's a substitute for CHA info in the replay file. If we load subclasses of all the ciInstanceKlass's, then hopefully CHA queries will give the same answer at replay time. Note that not all subclasses have a ciInstanceKlass in the metadata. Subclasses of java.lang.Object include lots of hidden classes not referenced directly in the ci metadata. > Also, one more question: why do you dump MethodHandle CP constants and invokehandle call sites? Is it to record the connection between MethodHandle instances and hidden classes behind LambdaForms they are implemented with?
Yes, if I understand your question correctly. When inlining through an invokedynamic to the target, there's often an invokehandle at the end, and the hidden class isn't always found as the adapter or appendix. There are other locations like BSM arguments, but these are loaded from the constant pool and I found that looking at MethodHandle CP constants handled that case. Note again that I don't dump all MethodHandle CP constants, just the ones needed for a hidden class referenced in the replay data. ------------- PR: https://git.openjdk.java.net/jdk/pull/5270 From kvn at openjdk.java.net Fri Sep 3 00:48:27 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 3 Sep 2021 00:48:27 GMT Subject: RFR: 8272413: Incorrect num of element count calculation for vector cast In-Reply-To: References: Message-ID: On Wed, 18 Aug 2021 08:31:47 GMT, Wang Huang wrote: > Dear all, > Closed JDK-8265244 has been split into two issues: JDK-8268966 and this issue. In this issue, I will fix the mid-end comparison. > This patch is easy to understand. It is split from https://github.com/openjdk/jdk/pull/3507. I only fix the mid-end problem because the back-end problem was fixed in JDK-8268966 by @theRealELiu. > Thank you for your review. > > Yours, > WANG Huang Testing passed. It included running `compiler/vectorapi/VectorCastShape*Test.java` tests on aarch64 and x64. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5160 From github.com+25214855+casparcwang at openjdk.java.net Fri Sep 3 01:06:40 2021 From: github.com+25214855+casparcwang at openjdk.java.net (王超) Date: Fri, 3 Sep 2021 01:06:40 GMT Subject: RFR: JDK-8272574: C2: assert(false) failed: Bad graph detected in build_loop_late [v11] In-Reply-To: References: Message-ID: On Thu, 2 Sep 2021 22:31:20 GMT, Vladimir Kozlov wrote: > Testing passed clean. Thank you very much for the testing and review.
------------- PR: https://git.openjdk.java.net/jdk/pull/5142 From njian at openjdk.java.net Fri Sep 3 02:08:31 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Fri, 3 Sep 2021 02:08:31 GMT Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v4] In-Reply-To: References: Message-ID: On Thu, 19 Aug 2021 08:57:47 GMT, Wang Huang wrote: >> * In this issue, we plan to complete all missing implementation for aarch64 neon backend. For example, cast from Byte to Long, cast from Long to Byte, and so on. >> * It may be a solver of JDK-8269866, or part of it. > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > fix codes src/hotspot/cpu/aarch64/aarch64_neon.ad line 274: > 272: %} > 273: > 274: instruct vcvt4Bto4I(vecX dst, vecD src) 4BtoX should be supported as the min_vector_size() for byte type is 4. ------------- PR: https://git.openjdk.java.net/jdk/pull/4839 From njian at openjdk.java.net Fri Sep 3 03:26:31 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Fri, 3 Sep 2021 03:26:31 GMT Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v4] In-Reply-To: References: Message-ID: <4EAkLjaVxK9iObc6Y35Tfz4Fwu1aNqgAh1PhU3Y-dXU=.3f0158f0-07d0-4916-a151-364fa5e9fc97@github.com> On Thu, 19 Aug 2021 08:57:47 GMT, Wang Huang wrote: >> * In this issue, we plan to complete all missing implementation for aarch64 neon backend. For example, cast from Byte to Long, cast from Long to Byte, and so on. >> * It may be a solver of JDK-8269866, or part of it. > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > fix codes src/hotspot/cpu/aarch64/aarch64.ad line 2452: > 2450: return false; > 2451: } > 2452: break; Why do you remove others but keep this (4Sto4B not implemented)? 
src/hotspot/cpu/aarch64/aarch64_neon.ad line 187: > 185: format %{ " # reinterpret $dst,$src\t# S2X" %} > 186: ins_encode %{ > 187: // The upper bits of "src" are expected to have been initialized to zero. I think the comment should be: // The higher bits of the "dst" register must be cleared to zero. src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2910: > 2908: INSN(frintm, 0, 0b00, 0b01, 0b11001); > 2909: INSN(frintp, 0, 0b10, 0b01, 0b11000); > 2910: INSN(fcvtzv, 0, 0b10, 0b01, 0b11011); // converts each element in a vector from a floating-point value to a signed integer value, and Arm's name is fcvtzs Would it be easier to understand if the name "fcvtzs" were used directly (matching the manual)? ------------- PR: https://git.openjdk.java.net/jdk/pull/4839 From eliu at openjdk.java.net Fri Sep 3 04:33:29 2021 From: eliu at openjdk.java.net (Eric Liu) Date: Fri, 3 Sep 2021 04:33:29 GMT Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v4] In-Reply-To: References: Message-ID: On Thu, 19 Aug 2021 08:57:47 GMT, Wang Huang wrote: >> * In this issue, we plan to complete all missing implementation for aarch64 neon backend. For example, cast from Byte to Long, cast from Long to Byte, and so on. >> * It may be a solver of JDK-8269866, or part of it.
> > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > fix codes I tested in my local, a `bad AD file` due to the missing rule of `VectorReinterpret 4B` o1666 VectorReinterpret === _ o1665 [[o1691 ]] #vectord[8]:{byte} --N: o1666 VectorReinterpret === _ o1665 [[o1691 ]] #vectord[8]:{byte} --N: o1665 VectorCastI2X === _ o1644 [[o1666 ]] #vectord[4]:{byte} VECD 100 vcvt4Ito4B --N: o1644 LoadVector === o448 o7 o1642 |o1719 [[o1665 ]] @int[int:>=0]:NotNull:exact+any *, idx=9; mismatched #vectorx[4]:{int} VECX 0 VECX # To suppress the following error report, specify this argument # after -XX: or in .hotspotrc: SuppressErrorAt=/matcher.cpp:1681 # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/home/jdk/jdk_src/src/hotspot/share/opto/matcher.cpp:1681), pid=32585, tid=32598 # assert(false) failed: bad AD file # # JRE version: OpenJDK Runtime Environment (18.0) (fastdebug build 18-internal+0-git-cd8783c08) # Java VM: OpenJDK 64-Bit Server VM (fastdebug 18-internal+0-git-cd8783c08, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) # Problematic frame: # V [libjvm.so+0x12bb770] Matcher::Label_Root(Node const*, State*, Node*, Node*&)+0xa50 # # Core dump will be written. Default location: /tmp/core # # An error report file with more information is saved as: # /home/jdk/test/hs_err_pid32585.log [thread 32586 also had an error] # # Compiler replay data is saved as: # /home/jdk/test/replay_pid32585.log # # If you would like to submit a bug report, please visit: # https://bugreport.java.com/bugreport/crash.jsp # ./run.sh: line 25: 32585 Aborted (core dumped) $java -ea --add-modules jdk.incubator.vector -XX:+UnlockDiagnosticVMOptions -XX:+TieredCompilation $1 It can be reproduced by casting `Int128 species` to `byte64 species`[1]. Hope these 2 test cases helpful to you. 
[1] https://gist.github.com/theRealELiu/e75f0a001d8eeda17e02616479319624 [2] https://gist.github.com/theRealELiu/083573141d2c953c09d3da931b3e7ee7 ------------- Changes requested by eliu (Author). PR: https://git.openjdk.java.net/jdk/pull/4839 From whuang at openjdk.java.net Fri Sep 3 06:57:27 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Fri, 3 Sep 2021 06:57:27 GMT Subject: Withdrawn: 8270830: Aarch64: Use stp to initialize object on C1 In-Reply-To: References: Message-ID: On Fri, 16 Jul 2021 11:14:07 GMT, Wang Huang wrote: > Here is a trivial patch on aarch64. > In C1_MacroAssembler::initialize_object we can use `stp` to replace `str`. > ```c++ > @@ -250,13 +249,16 @@ void C1_MacroAssembler::initialize_object(Register obj, Register klass, Register > for (int i = -unroll; i < 0; i++) { > if (-i == remainder) > bind(entry_point); > - str(zr, Address(rscratch1, i * wordSize)); > + stp(zr, zr, Address(rscratch1, i * 2 * BytesPerWord)); This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/4808 From roland at openjdk.java.net Fri Sep 3 07:46:14 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 3 Sep 2021 07:46:14 GMT Subject: RFR: 8271341: Opcode() != Op_If && Opcode() != Op_RangeCheck) || outcnt() == 2 assert failure with Test7179138_1.java [v2] In-Reply-To: References: Message-ID: <9DZav8F6XqqyzpYMwZh7k7uWjaXvMRHvWkDgb1GDskM=.a8883ac9-4053-4fee-b3b8-1befbf09451f@github.com> On Tue, 31 Aug 2021 15:15:59 GMT, Vladimir Kozlov wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: >> >> - Merge with master >> - review >> - fix > > src/hotspot/share/opto/stringopts.cpp line 282: > >> 280: } else if (n->is_Region()) { >> 281: Node* iff = n->in(1)->in(0); >> 282: assert(iff->is_If(), "no if for the diamond"); > > I assume that the only Region node listed in _control list is diamond Region. > May be add assert to check that it is diamond region. Up to you. > I would like to have separate RFE to have `is_diamond_region()` - we seems check it in few places. I pushed an update. Is this what you had in mind? ------------- PR: https://git.openjdk.java.net/jdk/pull/4944 From roland at openjdk.java.net Fri Sep 3 07:46:07 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 3 Sep 2021 07:46:07 GMT Subject: RFR: 8271341: Opcode() != Op_If && Opcode() != Op_RangeCheck) || outcnt() == 2 assert failure with Test7179138_1.java [v2] In-Reply-To: References: Message-ID: > The root cause of this bug is in PhaseStringOpts. In the middle of the > chain of calls that are optimized out, there's a diamond Region/If. On > most executions this diamond is optimized out by IGVN because once > PhaseStringOpts is over, all the Region's Phis are removed. But > because one input of the If/Bool/Cmp is set to top by PhaseStringOpts > when calls are removed, it sometimes happen that top propagates to the > If before the Region is optimized out. That causes control flow below > the If to become dead while it should still be reachable. > > The fix I propose is to have PhaseStringOpts removed the Region/If in > that case. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge with master - review - fix ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4944/files - new: https://git.openjdk.java.net/jdk/pull/4944/files/9ddafed4..cb87bc2c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4944&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4944&range=00-01 Stats: 12318 lines in 487 files changed: 6934 ins; 2404 del; 2980 mod Patch: https://git.openjdk.java.net/jdk/pull/4944.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4944/head:pull/4944 PR: https://git.openjdk.java.net/jdk/pull/4944 From roland at openjdk.java.net Fri Sep 3 07:46:09 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 3 Sep 2021 07:46:09 GMT Subject: RFR: 8271341: Opcode() != Op_If && Opcode() != Op_RangeCheck) || outcnt() == 2 assert failure with Test7179138_1.java [v2] In-Reply-To: References: Message-ID: On Thu, 2 Sep 2021 13:05:39 GMT, Tobias Hartmann wrote: > Did you verify that the JavaFuzzer found test that Christian attached to the bug triggers the same issue? That fuzzer test doesn't fail with current master. I assume it's some other issue that was fixed in the meantime. 
> src/hotspot/share/opto/stringopts.cpp line 283: > >> 281: Node* iff = n->in(1)->in(0); >> 282: assert(iff->is_If(), "no if for the diamond"); >> 283: Node* bol = iff->in(1);; > > Double `;` fixed ------------- PR: https://git.openjdk.java.net/jdk/pull/4944 From roland at openjdk.java.net Fri Sep 3 08:06:07 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 3 Sep 2021 08:06:07 GMT Subject: RFR: 8271340: Crash PhaseIdealLoop::clone_outer_loop [v2] In-Reply-To: References: Message-ID: On Tue, 31 Aug 2021 15:39:56 GMT, Vladimir Kozlov wrote: > Please, place `TestInfiniteLoopCCP.java` test into `compiler/loopopts` fixed in the update change ------------- PR: https://git.openjdk.java.net/jdk/pull/5220 From roland at openjdk.java.net Fri Sep 3 08:06:06 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 3 Sep 2021 08:06:06 GMT Subject: RFR: 8271340: Crash PhaseIdealLoop::clone_outer_loop [v2] In-Reply-To: References: Message-ID: <7QgbteC6c5N9iTs8LQB2SJT5t9eModJH--oHCqFA4lc=.5762c07b-bc53-43d6-9e8b-59d95195ffcf@github.com> > The test has a counted loop and an infinite loop. The infinite loop is > reachable from multiple paths and its loop head is a Region with more > than 3 inputs. One of these paths is from the counted loop. When loop > opts run, a NeverBranch is added to the infinite loop that's removed > by NeverBranchNode::Ideal() next because in(0) of the NeverBranch is a > Region and not a Loop. > > When CCP runs, it finds the counted loop exit is never reached because > a test in the loop body that depends on a loop phi is never taken. As > a consequence nodes along the path from the counted loop end to the > infinite loop Region have type top. One of these nodes is a Call > node. PhaseCCP::do_transform() should then cause the path between the > counted loop and the infinite loop to optimize out but that doesn't > happen because do_transform() starts from Root by following inputs. 
> The dead path is only reachable from the infinite loop but the there's > no edge between Root and the infinite loop. > > IGVN next runs, processes the Call Node, finds it's dead, kills > everything around it which causes the OuterStripMinedLoopEnd to loose > a projection. That later triggers the crash. > > The fix I propose is to be more conservative in > NeverBranchNode::Ideal() and to check for a in(0) that's a Region. As > a consequence, at CCP time, the infinite loop is reachable from Root. > > This change also requires some adjustments to Shenandoah specific code > that makes assumptions about the shape of infinite loops. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge branch 'master' into JDK-8271340 - move test - fix & test ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5220/files - new: https://git.openjdk.java.net/jdk/pull/5220/files/0ece249f..c87b3ee7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5220&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5220&range=00-01 Stats: 12316 lines in 487 files changed: 6933 ins; 2404 del; 2979 mod Patch: https://git.openjdk.java.net/jdk/pull/5220.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5220/head:pull/5220 PR: https://git.openjdk.java.net/jdk/pull/5220 From roland at openjdk.java.net Fri Sep 3 08:16:04 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 3 Sep 2021 08:16:04 GMT Subject: RFR: 8271340: Crash PhaseIdealLoop::clone_outer_loop [v3] In-Reply-To: References: Message-ID: > The test has a counted loop and an infinite loop. The infinite loop is > reachable from multiple paths and its loop head is a Region with more > than 3 inputs. One of these paths is from the counted loop. 
When loop > opts run, a NeverBranch is added to the infinite loop that's removed > by NeverBranchNode::Ideal() next because in(0) of the NeverBranch is a > Region and not a Loop. > > When CCP runs, it finds the counted loop exit is never reached because > a test in the loop body that depends on a loop phi is never taken. As > a consequence, nodes along the path from the counted loop end to the > infinite loop Region have type top. One of these nodes is a Call > node. PhaseCCP::do_transform() should then cause the path between the > counted loop and the infinite loop to optimize out but that doesn't > happen because do_transform() starts from Root by following inputs. > The dead path is only reachable from the infinite loop but there's > no edge between Root and the infinite loop. > > IGVN next runs, processes the Call Node, finds it's dead, kills > everything around it which causes the OuterStripMinedLoopEnd to lose > a projection. That later triggers the crash. > > The fix I propose is to be more conservative in > NeverBranchNode::Ideal() and to check for an in(0) that's a Region. As > a consequence, at CCP time, the infinite loop is reachable from Root. > > This change also requires some adjustments to Shenandoah-specific code > that makes assumptions about the shape of infinite loops. 
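The reachability argument in the description above can be illustrated with a toy graph walk. This is only a sketch under stated assumptions, not HotSpot code: the class `InputReachability`, the string node names, and the tiny graph shape are all hypothetical, and edges are modeled the way the description words it (a walk that starts at Root and follows inputs). Without a NeverBranch exit feeding Root, the infinite loop and the dead path hanging off it are never visited; with the fake exit kept in place, they are.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class InputReachability {
    // Walk the graph from 'root' by following input edges only
    // (node -> the nodes it uses), collecting everything visited.
    static Set<String> reachableFromRoot(Map<String, List<String>> inputs, String root) {
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(root);
        while (!work.isEmpty()) {
            String n = work.pop();
            if (!seen.add(n)) {
                continue; // already visited
            }
            for (String in : inputs.getOrDefault(n, List.of())) {
                work.push(in);
            }
        }
        return seen;
    }

    // Hypothetical miniature of the shape described in the mail:
    // a counted loop whose exit path (containing a Call) leads into
    // an infinite loop that has no real exit towards Root.
    static Map<String, List<String>> graph(boolean withNeverBranch) {
        Map<String, List<String>> g = new HashMap<>();
        g.put("Root", withNeverBranch
                ? List.of("Return", "NeverBranchExit")
                : List.of("Return"));
        g.put("Return", List.of("Start"));
        g.put("CountedLoopEnd", List.of("Start"));
        g.put("DeadPathCall", List.of("CountedLoopEnd"));
        g.put("InfiniteLoopRegion", List.of("Start", "DeadPathCall", "Backedge"));
        g.put("Backedge", List.of("InfiniteLoopRegion"));
        if (withNeverBranch) {
            // The NeverBranch gives the infinite loop a fake exit that Root uses.
            g.put("NeverBranch", List.of("InfiniteLoopRegion"));
            g.put("NeverBranchExit", List.of("NeverBranch"));
        }
        return g;
    }

    public static void main(String[] args) {
        System.out.println("without NeverBranch: "
                + reachableFromRoot(graph(false), "Root").contains("DeadPathCall"));
        System.out.println("with NeverBranch:    "
                + reachableFromRoot(graph(true), "Root").contains("DeadPathCall"));
    }
}
```

In this toy model, keeping the NeverBranch is what makes the dead Call visible to a Root-based traversal, which matches the stated consequence of the fix ("at CCP time, the infinite loop is reachable from Root").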
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: test fix ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5220/files - new: https://git.openjdk.java.net/jdk/pull/5220/files/c87b3ee7..18e84645 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5220&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5220&range=01-02 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5220.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5220/head:pull/5220 PR: https://git.openjdk.java.net/jdk/pull/5220 From roland at openjdk.java.net Fri Sep 3 08:16:08 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 3 Sep 2021 08:16:08 GMT Subject: RFR: 8271340: Crash PhaseIdealLoop::clone_outer_loop [v3] In-Reply-To: References: Message-ID: On Thu, 2 Sep 2021 12:39:53 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> test fix > > test/hotspot/jtreg/compiler/c2/TestInfiniteLoopCCP.java line 72: > >> 70: thread.setDaemon(true); >> 71: thread.start(); >> 72: Thread.sleep(/*Utils.adjustTimeout(4000)*/4000); > > Why is the `Utils.adjustTimeout` uncommented? That was an accident. Fixed in updated change. ------------- PR: https://git.openjdk.java.net/jdk/pull/5220 From roland at openjdk.java.net Fri Sep 3 08:23:25 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Fri, 3 Sep 2021 08:23:25 GMT Subject: RFR: 8271340: Crash PhaseIdealLoop::clone_outer_loop [v3] In-Reply-To: References: Message-ID: On Fri, 3 Sep 2021 08:16:04 GMT, Roland Westrelin wrote: >> The test has a counted loop and an infinite loop. The infinite loop is >> reachable from multiple paths and its loop head is a Region with more >> than 3 inputs. One of these paths is from the counted loop. 
When loop >> opts run, a NeverBranch is added to the infinite loop that's removed >> by NeverBranchNode::Ideal() next because in(0) of the NeverBranch is a >> Region and not a Loop. >> >> When CCP runs, it finds the counted loop exit is never reached because >> a test in the loop body that depends on a loop phi is never taken. As >> a consequence nodes along the path from the counted loop end to the >> infinite loop Region have type top. One of these nodes is a Call >> node. PhaseCCP::do_transform() should then cause the path between the >> counted loop and the infinite loop to optimize out but that doesn't >> happen because do_transform() starts from Root by following inputs. >> The dead path is only reachable from the infinite loop but the there's >> no edge between Root and the infinite loop. >> >> IGVN next runs, processes the Call Node, finds it's dead, kills >> everything around it which causes the OuterStripMinedLoopEnd to loose >> a projection. That later triggers the crash. >> >> The fix I propose is to be more conservative in >> NeverBranchNode::Ideal() and to check for a in(0) that's a Region. As >> a consequence, at CCP time, the infinite loop is reachable from Root. >> >> This change also requires some adjustments to Shenandoah specific code >> that makes assumptions about the shape of infinite loops. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > test fix > Would the following code I added to Valhalla help? > > [openjdk/valhalla at f43fafc#diff-f4a11a2c3f7d7342641c878277f778856ad329e4e09026e39d08afb03efd10a2R1958](https://github.com/openjdk/valhalla/commit/f43fafc7d81f3d34f6e971765dbadae41ae5c393#diff-f4a11a2c3f7d7342641c878277f778856ad329e4e09026e39d08afb03efd10a2R1958) I'm not sure. How would it help in this particular case? In any case, it seems to me NeverBranchNode::Ideal() is simply wrong. 
As I understand, when loop opts hit an infinite loop it creates a NeverBranchNode but because the infinite loop is not properly registered in the loop tree, the loop's Region is not transformed into a Loop node. On the next pass of loop opts, if the NeverBranch is still in the infinite loop, a Loop is created for the infinite loop. But if NeverBranchNode::Ideal() runs in between then a new NeverBranch is added and maybe and only in some later loop opts pass a Loop created. So AFAICT, the behavior is inconsistent depending on whether NeverBranchNode::Ideal() runs or not which doesn't seem right. ------------- PR: https://git.openjdk.java.net/jdk/pull/5220 From thartmann at openjdk.java.net Fri Sep 3 08:23:24 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 3 Sep 2021 08:23:24 GMT Subject: RFR: 8271340: Crash PhaseIdealLoop::clone_outer_loop [v3] In-Reply-To: References: Message-ID: <7L4gmRJ4FoevHv8KZSg7DrfYluKuwjsQupwsIl6n0jo=.9bcc201d-d588-4f75-a91e-f8df31fa64cc@github.com> On Fri, 3 Sep 2021 08:16:04 GMT, Roland Westrelin wrote: >> The test has a counted loop and an infinite loop. The infinite loop is >> reachable from multiple paths and its loop head is a Region with more >> than 3 inputs. One of these paths is from the counted loop. When loop >> opts run, a NeverBranch is added to the infinite loop that's removed >> by NeverBranchNode::Ideal() next because in(0) of the NeverBranch is a >> Region and not a Loop. >> >> When CCP runs, it finds the counted loop exit is never reached because >> a test in the loop body that depends on a loop phi is never taken. As >> a consequence nodes along the path from the counted loop end to the >> infinite loop Region have type top. One of these nodes is a Call >> node. PhaseCCP::do_transform() should then cause the path between the >> counted loop and the infinite loop to optimize out but that doesn't >> happen because do_transform() starts from Root by following inputs. 
>> The dead path is only reachable from the infinite loop but the there's >> no edge between Root and the infinite loop. >> >> IGVN next runs, processes the Call Node, finds it's dead, kills >> everything around it which causes the OuterStripMinedLoopEnd to loose >> a projection. That later triggers the crash. >> >> The fix I propose is to be more conservative in >> NeverBranchNode::Ideal() and to check for a in(0) that's a Region. As >> a consequence, at CCP time, the infinite loop is reachable from Root. >> >> This change also requires some adjustments to Shenandoah specific code >> that makes assumptions about the shape of infinite loops. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > test fix Marked as reviewed by thartmann (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5220 From thartmann at openjdk.java.net Fri Sep 3 08:26:28 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 3 Sep 2021 08:26:28 GMT Subject: RFR: 8271340: Crash PhaseIdealLoop::clone_outer_loop [v3] In-Reply-To: References: Message-ID: On Fri, 3 Sep 2021 08:19:32 GMT, Roland Westrelin wrote: > > Would the following code I added to Valhalla help? > > [openjdk/valhalla at f43fafc#diff-f4a11a2c3f7d7342641c878277f778856ad329e4e09026e39d08afb03efd10a2R1958](https://github.com/openjdk/valhalla/commit/f43fafc7d81f3d34f6e971765dbadae41ae5c393#diff-f4a11a2c3f7d7342641c878277f778856ad329e4e09026e39d08afb03efd10a2R1958) > > I'm not sure. How would it help in this particular case? I thought it might help by aggressively removing the parts not reachable from the bottom (but still reachable from root). > In any case, it seems to me NeverBranchNode::Ideal() is simply wrong. As I understand, when loop opts hit an infinite loop it creates a NeverBranchNode but because the infinite loop is not properly registered in the loop tree, the loop's Region is not transformed into a Loop node. 
On the next pass of loop opts, if the NeverBranch is still in the infinite loop, a Loop is created for the infinite loop. But if NeverBranchNode::Ideal() runs in between then a new NeverBranch is added and maybe and only in some later loop opts pass a Loop created. So AFAICT, the behavior is inconsistent depending on whether NeverBranchNode::Ideal() runs or not which doesn't seem right. Thanks for the details. That makes sense. ------------- PR: https://git.openjdk.java.net/jdk/pull/5220 From wuyan at openjdk.java.net Fri Sep 3 09:52:27 2021 From: wuyan at openjdk.java.net (Wu Yan) Date: Fri, 3 Sep 2021 09:52:27 GMT Subject: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v4] In-Reply-To: References: Message-ID: On Wed, 28 Jul 2021 08:51:38 GMT, Andrew Haley wrote: >> I don't think we want to keep two copies of the compareTo intrinsic. If there are no cases where the LDP version is worse than the original version then we should just delete the old one and replace it with this. > >> I don't think we want to keep two copies of the compareTo intrinsic. If there are no cases where the LDP version is worse than the original version then we should just delete the old one and replace it with this. > > I agree. The trouble is, what does "worse" mean? I'm looking at SDEN-1982442, Neoverse N2 errata, 2001293, and I see that LDP has to be slowed down on streaming workloads, which will affect this. (That's just an example: I'm making the point that implementations differ.) > > The trouble with this patch is that it (probably) makes things better for long strings, which are very rare. What we actually need to care about is performance for a large number of typical-sized strings, which are names, identifiers, passwords, and so on: about 10-30 characters. @theRealAph do you have any other questions about this patch? 
------------- PR: https://git.openjdk.java.net/jdk/pull/4722 From vlivanov at openjdk.java.net Fri Sep 3 12:07:35 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 3 Sep 2021 12:07:35 GMT Subject: RFR: 8244675: assert(IncrementalInline || (_late_inlines.length() == 0 && !has_mh_late_inlines())) [v2] In-Reply-To: References: Message-ID: On Wed, 1 Sep 2021 16:45:18 GMT, Vladimir Ivanov wrote: >> Avoid populating late inline candidates list when post-parse inlining is disabled. >> >> Testing: hs-tier1 - hs-tier4 > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Assert is too strong Thanks for the reviews, Dean and Vladimir. ------------- PR: https://git.openjdk.java.net/jdk/pull/5249 From vlivanov at openjdk.java.net Fri Sep 3 12:07:37 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 3 Sep 2021 12:07:37 GMT Subject: Integrated: 8244675: assert(IncrementalInline || (_late_inlines.length() == 0 && !has_mh_late_inlines())) In-Reply-To: References: Message-ID: On Wed, 25 Aug 2021 10:57:50 GMT, Vladimir Ivanov wrote: > Avoid populating late inline candidates list when post-parse inlining is disabled. > > Testing: hs-tier1 - hs-tier4 This pull request has now been integrated. 
Changeset: 28ba78e6 Author: Vladimir Ivanov URL: https://git.openjdk.java.net/jdk/commit/28ba78e64721529fd764a7c09a7142a96c245f05 Stats: 53 lines in 3 files changed: 51 ins; 0 del; 2 mod 8244675: assert(IncrementalInline || (_late_inlines.length() == 0 && !has_mh_late_inlines())) Reviewed-by: dlong ------------- PR: https://git.openjdk.java.net/jdk/pull/5249 From vlivanov at openjdk.java.net Fri Sep 3 14:08:57 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 3 Sep 2021 14:08:57 GMT Subject: RFR: 8269119: C2: Avoid redundant memory barriers in Unsafe.copyMemory0 intrinsic [v3] In-Reply-To: References: Message-ID: > `Unsafe::copyMemory0` intrinsic unconditionally inserts barriers arounds the call to `unsafe_arraycopy` stub. > It is a conservative approach and barriers can be avoided in most common cases (similar to what is done for scalar unsafe accesses). > > `Unsafe::copyMemory()` performs argument validation which limits inputs either > to off-heap location (null + absolute address) or primitive on-heap array. > > The only cases when barriers are still needed are: > * mixed accesses (`Object+offset`); > * mismatched access due to lack of type info on base oop (`Object:NotNull+offset`). 
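The case split in the list above can be sketched as a small decision function. This is a minimal illustration, not the C2 implementation: the `Base` enum and `needsBarriers` are hypothetical names, and the real intrinsic works from the compiler's type information about the base oop rather than from an enum. It also covers only the two cases listed in this description.

```java
public class CopyMemoryBarriers {
    /** Hypothetical summary of what the compiler knows about a base argument. */
    enum Base {
        NULL_BASE,          // off-heap: null + absolute address
        PRIMITIVE_ARRAY,    // on-heap primitive array
        MAYBE_NULL_OBJECT,  // Object+offset: could be on-heap or off-heap (mixed access)
        UNTYPED_OBJECT      // Object:NotNull+offset: on-heap but no type info (mismatched access)
    }

    // A base is "imprecise" when the compiler cannot tell which memory it touches.
    static boolean isImprecise(Base b) {
        return b == Base.MAYBE_NULL_OBJECT || b == Base.UNTYPED_OBJECT;
    }

    // Barriers stay only if either side of the copy is imprecise;
    // the common off-heap and primitive-array cases skip them.
    static boolean needsBarriers(Base src, Base dst) {
        return isImprecise(src) || isImprecise(dst);
    }

    public static void main(String[] args) {
        System.out.println(needsBarriers(Base.NULL_BASE, Base.PRIMITIVE_ARRAY));
        System.out.println(needsBarriers(Base.MAYBE_NULL_OBJECT, Base.NULL_BASE));
    }
}
```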
> > Testing: hs-tier1 - hs-tier6 Vladimir Ivanov has updated the pull request incrementally with two additional commits since the last revision: - Formatting - Handle bottom[] case ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5259/files - new: https://git.openjdk.java.net/jdk/pull/5259/files/aa9c083d..efb4e89b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5259&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5259&range=01-02 Stats: 94 lines in 2 files changed: 74 ins; 1 del; 19 mod Patch: https://git.openjdk.java.net/jdk/pull/5259.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5259/head:pull/5259 PR: https://git.openjdk.java.net/jdk/pull/5259 From vlivanov at openjdk.java.net Fri Sep 3 14:09:01 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 3 Sep 2021 14:09:01 GMT Subject: RFR: 8269119: C2: Avoid redundant memory barriers in Unsafe.copyMemory0 intrinsic [v2] In-Reply-To: References: Message-ID: On Wed, 1 Sep 2021 14:17:16 GMT, Vladimir Ivanov wrote: >> `Unsafe::copyMemory0` intrinsic unconditionally inserts barriers arounds the call to `unsafe_arraycopy` stub. >> It is a conservative approach and barriers can be avoided in most common cases (similar to what is done for scalar unsafe accesses). >> >> `Unsafe::copyMemory()` performs argument validation which limits inputs either >> to off-heap location (null + absolute address) or primitive on-heap array. >> >> The only cases when barriers are still needed are: >> * mixed accesses (`Object+offset`); >> * mismatched access due to lack of type info on base oop (`Object:NotNull+offset`). >> >> Testing: hs-tier1 - hs-tier6 > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Remove memory barriers Thinking more about it, I spotted that `is_array` check is not enough: it doesn't cover `bottom[]` case which should be treated as having wide memory. 
Fixed in the latest version. ------------- PR: https://git.openjdk.java.net/jdk/pull/5259 From vlivanov at openjdk.java.net Fri Sep 3 14:12:27 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 3 Sep 2021 14:12:27 GMT Subject: RFR: 8271911: replay compilations of methods which use JSR292 (easy cases) [v2] In-Reply-To: References: Message-ID: On Mon, 30 Aug 2021 22:55:56 GMT, Dean Long wrote: >> There is a subset of the general problem that we should be able to solve by looking at invokedynamic/invokehandle call sites and MethodHandle constant pool entries. If a replay references a hidden class that is discoverable in one of those locations, then we can use the location as a replacement for the transient VM name. >> >> Examples of references to hidden class locations: >> >> @bci compiler/ciReplay/CiReplayBase$TestMain test (I)V 1 argL0 ; >> @bci compiler/ciReplay/CiReplayBase$TestMain main ([Ljava/lang/String;)V 0 form vmentry ; >> @cpi compiler/ciReplay/CiReplayBase$TestMain 56 form vmentry ; > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > rename LocPusher --> RecordLocation, add comments Thanks for the clarifications, Dean. Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5270 From vlivanov at openjdk.java.net Fri Sep 3 14:24:57 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 3 Sep 2021 14:24:57 GMT Subject: RFR: 8269119: C2: Avoid redundant memory barriers in Unsafe.copyMemory0 intrinsic [v4] In-Reply-To: References: Message-ID: > `Unsafe::copyMemory0` intrinsic unconditionally inserts barriers arounds the call to `unsafe_arraycopy` stub. > It is a conservative approach and barriers can be avoided in most common cases (similar to what is done for scalar unsafe accesses). 
> > `Unsafe::copyMemory()` performs argument validation which limits inputs either > to off-heap location (null + absolute address) or primitive on-heap array. > > The only cases when barriers are still needed are: > * mixed accesses (`Object+offset`); > * mismatched access due to lack of type info on base oop (`Object:NotNull+offset`). > > Testing: hs-tier1 - hs-tier6 Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: Stress instruction scheduling ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5259/files - new: https://git.openjdk.java.net/jdk/pull/5259/files/efb4e89b..45cd826d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5259&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5259&range=02-03 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5259.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5259/head:pull/5259 PR: https://git.openjdk.java.net/jdk/pull/5259 From kvn at openjdk.java.net Fri Sep 3 15:41:28 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 3 Sep 2021 15:41:28 GMT Subject: RFR: 8271341: Opcode() != Op_If && Opcode() != Op_RangeCheck) || outcnt() == 2 assert failure with Test7179138_1.java [v2] In-Reply-To: <9DZav8F6XqqyzpYMwZh7k7uWjaXvMRHvWkDgb1GDskM=.a8883ac9-4053-4fee-b3b8-1befbf09451f@github.com> References: <9DZav8F6XqqyzpYMwZh7k7uWjaXvMRHvWkDgb1GDskM=.a8883ac9-4053-4fee-b3b8-1befbf09451f@github.com> Message-ID: On Fri, 3 Sep 2021 07:41:15 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/stringopts.cpp line 282: >> >>> 280: } else if (n->is_Region()) { >>> 281: Node* iff = n->in(1)->in(0); >>> 282: assert(iff->is_If(), "no if for the diamond"); >> >> I assume that the only Region node listed in _control list is diamond Region. >> May be add assert to check that it is diamond region. Up to you. 
>> I would like to have separate RFE to have `is_diamond_region()` - we seems check it in few places. > > I pushed an update. Is this what you had in mind? yes ------------- PR: https://git.openjdk.java.net/jdk/pull/4944 From kvn at openjdk.java.net Fri Sep 3 15:41:28 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 3 Sep 2021 15:41:28 GMT Subject: RFR: 8271341: Opcode() != Op_If && Opcode() != Op_RangeCheck) || outcnt() == 2 assert failure with Test7179138_1.java [v2] In-Reply-To: References: Message-ID: On Fri, 3 Sep 2021 07:46:07 GMT, Roland Westrelin wrote: >> The root cause of this bug is in PhaseStringOpts. In the middle of the >> chain of calls that are optimized out, there's a diamond Region/If. On >> most executions this diamond is optimized out by IGVN because once >> PhaseStringOpts is over, all the Region's Phis are removed. But >> because one input of the If/Bool/Cmp is set to top by PhaseStringOpts >> when calls are removed, it sometimes happen that top propagates to the >> If before the Region is optimized out. That causes control flow below >> the If to become dead while it should still be reachable. >> >> The fix I propose is to have PhaseStringOpts removed the Region/If in >> that case. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge with master > - review > - fix Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/4944 From jiefu at openjdk.java.net Fri Sep 3 15:45:40 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Fri, 3 Sep 2021 15:45:40 GMT Subject: RFR: 8273332: [Vector API] C2 fails to check whether the rotate operation is directly supported by the target ISA after JDK-8271366 Message-ID: <_mqmvKLnxQ9pn7ZBjuVaSqsZ6mu4rHX799TInX--M9c=.192cfe91-bc94-491d-ae82-e44b2539fe07@github.com> Hi all, A lot of Vector API tests from Panama's vectorIntrinsics+mask branch crash [1]. This is because `arch_supports_vector()` fails to check whether the rotate operation is directly supported by the target ISA after JDK-8271366. It can be fixed by adding a `Matcher::match_rule_supported_vector` check in `inline_vector_broadcast_int()` and `inline_vector_nary_operation()` to prevent generation of unsupported rotate operation IR patterns. It may be hard to reproduce with the jdk repo, but the bug is still there in theory, so it would be better to fix it in the jdk repo. Testing: - jdk/incubator/vector/ with `-ea -esa -Xcomp -XX:CompileThreshold=100`, the crash reported in [1] gets fixed with Panama's vectorIntrinsics+mask branch Thanks. 
Best regards, Jie [1] https://bugs.openjdk.java.net/browse/JDK-8273205 ------------- Commit messages: - 8273332: [Vector API] C2 fails to check whether the rotate operation is directly supported by the target ISA after JDK-8271366 Changes: https://git.openjdk.java.net/jdk/pull/5364/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5364&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273332 Stats: 10 lines in 1 file changed: 10 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5364.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5364/head:pull/5364 PR: https://git.openjdk.java.net/jdk/pull/5364 From shade at openjdk.java.net Fri Sep 3 15:55:57 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 3 Sep 2021 15:55:57 GMT Subject: RFR: 8273335: compiler/blackhole tests should not run with interpreter-only VMs Message-ID: This affects Zero, as it runs these tests expecting `CompilerCommand`s to work. Which are obviously missing since there are no compilers configured. Since [JDK-8255718](https://bugs.openjdk.java.net/browse/JDK-8255718), Zero knows it runs in interpreter-only mode, so we can just leverage that. 
Additional testing: - [x] `compiler/blackhole` tests are skipped with Zero - [x] `compiler/blackhole` tests still run with Server ------------- Commit messages: - Implement Changes: https://git.openjdk.java.net/jdk/pull/5365/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5365&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273335 Stats: 6 lines in 6 files changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5365.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5365/head:pull/5365 PR: https://git.openjdk.java.net/jdk/pull/5365 From kvn at openjdk.java.net Fri Sep 3 16:45:24 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 3 Sep 2021 16:45:24 GMT Subject: RFR: 8273335: compiler/blackhole tests should not run with interpreter-only VMs In-Reply-To: References: Message-ID: On Fri, 3 Sep 2021 15:45:20 GMT, Aleksey Shipilev wrote: > This affects Zero, as it runs these tests expecting `CompilerCommand`s to work. Which are obviously missing since there are no compilers configured. Since [JDK-8255718](https://bugs.openjdk.java.net/browse/JDK-8255718), Zero knows it runs in interpreter-only mode, so we can just leverage that. > > Additional testing: > - [x] `compiler/blackhole` tests are skipped with Zero > - [x] `compiler/blackhole` tests still run with Server trivial. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5365 From dlong at openjdk.java.net Fri Sep 3 23:26:48 2021 From: dlong at openjdk.java.net (Dean Long) Date: Fri, 3 Sep 2021 23:26:48 GMT Subject: Integrated: 8271911: replay compilations of methods which use JSR292 (easy cases) In-Reply-To: References: Message-ID: On Thu, 26 Aug 2021 23:11:44 GMT, Dean Long wrote: > There is a subset of the general problem that we should be able to solve by looking at invokedynamic/invokehandle call sites and MethodHandle constant pool entries. 
If a replay references a hidden class that is discoverable in one of those locations, then we can use the location as a replacement for the transient VM name. > > Examples of references to hidden class locations: > > @bci compiler/ciReplay/CiReplayBase$TestMain test (I)V 1 argL0 ; > @bci compiler/ciReplay/CiReplayBase$TestMain main ([Ljava/lang/String;)V 0 form vmentry ; > @cpi compiler/ciReplay/CiReplayBase$TestMain 56 form vmentry ; This pull request has now been integrated. Changeset: 14a3ac09 Author: Dean Long URL: https://git.openjdk.java.net/jdk/commit/14a3ac09fe504ea97d269b78872bef6021c976fd Stats: 745 lines in 11 files changed: 713 ins; 9 del; 23 mod 8271911: replay compilations of methods which use JSR292 (easy cases) 8012267: ciReplay: fails to resolve @SignaturePolymorphic methods in replay data 8012268: ciReplay: process_ciInstanceKlass: JVM_CONSTANT_MethodHandle not supported Reviewed-by: kvn, vlivanov ------------- PR: https://git.openjdk.java.net/jdk/pull/5270 From dlong at openjdk.java.net Fri Sep 3 23:41:48 2021 From: dlong at openjdk.java.net (Dean Long) Date: Fri, 3 Sep 2021 23:41:48 GMT Subject: RFR: 8273332: [Vector API] C2 fails to check whether the rotate operation is directly supported by the target ISA after JDK-8271366 In-Reply-To: <_mqmvKLnxQ9pn7ZBjuVaSqsZ6mu4rHX799TInX--M9c=.192cfe91-bc94-491d-ae82-e44b2539fe07@github.com> References: <_mqmvKLnxQ9pn7ZBjuVaSqsZ6mu4rHX799TInX--M9c=.192cfe91-bc94-491d-ae82-e44b2539fe07@github.com> Message-ID: On Fri, 3 Sep 2021 15:36:18 GMT, Jie Fu wrote: > Hi all, > > A lot of Vector API tests from Panama's vectorIntrinsics+mask branch crash [1]. > > This is because `arch_supports_vector()` fails to check whether the rotate operation is directly supported by the target ISA after JDK-8271366. > It can be fixed by adding `Matcher::match_rule_supported_vector` check in `inline_vector_broadcast_int()` and `inline_vector_nary_operation()` to prevent generation of unsupported rotate operation IR patterns. 
> > Maybe it's hard to reproduce with the jdk-repo, but the bug still be there in theory. > So it would be better to fix it in the jdk repo. > > Testing: > - jdk/incubator/vector/ with `-ea -esa -Xcomp -XX:CompileThreshold=100`, the crash reported in [1] gets fixed with Panama's vectorIntrinsics+mask branch > > Thanks. > Best regards, > Jie > > [1] https://bugs.openjdk.java.net/browse/JDK-8273205 If -ea -esa -Xcomp -XX:CompileThreshold=100 is necessary to reproduce, then shouldn't the tests be updated to test with those flags? ------------- PR: https://git.openjdk.java.net/jdk/pull/5364 From jiefu at openjdk.java.net Fri Sep 3 23:53:45 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Fri, 3 Sep 2021 23:53:45 GMT Subject: RFR: 8273332: [Vector API] C2 fails to check whether the rotate operation is directly supported by the target ISA after JDK-8271366 In-Reply-To: References: <_mqmvKLnxQ9pn7ZBjuVaSqsZ6mu4rHX799TInX--M9c=.192cfe91-bc94-491d-ae82-e44b2539fe07@github.com> Message-ID: <36TRSH-JMWisrA8O01LasCr6Xj_XPSBMml2X096WQlo=.cd3cbcfa-2c24-4947-adf6-705ad5d80ec4@github.com> On Fri, 3 Sep 2021 23:38:29 GMT, Dean Long wrote: > If -ea -esa -Xcomp -XX:CompileThreshold=100 is necessary to reproduce, then shouldn't the tests be updated to test with those flags? It's extremely slow when testing with `-ea -esa -Xcomp -XX:CompileThreshold=100`. So it seems not a good idea to add these flags directly in the tests. Instead, I would suggest adding them in the CI/CD config. Thanks. 
-------------

PR: https://git.openjdk.java.net/jdk/pull/5364

From github.com+40024232+sunny868 at openjdk.java.net Sat Sep 4 03:10:04 2021
From: github.com+40024232+sunny868 at openjdk.java.net (SUN Guoyun)
Date: Sat, 4 Sep 2021 03:10:04 GMT
Subject: RFR: 8273317: crash in cmovP_cmpP_zero_zeroNode::bottom_type()
Message-ID:

Hi all,

When I implement a new instruct in the ad file to match CMoveP with a Cmp node, like this:

    match(Set dst (CMoveP (Binary cop (CmpP op1 zero)) (Binary dst zero)));

that is, the right child of CmpP is immediate zero and the right child of CMoveP is also immediate zero, the following crash occurs:

#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x000000fff410fcc4, pid=11130, tid=11146
#
# JRE version: OpenJDK Runtime Environment (17.0) (build 17-internal+0-jenkins-slave-20210821140615-jdk-ls-a526852e137)
# Java VM: OpenJDK 64-Bit Server VM (17-internal+0-jenkins-slave-20210821140615-jdk-ls-a526852e137, compiled mode, compressed oops, compressed class ptrs, g1 gc, linux-loongarch64)
# Problematic frame:
# V [libjvm.so+0x21fcc4] cmovP_cmpP_zero_zeroNode::bottom_type() const+0x44
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
# https://bugreport.java.com/bugreport/crash.jsp
#
In this case, cmovP_cmpP_zero_zeroNode has only three input nodes, so the crash is triggered. This patch fixes the problem. Please help review it.

Thanks,
Sun Guoyun

-------------

Commit messages:
 - 8273317: crash in cmovP_cmpP_zero_zeroNode::bottom_type()

Changes: https://git.openjdk.java.net/jdk/pull/5369/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5369&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8273317
  Stats: 16 lines in 1 file changed: 2 ins; 0 del; 14 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5369.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5369/head:pull/5369

PR: https://git.openjdk.java.net/jdk/pull/5369

From aph at openjdk.java.net Sun Sep 5 13:26:48 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Sun, 5 Sep 2021 13:26:48 GMT
Subject: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v6]
In-Reply-To:
References: <_XL6WybKwHeJ54kSQnN_q0_NgvR7ib9BFjNJ4HrkO_g=.f82e6cda-b31f-4eee-9185-3e52ebd6b54d@github.com>
Message-ID:

On Thu, 26 Aug 2021 09:26:24 GMT, Wu Yan wrote:

>> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4871:
>>
>>> 4869: // exit from large loop when less than 64 bytes left to read or we're about
>>> 4870: // to prefetch memory behind array border
>>> 4871: int largeLoopExitCondition = MAX(64, SoftwarePrefetchHintDistance)/(isLL ? 1 : 2);
>>
>> This breaks the Windows AArch64 build:
>>
>>
>> Creating support/modules_libs/java.base/server/jvm.dll from 1051 file(s)
>> d:\a\jdk\jdk\jdk\src\hotspot\cpu\aarch64\stubGenerator_aarch64.cpp(4871): error C3861: 'MAX': identifier not found
>> make[3]: *** [lib/CompileJvm.gmk:143: /cygdrive/d/a/jdk/jdk/jdk/build/windows-aarch64/hotspot/variant-server/libjvm
>>
>>
>> https://github.com/Wanghuang-Huawei/jdk/runs/3260986937
>>
>> Should probably be left as `MAX2`.
>
> Thanks, I'll fix it.

It's fine. I don't think it'll affect any real programs, so it's rather pointless.
I don't know if that's any reason not to approve it. ------------- PR: https://git.openjdk.java.net/jdk/pull/4722 From jiefu at openjdk.java.net Sun Sep 5 14:11:52 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Sun, 5 Sep 2021 14:11:52 GMT Subject: RFR: 8273332: [Vector API] C2 fails to check whether the rotate operation is directly supported by the target ISA after JDK-8271366 In-Reply-To: <_mqmvKLnxQ9pn7ZBjuVaSqsZ6mu4rHX799TInX--M9c=.192cfe91-bc94-491d-ae82-e44b2539fe07@github.com> References: <_mqmvKLnxQ9pn7ZBjuVaSqsZ6mu4rHX799TInX--M9c=.192cfe91-bc94-491d-ae82-e44b2539fe07@github.com> Message-ID: On Fri, 3 Sep 2021 15:36:18 GMT, Jie Fu wrote: > Hi all, > > A lot of Vector API tests from Panama's vectorIntrinsics+mask branch crash [1]. > > This is because `arch_supports_vector()` fails to check whether the rotate operation is directly supported by the target ISA after JDK-8271366. > It can be fixed by adding `Matcher::match_rule_supported_vector` check in `inline_vector_broadcast_int()` and `inline_vector_nary_operation()` to prevent generation of unsupported rotate operation IR patterns. > > Maybe it's hard to reproduce with the jdk-repo, but the bug still be there in theory. > So it would be better to fix it in the jdk repo. > > Testing: > - jdk/incubator/vector/ with `-ea -esa -Xcomp -XX:CompileThreshold=100`, the crash reported in [1] gets fixed with Panama's vectorIntrinsics+mask branch > > Thanks. > Best regards, > Jie > > [1] https://bugs.openjdk.java.net/browse/JDK-8273205 After more investigation, this bug won't reproduce in the jdk-repo and should be fixed in Panama's vectorIntrinsics+mask. Sorry for the noise. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5364 From jiefu at openjdk.java.net Sun Sep 5 14:11:52 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Sun, 5 Sep 2021 14:11:52 GMT Subject: Withdrawn: 8273332: [Vector API] C2 fails to check whether the rotate operation is directly supported by the target ISA after JDK-8271366 In-Reply-To: <_mqmvKLnxQ9pn7ZBjuVaSqsZ6mu4rHX799TInX--M9c=.192cfe91-bc94-491d-ae82-e44b2539fe07@github.com> References: <_mqmvKLnxQ9pn7ZBjuVaSqsZ6mu4rHX799TInX--M9c=.192cfe91-bc94-491d-ae82-e44b2539fe07@github.com> Message-ID: On Fri, 3 Sep 2021 15:36:18 GMT, Jie Fu wrote: > Hi all, > > A lot of Vector API tests from Panama's vectorIntrinsics+mask branch crash [1]. > > This is because `arch_supports_vector()` fails to check whether the rotate operation is directly supported by the target ISA after JDK-8271366. > It can be fixed by adding `Matcher::match_rule_supported_vector` check in `inline_vector_broadcast_int()` and `inline_vector_nary_operation()` to prevent generation of unsupported rotate operation IR patterns. > > Maybe it's hard to reproduce with the jdk-repo, but the bug still be there in theory. > So it would be better to fix it in the jdk repo. > > Testing: > - jdk/incubator/vector/ with `-ea -esa -Xcomp -XX:CompileThreshold=100`, the crash reported in [1] gets fixed with Panama's vectorIntrinsics+mask branch > > Thanks. > Best regards, > Jie > > [1] https://bugs.openjdk.java.net/browse/JDK-8273205 This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5364 From eliu at openjdk.java.net Mon Sep 6 04:27:46 2021 From: eliu at openjdk.java.net (Eric Liu) Date: Mon, 6 Sep 2021 04:27:46 GMT Subject: RFR: 8273021: C2: Improve Add and Xor ideal optimizations In-Reply-To: <6ETrh2hQ4xodDeDjGT3NCjENHbOj0fp2EJsfCDAVINE=.94f15263-2919-4762-a8a9-1d6339f7be96@github.com> References: <6ETrh2hQ4xodDeDjGT3NCjENHbOj0fp2EJsfCDAVINE=.94f15263-2919-4762-a8a9-1d6339f7be96@github.com> Message-ID: <9IxTNWnMkcusxvqUHgdP9_6PhXtDi7BPAOdca3wQoeE=.363e4477-1322-4f06-ac43-e0a712034fd7@github.com> On Thu, 26 Aug 2021 09:19:41 GMT, Yi Yang wrote: > Greetings. This patch adds the following identical equations for Add and Xor node, respectively, which probably drives further optimizations. > > > ~(x-1) => -x > ~x + 1 => -x > > > > Verified by generated opto assembly, maybe an IR verification test can be added later. > > ============================= C2-compiled nmethod ============================== > ----------------------- MetaData before Compile_id = 1 ------------------------ > {method} > - this oop: 0x00007fe29f003518 > - method holder: 'compiler/c2/TestAddXorIdeal' > - constants: 0x00007fe29f003248 constant pool [38] {0x00007fe29f003248} for 'compiler/c2/TestAddXorIdeal' cache=0x00007fe29f0038f0 > - access: 0x81000009 public static > - name: 'test1' > - signature: '(I)I' > - max stack: 3 > - max locals: 1 > - size of params: 1 > - method size: 13 > - vtable index: -2 > - i2i entry: 0x00007fe2fd23fc00 > - adapters: AHE at 0x00007fe3082160d0: 0xa i2c: 0x00007fe2fd34c660 c2i: 0x00007fe2fd34c719 c2iUV: 0x00007fe2fd34c6e3 c2iNCI: 0x00007fe2fd34c756 > - compiled entry 0x00007fe2fd34c719 > - code size: 6 > - code start: 0x00007fe29f003508 > - code end (excl): 0x00007fe29f00350e > - method data: 0x00007fe29f0070f8 > - checked ex length: 0 > - linenumber start: 0x00007fe29f00350e > - localvar length: 0 > > ------------------------ OptoAssembly for Compile_id = 1 ----------------------- > # > # 
int ( int ) > # > #r018 rsi : parm 0: int > # -- Old rsp -- Framesize: 32 -- > #r591 rsp+28: in_preserve > #r590 rsp+24: return address > #r589 rsp+20: in_preserve > #r588 rsp+16: saved fp register > #r587 rsp+12: pad2, stack alignment > #r586 rsp+ 8: pad2, stack alignment > #r585 rsp+ 4: Fixed slot 1 > #r584 rsp+ 0: Fixed slot 0 > # > 000 N1: # out( B1 ) <- in( B1 ) Freq: 1 > > 000 B1: # out( N1 ) <- BLOCK HEAD IS JUNK Freq: 1 > 000 # stack bang (96 bytes) > pushq rbp # Save rbp > subq rsp, #16 # Create frame > > 00c movl RAX, RSI # spill > 00e negl RAX # int > 010 addq rsp, 16 # Destroy frame > popq rbp > cmpq rsp, poll_offset[r15_thread] > ja #safepoint_stub # Safepoint: poll for GC > > 022 ret > > -------------------------------------------------------------------------------- > > ============================= C2-compiled nmethod ============================== > ----------------------- MetaData before Compile_id = 3 ------------------------ > {method} > - this oop: 0x00007fe29f003668 > - method holder: 'compiler/c2/TestAddXorIdeal' > - constants: 0x00007fe29f003248 constant pool [38] {0x00007fe29f003248} for 'compiler/c2/TestAddXorIdeal' cache=0x00007fe29f0038f0 > - access: 0x81000009 public static > - name: 'test3' > - signature: '(J)J' > - max stack: 5 > - max locals: 2 > - size of params: 2 > - method size: 13 > - vtable index: -2 > - i2i entry: 0x00007fe2fd23fc00 > - adapters: AHE at 0x00007fe30829c8f0: 0xbe i2c: 0x00007fe2fd2d88e0 c2i: 0x00007fe2fd2d89c0 c2iUV: 0x00007fe2fd2d898a c2iNCI: 0x00007fe2fd2d89fd > - compiled entry 0x00007fe2fd2d89c0 > - code size: 8 > - code start: 0x00007fe29f003658 > - code end (excl): 0x00007fe29f003660 > - method data: 0x00007fe29f007408 > - checked ex length: 0 > - linenumber start: 0x00007fe29f003660 > - localvar length: 0 > > ------------------------ OptoAssembly for Compile_id = 3 ----------------------- > # > # long/half ( long, half ) > # > #r018 rsi:rsi : parm 0: long > # -- Old rsp -- Framesize: 32 -- > #r591 rsp+28: 
in_preserve > #r590 rsp+24: return address > #r589 rsp+20: in_preserve > #r588 rsp+16: saved fp register > #r587 rsp+12: pad2, stack alignment > #r586 rsp+ 8: pad2, stack alignment > #r585 rsp+ 4: Fixed slot 1 > #r584 rsp+ 0: Fixed slot 0 > # > 000 N1: # out( B1 ) <- in( B1 ) Freq: 1 > > 000 B1: # out( N1 ) <- BLOCK HEAD IS JUNK Freq: 1 > 000 # stack bang (96 bytes) > pushq rbp # Save rbp > subq rsp, #16 # Create frame > > 00c movq RAX, RSI # spill > 00f negq RAX # long > 012 addq rsp, 16 # Destroy frame > popq rbp > cmpq rsp, poll_offset[r15_thread] > ja #safepoint_stub # Safepoint: poll for GC > > 024 ret > > -------------------------------------------------------------------------------- > > ============================= C2-compiled nmethod ============================== > ----------------------- MetaData before Compile_id = 2 ------------------------ > {method} > - this oop: 0x00007fe29f0035c0 > - method holder: 'compiler/c2/TestAddXorIdeal' > - constants: 0x00007fe29f003248 constant pool [38] {0x00007fe29f003248} for 'compiler/c2/TestAddXorIdeal' cache=0x00007fe29f0038f0 > - access: 0x81000009 public static > - name: 'test2' > - signature: '(I)I' > - max stack: 3 > - max locals: 1 > - size of params: 1 > - method size: 13 > - vtable index: -2 > - i2i entry: 0x00007fe2fd23fc00 > - adapters: AHE at 0x00007fe3082160d0: 0xa i2c: 0x00007fe2fd34c660 c2i: 0x00007fe2fd34c719 c2iUV: 0x00007fe2fd34c6e3 c2iNCI: 0x00007fe2fd34c756 > - compiled entry 0x00007fe2fd34c719 > - code size: 6 > - code start: 0x00007fe29f0035b0 > - code end (excl): 0x00007fe29f0035b6 > - method data: 0x00007fe29f007280 > - checked ex length: 0 > - linenumber start: 0x00007fe29f0035b6 > - localvar length: 0 > > ------------------------ OptoAssembly for Compile_id = 2 ----------------------- > # > # int ( int ) > # > #r018 rsi : parm 0: int > # -- Old rsp -- Framesize: 32 -- > #r591 rsp+28: in_preserve > #r590 rsp+24: return address > #r589 rsp+20: in_preserve > #r588 rsp+16: saved fp register > 
#r587 rsp+12: pad2, stack alignment > #r586 rsp+ 8: pad2, stack alignment > #r585 rsp+ 4: Fixed slot 1 > #r584 rsp+ 0: Fixed slot 0 > # > 000 N1: # out( B1 ) <- in( B1 ) Freq: 1 > > 000 B1: # out( N1 ) <- BLOCK HEAD IS JUNK Freq: 1 > 000 # stack bang (96 bytes) > pushq rbp # Save rbp > subq rsp, #16 # Create frame > > 00c movl RAX, RSI # spill > 00e negl RAX # int > 010 addq rsp, 16 # Destroy frame > popq rbp > cmpq rsp, poll_offset[r15_thread] > ja #safepoint_stub # Safepoint: poll for GC > > 022 ret > > -------------------------------------------------------------------------------- > > ============================= C2-compiled nmethod ============================== > ----------------------- MetaData before Compile_id = 4 ------------------------ > {method} > - this oop: 0x00007fe29f003710 > - method holder: 'compiler/c2/TestAddXorIdeal' > - constants: 0x00007fe29f003248 constant pool [38] {0x00007fe29f003248} for 'compiler/c2/TestAddXorIdeal' cache=0x00007fe29f0038f0 > - access: 0x81000009 public static > - name: 'test4' > - signature: '(J)J' > - max stack: 5 > - max locals: 2 > - size of params: 2 > - method size: 13 > - vtable index: -2 > - i2i entry: 0x00007fe2fd23fc00 > - adapters: AHE at 0x00007fe30829c8f0: 0xbe i2c: 0x00007fe2fd2d88e0 c2i: 0x00007fe2fd2d89c0 c2iUV: 0x00007fe2fd2d898a c2iNCI: 0x00007fe2fd2d89fd > - compiled entry 0x00007fe2fd2d89c0 > - code size: 8 > - code start: 0x00007fe29f003700 > - code end (excl): 0x00007fe29f003708 > - method data: 0x00007fe29f0075a0 > - checked ex length: 0 > - linenumber start: 0x00007fe29f003708 > - localvar length: 0 > > ------------------------ OptoAssembly for Compile_id = 4 ----------------------- > # > # long/half ( long, half ) > # > #r018 rsi:rsi : parm 0: long > # -- Old rsp -- Framesize: 32 -- > #r591 rsp+28: in_preserve > #r590 rsp+24: return address > #r589 rsp+20: in_preserve > #r588 rsp+16: saved fp register > #r587 rsp+12: pad2, stack alignment > #r586 rsp+ 8: pad2, stack alignment > #r585 rsp+ 4: 
Fixed slot 1 > #r584 rsp+ 0: Fixed slot 0 > # > 000 N1: # out( B1 ) <- in( B1 ) Freq: 1 > > 000 B1: # out( N1 ) <- BLOCK HEAD IS JUNK Freq: 1 > 000 # stack bang (96 bytes) > pushq rbp # Save rbp > subq rsp, #16 # Create frame > > 00c movq RAX, RSI # spill > 00f negq RAX # long > 012 addq rsp, 16 # Destroy frame > popq rbp > cmpq rsp, poll_offset[r15_thread] > ja #safepoint_stub # Safepoint: poll for GC > > 024 ret src/hotspot/share/opto/addnode.cpp line 1011: > 1009: // be -1^(x+(-1)). > 1010: if (op1 == Op_AddI && phase->type(in2) == TypeInt::MINUS_1) { > 1011: if (phase->type(in1->in(2)) == TypeInt::MINUS_1) { These two conditions could be combined. ------------- PR: https://git.openjdk.java.net/jdk/pull/5266 From wuyan at openjdk.java.net Mon Sep 6 06:46:48 2021 From: wuyan at openjdk.java.net (Wu Yan) Date: Mon, 6 Sep 2021 06:46:48 GMT Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v4] In-Reply-To: <4EAkLjaVxK9iObc6Y35Tfz4Fwu1aNqgAh1PhU3Y-dXU=.3f0158f0-07d0-4916-a151-364fa5e9fc97@github.com> References: <4EAkLjaVxK9iObc6Y35Tfz4Fwu1aNqgAh1PhU3Y-dXU=.3f0158f0-07d0-4916-a151-364fa5e9fc97@github.com> Message-ID: On Fri, 3 Sep 2021 03:21:10 GMT, Ningsheng Jian wrote: >> Wang Huang has updated the pull request incrementally with one additional commit since the last revision: >> >> fix codes > > src/hotspot/cpu/aarch64/aarch64_neon.ad line 187: > >> 185: format %{ " # reinterpret $dst,$src\t# S2X" %} >> 186: ins_encode %{ >> 187: // The upper bits of "src" are expected to have been initialized to zero. > > I think the comment should be: > > // The higher bits of the "dst" register must be cleared to zero. Thanks, I'll fix it. 
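
A side note on the 8273021 review quoted above: the two rewrites it proposes, `~(x-1) => -x` and `~x + 1 => -x`, follow from two's-complement arithmetic (`~y` equals `-y - 1`), so they should hold for every `int` and `long`, including the wrap-around cases. Below is a minimal sketch in plain Java that checks them on a few edge values. The class and method names are made up for illustration; they are not taken from the patch or from `compiler/c2/TestAddXorIdeal`, and this says nothing about what C2 actually emits.

```java
// Hypothetical helper, not part of the JDK patch: checks the two
// identities from the 8273021 description on int and long inputs.
final class AddXorIdentitySketch {
    // ~(x - 1): the Xor-side pattern (~y is the same as y ^ -1).
    static int notOfMinusOne(int x) { return ~(x - 1); }

    // ~x + 1: the Add-side pattern.
    static int notPlusOne(int x) { return ~x + 1; }

    // Same two patterns for long operands.
    static long notOfMinusOne(long x) { return ~(x - 1L); }
    static long notPlusOne(long x) { return ~x + 1L; }

    // Returns true if both patterns equal -x for every sample value,
    // including the values where x - 1 or ~x + 1 wraps around.
    static boolean holdsForSamples() {
        int[] ints = {0, 1, -1, 42, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : ints) {
            if (notOfMinusOne(x) != -x || notPlusOne(x) != -x) return false;
        }
        long[] longs = {0L, 1L, -1L, 42L, Long.MAX_VALUE, Long.MIN_VALUE};
        for (long x : longs) {
            if (notOfMinusOne(x) != -x || notPlusOne(x) != -x) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("identities hold: " + holdsForSamples());
    }
}
```

This only confirms the arithmetic. Whether the compiled code ends up as a single negate still has to be checked against the opto assembly or an IR test, as the original mail suggests.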
> src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2910: > >> 2908: INSN(frintm, 0, 0b00, 0b01, 0b11001); >> 2909: INSN(frintp, 0, 0b10, 0b01, 0b11000); >> 2910: INSN(fcvtzv, 0, 0b10, 0b01, 0b11011); // converts each element in a vector from a floating-point value to a signed integer value, and Arm's name is fcvtzs > > Would using "fcvtzs" name directly looks easier to understand (align with the manual)? "fcvtzs" has been used here: https://github.com/openjdk/jdk/blob/c640fe42c2b5e6668a2a875678be44443942c868/src/hotspot/cpu/aarch64/assembler_aarch64.hpp#L2030 ------------- PR: https://git.openjdk.java.net/jdk/pull/4839 From thartmann at openjdk.java.net Mon Sep 6 06:48:45 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 6 Sep 2021 06:48:45 GMT Subject: RFR: 8269119: C2: Avoid redundant memory barriers in Unsafe.copyMemory0 intrinsic [v4] In-Reply-To: References: Message-ID: <-CYL0Wy03zTprPxu9PRt5DV0-hQMbcUxWNSd8yUXESg=.f0de8223-fd34-4c4a-aa7c-81be8bbd9e35@github.com> On Fri, 3 Sep 2021 14:24:57 GMT, Vladimir Ivanov wrote: >> `Unsafe::copyMemory0` intrinsic unconditionally inserts barriers arounds the call to `unsafe_arraycopy` stub. >> It is a conservative approach and barriers can be avoided in most common cases (similar to what is done for scalar unsafe accesses). >> >> `Unsafe::copyMemory()` performs argument validation which limits inputs either >> to off-heap location (null + absolute address) or primitive on-heap array. >> >> The only cases when barriers are still needed are: >> * mixed accesses (`Object+offset`); >> * mismatched access due to lack of type info on base oop (`Object:NotNull+offset`). 
>> >> Testing: hs-tier1 - hs-tier6 > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > Stress instruction scheduling src/hotspot/share/opto/library_call.cpp line 4082: > 4080: bool is_mixed = !in_heap && !in_native; > 4081: > 4082: bool is_prim_array = (addr_t != NULL) && (addr_t->elem() != Type::BOTTOM); What does `is_prim_array` stand for? I read this as "is primitive array" which does not make sense :) test/hotspot/jtreg/compiler/unsafe/UnsafeCopyMemory.java line 31: > 29: * > 30: * @run main/othervm -Xbatch -XX:CompileCommand=dontinline,compiler.unsafe.UnsafeCopyMemory::test* > 31: * -XX:+UnlockDiagnosticVMOptions -XX:+StressGCM -XX:+StressLCM You should add `@key stress` to the test. ------------- PR: https://git.openjdk.java.net/jdk/pull/5259 From shade at openjdk.java.net Mon Sep 6 07:10:45 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 6 Sep 2021 07:10:45 GMT Subject: RFR: 8273335: compiler/blackhole tests should not run with interpreter-only VMs In-Reply-To: References: Message-ID: On Fri, 3 Sep 2021 15:45:20 GMT, Aleksey Shipilev wrote: > This affects Zero, as it runs these tests expecting `CompilerCommand`s to work. Which are obviously missing since there are no compilers configured. Since [JDK-8255718](https://bugs.openjdk.java.net/browse/JDK-8255718), Zero knows it runs in interpreter-only mode, so we can just leverage that. > > Additional testing: > - [x] `compiler/blackhole` tests are skipped with Zero > - [x] `compiler/blackhole` tests still run with Server Thank you! 
------------- PR: https://git.openjdk.java.net/jdk/pull/5365 From shade at openjdk.java.net Mon Sep 6 08:14:19 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 6 Sep 2021 08:14:19 GMT Subject: Integrated: 8273335: compiler/blackhole tests should not run with interpreter-only VMs In-Reply-To: References: Message-ID: On Fri, 3 Sep 2021 15:45:20 GMT, Aleksey Shipilev wrote: > This affects Zero, as it runs these tests expecting `CompilerCommand`s to work. Which are obviously missing since there are no compilers configured. Since [JDK-8255718](https://bugs.openjdk.java.net/browse/JDK-8255718), Zero knows it runs in interpreter-only mode, so we can just leverage that. > > Additional testing: > - [x] `compiler/blackhole` tests are skipped with Zero > - [x] `compiler/blackhole` tests still run with Server This pull request has now been integrated. Changeset: 4d25e6f6 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/4d25e6f6c7ee855771ab9c05ae85a9d92c866941 Stats: 6 lines in 6 files changed: 6 ins; 0 del; 0 mod 8273335: compiler/blackhole tests should not run with interpreter-only VMs Reviewed-by: kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/5365 From shade at openjdk.java.net Mon Sep 6 10:52:07 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 6 Sep 2021 10:52:07 GMT Subject: RFR: 8273376: Zero: Disable vtable/itableStub gtests Message-ID: Simple test bug fix. 
Additional testing: - [x] Linux x86_64 Server still runs the test - [x] Linux x86_64 Zero now skips the test ------------- Commit messages: - Fix Changes: https://git.openjdk.java.net/jdk/pull/5377/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5377&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273376 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5377.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5377/head:pull/5377 PR: https://git.openjdk.java.net/jdk/pull/5377 From jiefu at openjdk.java.net Mon Sep 6 11:02:38 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Mon, 6 Sep 2021 11:02:38 GMT Subject: RFR: 8273376: Zero: Disable vtable/itableStub gtests In-Reply-To: References: Message-ID: On Mon, 6 Sep 2021 10:44:15 GMT, Aleksey Shipilev wrote: > Simple test bug fix. > > Additional testing: > - [x] Linux x86_64 Server still runs the test > - [x] Linux x86_64 Zero now skips the test Looks fine to me. The copyright year also needs to be updated. Thanks. ------------- Marked as reviewed by jiefu (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5377 From shade at openjdk.java.net Mon Sep 6 11:17:14 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 6 Sep 2021 11:17:14 GMT Subject: RFR: 8273376: Zero: Disable vtable/itableStub gtests [v2] In-Reply-To: References: Message-ID: > Simple test bug fix. 
> > Additional testing: > - [x] Linux x86_64 Server still runs the test > - [x] Linux x86_64 Zero now skips the test Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Update copyright ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5377/files - new: https://git.openjdk.java.net/jdk/pull/5377/files/3e3ddf61..2357a3cf Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5377&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5377&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5377.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5377/head:pull/5377 PR: https://git.openjdk.java.net/jdk/pull/5377 From jiefu at openjdk.java.net Mon Sep 6 11:22:37 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Mon, 6 Sep 2021 11:22:37 GMT Subject: RFR: 8273376: Zero: Disable vtable/itableStub gtests [v2] In-Reply-To: References: Message-ID: On Mon, 6 Sep 2021 11:17:14 GMT, Aleksey Shipilev wrote: >> Simple test bug fix. >> >> Additional testing: >> - [x] Linux x86_64 Server still runs the test >> - [x] Linux x86_64 Zero now skips the test > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Update copyright Marked as reviewed by jiefu (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5377 From shade at openjdk.java.net Mon Sep 6 11:35:59 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 6 Sep 2021 11:35:59 GMT Subject: RFR: 8273380: ARM32: Default to {ldrexd, strexd} in StubRoutines::atomic_{load|store}_long Message-ID: Current ARM32 is one of few remaining uses of `os::is_MP`, the rest is removed by JDK-8188764. There are some interesting bugs in OS/libc that might give incorrect `os::is_MP` sometimes, e.g. in containers. 
Instead of risking it, we can default to {ldrexd,strexd} on ARMv7 (which always has them), and leave the `os::is_MP` path for ARMv6 (for which this is the only remaining way to load a 64-bit long).

@mychris, you might want to take a look and do light performance testing for it?

Additional testing:
 - [x] Linux ARM32 cross-compiled build completes

-------------

Commit messages:
 - First attempt

Changes: https://git.openjdk.java.net/jdk/pull/5379/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5379&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8273380
  Stats: 18 lines in 1 file changed: 10 ins; 6 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5379.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5379/head:pull/5379

PR: https://git.openjdk.java.net/jdk/pull/5379

From stuefe at openjdk.java.net Mon Sep 6 11:38:40 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Mon, 6 Sep 2021 11:38:40 GMT
Subject: RFR: 8273376: Zero: Disable vtable/itableStub gtests [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 6 Sep 2021 11:17:14 GMT, Aleksey Shipilev wrote:

>> Simple test bug fix.
>>
>> Additional testing:
>> - [x] Linux x86_64 Server still runs the test
>> - [x] Linux x86_64 Zero now skips the test
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>
> Update copyright

Good and trivial

-------------

Marked as reviewed by stuefe (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/5377

From vlivanov at openjdk.java.net Mon Sep 6 14:20:22 2021
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Mon, 6 Sep 2021 14:20:22 GMT
Subject: RFR: 8269119: C2: Avoid redundant memory barriers in Unsafe.copyMemory0 intrinsic [v5]
In-Reply-To:
References:
Message-ID:

> `Unsafe::copyMemory0` intrinsic unconditionally inserts barriers around the call to `unsafe_arraycopy` stub.
> It is a conservative approach and barriers can be avoided in most common cases (similar to what is done for scalar unsafe accesses). > > `Unsafe::copyMemory()` performs argument validation which limits inputs either > to off-heap location (null + absolute address) or primitive on-heap array. > > The only cases when barriers are still needed are: > * mixed accesses (`Object+offset`); > * mismatched access due to lack of type info on base oop (`Object:NotNull+offset`). > > Testing: hs-tier1 - hs-tier6 Vladimir Ivanov has updated the pull request incrementally with two additional commits since the last revision: - Refactor has_wide_mem - Update the test ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5259/files - new: https://git.openjdk.java.net/jdk/pull/5259/files/45cd826d..6b93db70 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5259&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5259&range=03-04 Stats: 27 lines in 3 files changed: 13 ins; 4 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/5259.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5259/head:pull/5259 PR: https://git.openjdk.java.net/jdk/pull/5259 From vlivanov at openjdk.java.net Mon Sep 6 14:20:27 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 6 Sep 2021 14:20:27 GMT Subject: RFR: 8269119: C2: Avoid redundant memory barriers in Unsafe.copyMemory0 intrinsic [v4] In-Reply-To: <-CYL0Wy03zTprPxu9PRt5DV0-hQMbcUxWNSd8yUXESg=.f0de8223-fd34-4c4a-aa7c-81be8bbd9e35@github.com> References: <-CYL0Wy03zTprPxu9PRt5DV0-hQMbcUxWNSd8yUXESg=.f0de8223-fd34-4c4a-aa7c-81be8bbd9e35@github.com> Message-ID: On Mon, 6 Sep 2021 06:45:15 GMT, Tobias Hartmann wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> Stress instruction scheduling > > src/hotspot/share/opto/library_call.cpp line 4082: > >> 4080: bool is_mixed = !in_heap && !in_native; >> 4081: >> 4082: 
bool is_prim_array = (addr_t != NULL) && (addr_t->elem() != Type::BOTTOM);
>
> What does `is_prim_array` stand for? I read this as "is primitive array" which does not make sense :)

Yes, it stands for "primitive array". Unsafe.copyMemory() validates inputs using Unsafe.checkPrimitiveArray() to ensure they are primitive arrays for on-heap accesses.

I refactored `has_wide_mem` to make it a bit clearer what's going on.

> test/hotspot/jtreg/compiler/unsafe/UnsafeCopyMemory.java line 31:
>
>> 29: *
>> 30: * @run main/othervm -Xbatch -XX:CompileCommand=dontinline,compiler.unsafe.UnsafeCopyMemory::test*
>> 31: * -XX:+UnlockDiagnosticVMOptions -XX:+StressGCM -XX:+StressLCM
>
> You should add `@key stress` to the test.

Fixed.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5259

From ngasson at openjdk.java.net Tue Sep 7 01:40:41 2021
From: ngasson at openjdk.java.net (Nick Gasson)
Date: Tue, 7 Sep 2021 01:40:41 GMT
Subject: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v7]
In-Reply-To:
References:
Message-ID:

On Mon, 30 Aug 2021 06:26:01 GMT, Wang Huang wrote:

>> Dear all,
>> Could you please review this patch? This patch uses `ldp` to implement String.compareTo.
>>
>> * We add a JMH test case
>> * Here is the result of this test case
>>
>> Benchmark                      | (size) | Mode | Cnt | Score  | Error   | Units
>> -------------------------------|--------|------|-----|--------|---------|------
>> StringCompare.compareLL        | 64     | avgt | 5   | 7.992  | ± 0.005 | us/op
>> StringCompare.compareLL        | 72     | avgt | 5   | 15.029 | ± 0.006 | us/op
>> StringCompare.compareLL        | 80     | avgt | 5   | 14.655 | ± 0.011 | us/op
>> StringCompare.compareLL        | 91     | avgt | 5   | 16.363 | ± 0.12  | us/op
>> StringCompare.compareLL        | 101    | avgt | 5   | 16.966 | ± 0.007 | us/op
>> StringCompare.compareLL        | 121    | avgt | 5   | 19.276 | ± 0.006 | us/op
>> StringCompare.compareLL        | 181    | avgt | 5   | 19.002 | ± 0.417 | us/op
>> StringCompare.compareLL        | 256    | avgt | 5   | 24.707 | ± 0.041 | us/op
>> StringCompare.compareLLWithLdp | 64     | avgt | 5   | 8.001  | ± 0.121 | us/op
>> StringCompare.compareLLWithLdp | 72     | avgt | 5   | 11.573 | ± 0.003 | us/op
>> StringCompare.compareLLWithLdp | 80     | avgt | 5   | 6.861  | ± 0.004 | us/op
>> StringCompare.compareLLWithLdp | 91     | avgt | 5   | 12.774 | ± 0.201 | us/op
>> StringCompare.compareLLWithLdp | 101    | avgt | 5   | 8.691  | ± 0.004 | us/op
>> StringCompare.compareLLWithLdp | 121    | avgt | 5   | 11.091 | ± 1.342 | us/op
>> StringCompare.compareLLWithLdp | 181    | avgt | 5   | 14.64  | ± 0.581 | us/op
>> StringCompare.compareLLWithLdp | 256    | avgt | 5   | 25.879 | ± 1.775 | us/op
>> StringCompare.compareUU        | 64     | avgt | 5   | 13.476 | ± 0.01  | us/op
>> StringCompare.compareUU        | 72     | avgt | 5   | 15.078 | ± 0.006 | us/op
>> StringCompare.compareUU        | 80     | avgt | 5   | 23.512 | ± 0.011 | us/op
>> StringCompare.compareUU        | 91     | avgt | 5   | 24.284 | ± 0.008 | us/op
>> StringCompare.compareUU        | 101    | avgt | 5   | 20.707 | ± 0.017 | us/op
>> StringCompare.compareUU        | 121    | avgt | 5   | 29.302 | ± 0.011 | us/op
>> StringCompare.compareUU        | 181    | avgt | 5   | 39.31  | ± 0.016 | us/op
>> StringCompare.compareUU        | 256    | avgt | 5   | 54.592 | ± 0.392 | us/op
>> StringCompare.compareUUWithLdp | 64     | avgt | 5   | 16.389 | ± 0.008 | us/op
>> StringCompare.compareUUWithLdp | 72     | avgt | 5   | 10.71  | ± 0.158 | us/op
>> StringCompare.compareUUWithLdp | 80     | avgt | 5   | 11.488 | ± 0.024 | us/op
>> StringCompare.compareUUWithLdp | 91     | avgt | 5   | 13.412 | ± 0.006 | us/op
>> StringCompare.compareUUWithLdp | 101    | avgt | 5   | 16.245 | ± 0.434 | us/op
>> StringCompare.compareUUWithLdp | 121    | avgt | 5   | 16.597 | ± 0.016 | us/op
>> StringCompare.compareUUWithLdp | 181    | avgt | 5   | 27.373 | ± 0.017 | us/op
>> StringCompare.compareUUWithLdp | 256    | avgt | 5   | 41.74  | ± 3.5   | us/op
>>
>> From this table, we can see that in most cases, our patch is better than the old one.
>>
>> Thank you for your review. Any suggestions are welcome.
>
> Wang Huang has updated the pull request incrementally with one additional commit since the last revision:
>
>   fix windows build failed

Please check the Windows tier1 failure:

https://github.com/Wanghuang-Huawei/jdk/runs/3459332995

It seems unlikely that it has anything to do with this patch, so you may just want to re-run it or merge from master.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4722

From github.com+89823579+faye-arm at openjdk.java.net Tue Sep 7 04:19:49 2021
From: github.com+89823579+faye-arm at openjdk.java.net (Faye Gao)
Date: Tue, 7 Sep 2021 04:19:49 GMT
Subject: RFR: 8272968: AArch64: Remove redundant matching rules for commutative ops
Message-ID:

Match rules for the commutative operations mnegI/mnegL/smnegL can become redundant once function matchrule_clone_and_swap() has run, and hence can be reduced.

In the adlc part, while parsing the contents of an instruction definition, function instr_parse always checks for commutative operations with subtree operands, then creates clones and swaps the operands via function matchrule_clone_and_swap. This means that another operand-swapped, partially symmetrical match rule is generated automatically for these commutative operations.

The pattern to construct mnegI, mnegL or smnegL consists of a subtraction from zero and then a multiplication. In function count_commutative_op, both mulI and mulL are recognized as commutative opcodes. Therefore, we need only one match rule to specify that a multiplication consists of a number and a subtraction from zero for these three instructions, and the extra one can be deleted.

Take mnegL as an example. Without my patch, four match rules are eventually created for instruction selection.
Two of them are created by the ad files:

Match Rule 1:
  dst = MulL (SubL zero src1) src2 ===> dst = mnegl src1 src2
Match Rule 2:
  dst = MulL src1 (SubL zero src2) ===> dst = mnegl src1 src2

The other two are generated automatically by function matchrule_clone_and_swap from the two rules above:

Match Rule 3 (generated from Match Rule 1):
  dst = MulL src2 (SubL zero src1) ===> dst = mnegl src1 src2
Match Rule 4 (generated from Match Rule 2):
  dst = MulL (SubL zero src2) src1 ===> dst = mnegl src1 src2

As mnegl is commutative, Rule 3 is equivalent to Rule 2, and Rule 1 is equivalent to Rule 4. Also, if we keep only the original Match Rule 1, as shown above, Rule 3 will be generated automatically later. In this way, Rule 2 and Rule 4 are redundant, and hence Rule 2 can be eliminated.

With my patch, Rule 2 is removed and Rule 4 is no longer generated either. Only Rules 1 and 3 are kept in the final rule chain.

In my local release build, as the redundant code was removed, the size of libjvm.so decreased from 23.30 MB to 23.27 MB, a reduction of 33.11 KB (around 0.14%).
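
To make the equivalence of the four rule shapes concrete, here is a minimal Java sketch. It is not adlc or HotSpot code: the class `MnegRuleSketch` and its methods are hypothetical names, and each method simply evaluates one rule's left-hand side with ordinary wrapping `long` arithmetic, assuming `mnegl src1 src2` yields the negated product.

```java
final class MnegRuleSketch {
    // Rule 1: MulL (SubL zero src1) src2
    static long rule1(long src1, long src2) { return (0L - src1) * src2; }

    // Rule 2: MulL src1 (SubL zero src2)
    static long rule2(long src1, long src2) { return src1 * (0L - src2); }

    // Rule 3 (operand swap of Rule 1): MulL src2 (SubL zero src1)
    static long rule3(long src1, long src2) { return src2 * (0L - src1); }

    // Rule 4 (operand swap of Rule 2): MulL (SubL zero src2) src1
    static long rule4(long src1, long src2) { return (0L - src2) * src1; }

    // Reference value: the negated product (assumed meaning of mnegl)
    static long mneg(long src1, long src2) { return -(src1 * src2); }

    // True when all four rule shapes produce the reference value.
    static boolean allRulesAgree(long src1, long src2) {
        long expected = mneg(src1, src2);
        return rule1(src1, src2) == expected
            && rule2(src1, src2) == expected
            && rule3(src1, src2) == expected
            && rule4(src1, src2) == expected;
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1L, -1L, 7L, -42L, Long.MAX_VALUE, Long.MIN_VALUE};
        for (long a : samples) {
            for (long b : samples) {
                if (!allRulesAgree(a, b)) {
                    throw new AssertionError("mismatch for " + a + ", " + b);
                }
            }
        }
        System.out.println("all four rule shapes agree");
    }
}
```

Because multiplication and negation wrap consistently in two's complement, the agreement also holds for overflowing inputs such as `Long.MIN_VALUE`. This is why a single written rule, plus its automatically generated swap, should be enough to cover every shape.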
------------- Commit messages: - 8272968: AArch64: Remove redundant matching rules for commutative ops Changes: https://git.openjdk.java.net/jdk/pull/5311/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5311&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272968 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5311.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5311/head:pull/5311 PR: https://git.openjdk.java.net/jdk/pull/5311 From thartmann at openjdk.java.net Tue Sep 7 05:57:40 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 7 Sep 2021 05:57:40 GMT Subject: RFR: 8269119: C2: Avoid redundant memory barriers in Unsafe.copyMemory0 intrinsic [v5] In-Reply-To: References: Message-ID: On Mon, 6 Sep 2021 14:20:22 GMT, Vladimir Ivanov wrote: >> `Unsafe::copyMemory0` intrinsic unconditionally inserts barriers arounds the call to `unsafe_arraycopy` stub. >> It is a conservative approach and barriers can be avoided in most common cases (similar to what is done for scalar unsafe accesses). >> >> `Unsafe::copyMemory()` performs argument validation which limits inputs either >> to off-heap location (null + absolute address) or primitive on-heap array. >> >> The only cases when barriers are still needed are: >> * mixed accesses (`Object+offset`); >> * mismatched access due to lack of type info on base oop (`Object:NotNull+offset`). >> >> Testing: hs-tier1 - hs-tier6 > > Vladimir Ivanov has updated the pull request incrementally with two additional commits since the last revision: > > - Refactor has_wide_mem > - Update the test Thanks for the details, looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5259 From roland at openjdk.java.net Tue Sep 7 08:11:37 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 7 Sep 2021 08:11:37 GMT Subject: RFR: 8269119: C2: Avoid redundant memory barriers in Unsafe.copyMemory0 intrinsic [v5] In-Reply-To: References: Message-ID: On Mon, 6 Sep 2021 14:20:22 GMT, Vladimir Ivanov wrote: >> `Unsafe::copyMemory0` intrinsic unconditionally inserts barriers arounds the call to `unsafe_arraycopy` stub. >> It is a conservative approach and barriers can be avoided in most common cases (similar to what is done for scalar unsafe accesses). >> >> `Unsafe::copyMemory()` performs argument validation which limits inputs either >> to off-heap location (null + absolute address) or primitive on-heap array. >> >> The only cases when barriers are still needed are: >> * mixed accesses (`Object+offset`); >> * mismatched access due to lack of type info on base oop (`Object:NotNull+offset`). >> >> Testing: hs-tier1 - hs-tier6 > > Vladimir Ivanov has updated the pull request incrementally with two additional commits since the last revision: > > - Refactor has_wide_mem > - Update the test Marked as reviewed by roland (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5259 From shade at openjdk.java.net Tue Sep 7 08:56:39 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 7 Sep 2021 08:56:39 GMT Subject: RFR: 8273376: Zero: Disable vtable/itableStub gtests [v2] In-Reply-To: References: Message-ID: On Mon, 6 Sep 2021 11:17:14 GMT, Aleksey Shipilev wrote: >> Simple test bug fix. >> >> Additional testing: >> - [x] Linux x86_64 Server still runs the test >> - [x] Linux x86_64 Zero now skips the test > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Update copyright Thanks for reviews! 
------------- PR: https://git.openjdk.java.net/jdk/pull/5377 From shade at openjdk.java.net Tue Sep 7 08:56:40 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 7 Sep 2021 08:56:40 GMT Subject: Integrated: 8273376: Zero: Disable vtable/itableStub gtests In-Reply-To: References: Message-ID: <09dqgx603q1uyhgdbWS_e9Ccb4lbbqvGvQCqQfjh9ok=.2e39b6d8-9361-452e-b4a1-1b14e2995283@github.com> On Mon, 6 Sep 2021 10:44:15 GMT, Aleksey Shipilev wrote: > Simple test bug fix. > > Additional testing: > - [x] Linux x86_64 Server still runs the test > - [x] Linux x86_64 Zero now skips the test This pull request has now been integrated. Changeset: a522d6b5 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/a522d6b53cd841b4bfe87eac5778c9e5cdf5e90f Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod 8273376: Zero: Disable vtable/itableStub gtests Reviewed-by: jiefu, stuefe ------------- PR: https://git.openjdk.java.net/jdk/pull/5377 From wuyan at openjdk.java.net Tue Sep 7 09:42:38 2021 From: wuyan at openjdk.java.net (Wu Yan) Date: Tue, 7 Sep 2021 09:42:38 GMT Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v4] In-Reply-To: References: Message-ID: On Fri, 3 Sep 2021 04:30:43 GMT, Eric Liu wrote: > I tested in my local, a `bad AD file` due to the missing rule of `VectorReinterpret 4B` Thanks, I'll add the rule of `VectorReinterpret 4B` in next commit. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4839 From wuyan at openjdk.java.net Tue Sep 7 09:42:39 2021 From: wuyan at openjdk.java.net (Wu Yan) Date: Tue, 7 Sep 2021 09:42:39 GMT Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v4] In-Reply-To: <4EAkLjaVxK9iObc6Y35Tfz4Fwu1aNqgAh1PhU3Y-dXU=.3f0158f0-07d0-4916-a151-364fa5e9fc97@github.com> References: <4EAkLjaVxK9iObc6Y35Tfz4Fwu1aNqgAh1PhU3Y-dXU=.3f0158f0-07d0-4916-a151-364fa5e9fc97@github.com> Message-ID: On Fri, 3 Sep 2021 03:16:23 GMT, Ningsheng Jian wrote: >> Wang Huang has updated the pull request incrementally with one additional commit since the last revision: >> >> fix codes > > src/hotspot/cpu/aarch64/aarch64.ad line 2452: > >> 2450: return false; >> 2451: } >> 2452: break; > > Why do you remove others but keep this (4Sto4B not implemented)? Because @theRealELiu suggest us not to have VectorCast with bit_size less than 64 bits support. In addition, we will add rules for the 4B vector, which will appear in actual situations. I'll refine the code in next commit. > src/hotspot/cpu/aarch64/aarch64_neon.ad line 274: > >> 272: %} >> 273: >> 274: instruct vcvt4Bto4I(vecX dst, vecD src) > > 4BtoX should be supported as the min_vector_size() for byte type is 4. OK, I'll add rules for 4B. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4839 From wuyan at openjdk.java.net Tue Sep 7 09:42:40 2021 From: wuyan at openjdk.java.net (Wu Yan) Date: Tue, 7 Sep 2021 09:42:40 GMT Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v4] In-Reply-To: References: <4EAkLjaVxK9iObc6Y35Tfz4Fwu1aNqgAh1PhU3Y-dXU=.3f0158f0-07d0-4916-a151-364fa5e9fc97@github.com> Message-ID: On Mon, 6 Sep 2021 07:29:41 GMT, Ningsheng Jian wrote: >> "fcvtzs" has been used here: >> https://github.com/openjdk/jdk/blob/c640fe42c2b5e6668a2a875678be44443942c868/src/hotspot/cpu/aarch64/assembler_aarch64.hpp#L2030 > >> "fcvtzs" has been used here: >> https://github.com/openjdk/jdk/blob/c640fe42c2b5e6668a2a875678be44443942c868/src/hotspot/cpu/aarch64/assembler_aarch64.hpp#L2030 > > But that's the scalar version, and have different input arguments? OK, I'll fix it. ------------- PR: https://git.openjdk.java.net/jdk/pull/4839 From eliu at openjdk.java.net Tue Sep 7 09:55:40 2021 From: eliu at openjdk.java.net (Eric Liu) Date: Tue, 7 Sep 2021 09:55:40 GMT Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v4] In-Reply-To: References: <4EAkLjaVxK9iObc6Y35Tfz4Fwu1aNqgAh1PhU3Y-dXU=.3f0158f0-07d0-4916-a151-364fa5e9fc97@github.com> Message-ID: On Tue, 7 Sep 2021 09:39:02 GMT, Wu Yan wrote: >> src/hotspot/cpu/aarch64/aarch64.ad line 2452: >> >>> 2450: return false; >>> 2451: } >>> 2452: break; >> >> Why do you remove others but keep this (4Sto4B not implemented)? > > Because @theRealELiu suggest us not to have VectorCast with bit_size less than 64 bits support. > In addition, we will add rules for the 4B vector, which will appear in actual situations. I'll refine the code in next commit. Byte is special here. 
Refs:
[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/matcher.hpp#L341
[2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64.ad#L2491

-------------

PR: https://git.openjdk.java.net/jdk/pull/4839

From whuang at openjdk.java.net Tue Sep 7 10:11:08 2021
From: whuang at openjdk.java.net (Wang Huang)
Date: Tue, 7 Sep 2021 10:11:08 GMT
Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v5]
In-Reply-To:
References:
Message-ID: <6SkOgskSfXuMp1XarC2BO9zBUw_Zj1pcUMKNHffiCQs=.66c156b1-ef01-48a4-8f28-8351089a5646@github.com>

> * In this issue, we plan to complete all missing implementation for aarch64 neon backend. For example, cast from Byte to Long, cast from Long to Byte, and so on.
> * It may be a solver of JDK-8269866, or part of it.

Wang Huang has updated the pull request incrementally with one additional commit since the last revision:

  fix bugs

-------------

Changes:
 - all: https://git.openjdk.java.net/jdk/pull/4839/files
 - new: https://git.openjdk.java.net/jdk/pull/4839/files/13ba3995..4f0613cb

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4839&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4839&range=03-04

Stats: 368 lines in 6 files changed: 173 ins; 81 del; 114 mod
Patch: https://git.openjdk.java.net/jdk/pull/4839.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/4839/head:pull/4839

PR: https://git.openjdk.java.net/jdk/pull/4839

From github.com+25214855+casparcwang at openjdk.java.net Tue Sep 7 11:09:40 2021
From: github.com+25214855+casparcwang at openjdk.java.net (王超)
Date: Tue, 7 Sep 2021 11:09:40 GMT
Subject: RFR: JDK-8272574: C2: assert(false) failed: Bad graph detected in build_loop_late [v11]
In-Reply-To:
References:
Message-ID:

On Thu, 2 Sep 2021 04:16:03 GMT, 王超 wrote:

>> Current loop predication will promote nodes(with a dependency on a later control node) to the insertion point which dominates the later control node.
>>
>> In the following example, loopPrediction will promote node 434 to the outer loop (the predicted insert point is right after node 424), and it depends on control node 207. But node 424 dominates node 207, which means that after the promotion, the cloned nodes have a control dependency on a later control node, which leads to a bad graph.
>>
>> ![image](https://user-images.githubusercontent.com/25214855/129720970-ff65b8f4-8bef-401d-8590-54aca6de470e.png)
>>
>> ![image](https://user-images.githubusercontent.com/25214855/129721369-4c61222b-7305-4522-9a37-e3e6e2138aa9.png)
>
> 王超 has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add more debug infomation

> The number of required reviews for this PR is now set to 3 (with at least 1 of role reviewers).

ping, this PR needs 3 reviewers. May I have more reviews of this PR?

-------------

PR: https://git.openjdk.java.net/jdk/pull/5142

From vlivanov at openjdk.java.net Tue Sep 7 11:38:42 2021
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Tue, 7 Sep 2021 11:38:42 GMT
Subject: RFR: 8269119: C2: Avoid redundant memory barriers in Unsafe.copyMemory0 intrinsic [v5]
In-Reply-To:
References:
Message-ID: <5YUwTCGq_o1y5ehJ2pTQT8kTmpMv972RJBpTVeFG-2w=.7f30abee-f661-4884-ae8d-e2971427c7c8@github.com>

On Mon, 6 Sep 2021 14:20:22 GMT, Vladimir Ivanov wrote:

>> `Unsafe::copyMemory0` intrinsic unconditionally inserts barriers around the call to `unsafe_arraycopy` stub.
>> It is a conservative approach and barriers can be avoided in most common cases (similar to what is done for scalar unsafe accesses).
>>
>> `Unsafe::copyMemory()` performs argument validation which limits inputs either
>> to off-heap location (null + absolute address) or primitive on-heap array.
>>
>> The only cases when barriers are still needed are:
>> * mixed accesses (`Object+offset`);
>> * mismatched access due to lack of type info on base oop (`Object:NotNull+offset`).
>> >> Testing: hs-tier1 - hs-tier6 > > Vladimir Ivanov has updated the pull request incrementally with two additional commits since the last revision: > > - Refactor has_wide_mem > - Update the test Thanks for the reviews, Tobias and Roland. ------------- PR: https://git.openjdk.java.net/jdk/pull/5259 From vlivanov at openjdk.java.net Tue Sep 7 11:38:44 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Tue, 7 Sep 2021 11:38:44 GMT Subject: Integrated: 8269119: C2: Avoid redundant memory barriers in Unsafe.copyMemory0 intrinsic In-Reply-To: References: Message-ID: On Wed, 25 Aug 2021 22:17:05 GMT, Vladimir Ivanov wrote: > `Unsafe::copyMemory0` intrinsic unconditionally inserts barriers arounds the call to `unsafe_arraycopy` stub. > It is a conservative approach and barriers can be avoided in most common cases (similar to what is done for scalar unsafe accesses). > > `Unsafe::copyMemory()` performs argument validation which limits inputs either > to off-heap location (null + absolute address) or primitive on-heap array. > > The only cases when barriers are still needed are: > * mixed accesses (`Object+offset`); > * mismatched access due to lack of type info on base oop (`Object:NotNull+offset`). > > Testing: hs-tier1 - hs-tier6 This pull request has now been integrated. 
Changeset: 377b1867
Author: Vladimir Ivanov
URL: https://git.openjdk.java.net/jdk/commit/377b186724473475480b834d99c38b8161bf6917
Stats: 499 lines in 2 files changed: 482 ins; 7 del; 10 mod

8269119: C2: Avoid redundant memory barriers in Unsafe.copyMemory0 intrinsic

Reviewed-by: thartmann, roland

-------------

PR: https://git.openjdk.java.net/jdk/pull/5259

From roland at openjdk.java.net Tue Sep 7 15:16:42 2021
From: roland at openjdk.java.net (Roland Westrelin)
Date: Tue, 7 Sep 2021 15:16:42 GMT
Subject: RFR: 8271341: Opcode() != Op_If && Opcode() != Op_RangeCheck) || outcnt() == 2 assert failure with Test7179138_1.java [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 3 Sep 2021 15:38:12 GMT, Vladimir Kozlov wrote:

>> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>>
>>  - Merge with master
>>  - review
>>  - fix
>
> Good.

Thanks for the reviews @vnkozlov @TobiHartmann

-------------

PR: https://git.openjdk.java.net/jdk/pull/4944

From roland at openjdk.java.net Tue Sep 7 15:16:43 2021
From: roland at openjdk.java.net (Roland Westrelin)
Date: Tue, 7 Sep 2021 15:16:43 GMT
Subject: Integrated: 8271341: Opcode() != Op_If && Opcode() != Op_RangeCheck) || outcnt() == 2 assert failure with Test7179138_1.java
In-Reply-To:
References:
Message-ID:

On Fri, 30 Jul 2021 08:27:47 GMT, Roland Westrelin wrote:

> The root cause of this bug is in PhaseStringOpts. In the middle of the
> chain of calls that are optimized out, there's a diamond Region/If. On
> most executions this diamond is optimized out by IGVN because once
> PhaseStringOpts is over, all the Region's Phis are removed. But
> because one input of the If/Bool/Cmp is set to top by PhaseStringOpts
> when calls are removed, it sometimes happens that top propagates to the
> If before the Region is optimized out.
That causes control flow below
> the If to become dead while it should still be reachable.
>
> The fix I propose is to have PhaseStringOpts remove the Region/If in
> that case.

This pull request has now been integrated.

Changeset: 99fb12c7
Author: Roland Westrelin
URL: https://git.openjdk.java.net/jdk/commit/99fb12c798ad24cc4a671a666930ba42c3cd10c9
Stats: 19 lines in 2 files changed: 17 ins; 1 del; 1 mod

8271341: Opcode() != Op_If && Opcode() != Op_RangeCheck) || outcnt() == 2 assert failure with Test7179138_1.java

Reviewed-by: kvn, thartmann

-------------

PR: https://git.openjdk.java.net/jdk/pull/4944

From roland at openjdk.java.net Tue Sep 7 15:17:44 2021
From: roland at openjdk.java.net (Roland Westrelin)
Date: Tue, 7 Sep 2021 15:17:44 GMT
Subject: RFR: 8271340: Crash PhaseIdealLoop::clone_outer_loop [v3]
In-Reply-To: <0G1jMP-iSxz4nSuvJV2He_JDajheQmds-pgqeyPmOB4=.c5719b1d-5fa2-483c-b623-e15138a0ac12@github.com>
References: <0G1jMP-iSxz4nSuvJV2He_JDajheQmds-pgqeyPmOB4=.c5719b1d-5fa2-483c-b623-e15138a0ac12@github.com>
Message-ID:

On Thu, 2 Sep 2021 17:27:49 GMT, Vladimir Kozlov wrote:

>> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   test fix
>
> Good.
> But, please, address Tobias's question about the test.

Thanks for the reviews @vnkozlov @TobiHartmann

-------------

PR: https://git.openjdk.java.net/jdk/pull/5220

From roland at openjdk.java.net Tue Sep 7 15:17:47 2021
From: roland at openjdk.java.net (Roland Westrelin)
Date: Tue, 7 Sep 2021 15:17:47 GMT
Subject: Integrated: 8271340: Crash PhaseIdealLoop::clone_outer_loop
In-Reply-To:
References:
Message-ID:

On Mon, 23 Aug 2021 14:46:15 GMT, Roland Westrelin wrote:

> The test has a counted loop and an infinite loop. The infinite loop is
> reachable from multiple paths and its loop head is a Region with more
> than 3 inputs. One of these paths is from the counted loop.
When loop
> opts run, a NeverBranch is added to the infinite loop that's removed
> by NeverBranchNode::Ideal() next because in(0) of the NeverBranch is a
> Region and not a Loop.
>
> When CCP runs, it finds the counted loop exit is never reached because
> a test in the loop body that depends on a loop phi is never taken. As
> a consequence nodes along the path from the counted loop end to the
> infinite loop Region have type top. One of these nodes is a Call
> node. PhaseCCP::do_transform() should then cause the path between the
> counted loop and the infinite loop to optimize out but that doesn't
> happen because do_transform() starts from Root by following inputs.
> The dead path is only reachable from the infinite loop but there's
> no edge between Root and the infinite loop.
>
> IGVN next runs, processes the Call Node, finds it's dead, kills
> everything around it which causes the OuterStripMinedLoopEnd to lose
> a projection. That later triggers the crash.
>
> The fix I propose is to be more conservative in
> NeverBranchNode::Ideal() and to check for an in(0) that's a Region. As
> a consequence, at CCP time, the infinite loop is reachable from Root.
>
> This change also requires some adjustments to Shenandoah-specific code
> that makes assumptions about the shape of infinite loops.

This pull request has now been integrated.
Changeset: 2abf3b3b
Author: Roland Westrelin
URL: https://git.openjdk.java.net/jdk/commit/2abf3b3b2743947282300ee8416611559e49fca0
Stats: 183 lines in 4 files changed: 140 ins; 41 del; 2 mod

8271340: Crash PhaseIdealLoop::clone_outer_loop

Reviewed-by: kvn, thartmann

-------------

PR: https://git.openjdk.java.net/jdk/pull/5220

From vlivanov at openjdk.java.net Tue Sep 7 17:58:55 2021
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Tue, 7 Sep 2021 17:58:55 GMT
Subject: RFR: 8273359: CI: ciInstanceKlass::get_canonical_holder() doesn't respect instance size
Message-ID:

`Compile::flatten_alias_type()` relies on `ciInstanceKlass::get_canonical_holder()` to canonicalise the holder class.
When a declared field is not found for a fixed offset (this can happen for unsafe accesses), the next thing `ciInstanceKlass::get_canonical_holder()` does is ascend the class hierarchy looking for a most specific class without instance fields declared. But it completely ignores the instance size, so it can report a class as canonical while its size is smaller than the offset. This makes the address look out-of-bounds, which breaks the idempotence property of address type flattening, because out-of-bounds field address types are flattened to `TypeOopPtr::BOTTOM`.

The proposed fix stops the ascent when the superclass size shrinks below `offset`.
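
The size check described above can be sketched in a few lines of Java. This is only a simplified model under stated assumptions, not the HotSpot implementation: `CanonicalHolderSketch`, `Klass`, `sizeInBytes` and the `respectSize` flag are all hypothetical, the field-related stopping condition of the real method is left out, and treating a superclass whose size is less than or equal to the offset as too small is an assumption of this sketch.

```java
final class CanonicalHolderSketch {
    /** Hypothetical stand-in for a class in the hierarchy (not the ciInstanceKlass API). */
    static final class Klass {
        final String name;
        final Klass superKlass;   // null for the root class
        final int sizeInBytes;    // instance size of this class

        Klass(String name, Klass superKlass, int sizeInBytes) {
            this.name = name;
            this.superKlass = superKlass;
            this.sizeInBytes = sizeInBytes;
        }
    }

    /**
     * Models the ascent taken once no declared field was found at the offset.
     * With respectSize == false the walk ignores instance sizes (the reported
     * problem); with respectSize == true it stops before entering a superclass
     * that is too small to contain the offset (the proposed fix).
     */
    static Klass canonicalHolder(Klass klass, int offset, boolean respectSize) {
        Klass self = klass;
        while (self.superKlass != null) {
            Klass sup = self.superKlass;
            if (respectSize && sup.sizeInBytes <= offset) {
                break; // the offset would be out of bounds in the superclass
            }
            self = sup;
        }
        return self;
    }

    public static void main(String[] args) {
        Klass root = new Klass("Root", null, 16);
        Klass a = new Klass("A", root, 24);
        Klass b = new Klass("B", a, 32);

        // Offset 28 lies inside B only.
        System.out.println(canonicalHolder(b, 28, false).name); // Root: too small for offset 28
        System.out.println(canonicalHolder(b, 28, true).name);  // B
    }
}
```

In this model, the size-blind walk reports `Root`, whose 16 bytes cannot contain offset 28. That is the out-of-bounds situation the mail describes. The size-aware walk stays at `B`.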
Testing: hs-tier1 - hs-tier4

-------------

Commit messages:
 - 8273359: assert(flat == flatten_alias_type(flat)) failed: not idempotent

Changes: https://git.openjdk.java.net/jdk/pull/5395/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5395&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8273359
Stats: 68 lines in 5 files changed: 61 ins; 0 del; 7 mod
Patch: https://git.openjdk.java.net/jdk/pull/5395.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/5395/head:pull/5395

PR: https://git.openjdk.java.net/jdk/pull/5395

From kvn at openjdk.java.net Tue Sep 7 19:29:40 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Tue, 7 Sep 2021 19:29:40 GMT
Subject: RFR: 8273359: CI: ciInstanceKlass::get_canonical_holder() doesn't respect instance size
In-Reply-To:
References:
Message-ID:

On Tue, 7 Sep 2021 17:37:40 GMT, Vladimir Ivanov wrote:

> `Compile::flatten_alias_type()` relies on `ciInstanceKlass::get_canonical_holder()` to canonicalise the holder class.
> When a declared field is not found for a fixed offset (this can happen for unsafe accesses), the next thing `ciInstanceKlass::get_canonical_holder()` does is ascend the class hierarchy looking for a most specific class without instance fields declared. But it completely ignores the instance size, so it can report a class as canonical while its size is smaller than the offset. This makes the address look out-of-bounds, which breaks the idempotence property of address type flattening, because out-of-bounds field address types are flattened to `TypeOopPtr::BOTTOM`.
>
> The proposed fix stops the ascent when the superclass size shrinks below `offset`.
>
> Testing: hs-tier1 - hs-tier4

Looks good.

-------------

Marked as reviewed by kvn (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/5395

From zgu at openjdk.java.net Tue Sep 7 22:48:29 2021
From: zgu at openjdk.java.net (Zhengyu Gu)
Date: Tue, 7 Sep 2021 22:48:29 GMT
Subject: RFR: 8273454: C2: Transform (-a)*(-b) into a*b
Message-ID: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com>

The transformation reduces the number of instructions in the generated code.

### x86_64:

Before:

    0x00007fb92c78b3ac: neg %esi
    0x00007fb92c78b3ae: neg %edx
    0x00007fb92c78b3b0: mov %esi,%eax
    0x00007fb92c78b3b2: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0}
                                       ; - TestSub::runSub at 4 (line 9)

After:

                                       ; - TestSub::runSub at -1 (line 9)
    0x00007fc8c05b74ac: mov %esi,%eax
    0x00007fc8c05b74ae: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0}
                                       ; - TestSub::runSub at 4 (line 9)

### AArch64:

Before:

    0x0000ffff814b4a70: neg w11, w1
    0x0000ffff814b4a74: mneg w0, w2, w11 ;*imul {reexecute=0 rethrow=0 return_oop=0}
                                         ; - TestSub::runSub at 4 (line 9)

After:

    0x0000ffff794a67f0: mul w0, w1, w2 ;*imul {reexecute=0 rethrow=0 return_oop=0}
                                       ; - TestSub::runSub at 4 (line 9)

-------------

Commit messages:
 - Merge branch 'master' into JDK-8273454-neg-mul
 - v1
 - v0

Changes: https://git.openjdk.java.net/jdk/pull/5403/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5403&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8273454
Stats: 112 lines in 2 files changed: 112 ins; 0 del; 0 mod
Patch: https://git.openjdk.java.net/jdk/pull/5403.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/5403/head:pull/5403

PR: https://git.openjdk.java.net/jdk/pull/5403

From dcubed at openjdk.java.net Wed Sep 8 00:37:27 2021
From: dcubed at openjdk.java.net (Daniel D.Daugherty)
Date: Wed, 8 Sep 2021 00:37:27 GMT
Subject: Integrated: 8273462: ProblemList vmTestbase/vm/mlvm/anonloader/stress/oome/heap/Test.java in -Xcomp mode
Message-ID:

A trivial fix to ProblemList vmTestbase/vm/mlvm/anonloader/stress/oome/heap/Test.java in -Xcomp mode.
There are currently 17 sightings (so far) on linux-X64, macOS-X64, and win-X64. ------------- Commit messages: - 8273462: ProblemList vmTestbase/vm/mlvm/anonloader/stress/oome/heap/Test.java in -Xcomp mode Changes: https://git.openjdk.java.net/jdk/pull/5404/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5404&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273462 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5404.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5404/head:pull/5404 PR: https://git.openjdk.java.net/jdk/pull/5404 From dholmes at openjdk.java.net Wed Sep 8 00:37:27 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 8 Sep 2021 00:37:27 GMT Subject: Integrated: 8273462: ProblemList vmTestbase/vm/mlvm/anonloader/stress/oome/heap/Test.java in -Xcomp mode In-Reply-To: References: Message-ID: On Wed, 8 Sep 2021 00:25:37 GMT, Daniel D. Daugherty wrote: > A trivial fix to ProblemList vmTestbase/vm/mlvm/anonloader/stress/oome/heap/Test.java in -Xcomp mode. > > There are currently 17 sightings (so far) on linux-X64, macOS-X64, and win-X64. Marked as reviewed by dholmes (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5404 From dcubed at openjdk.java.net Wed Sep 8 00:37:27 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 8 Sep 2021 00:37:27 GMT Subject: Integrated: 8273462: ProblemList vmTestbase/vm/mlvm/anonloader/stress/oome/heap/Test.java in -Xcomp mode In-Reply-To: References: Message-ID: <_zKNzeAcTOC1VeRlLLZsAQFv1APHBM5EvdTKd_CeNe8=.6335cfa1-8e2f-49f6-9cfa-9e6b93c2e9ff@github.com> On Wed, 8 Sep 2021 00:27:12 GMT, David Holmes wrote: >> A trivial fix to ProblemList vmTestbase/vm/mlvm/anonloader/stress/oome/heap/Test.java in -Xcomp mode. >> >> There are currently 17 sightings (so far) on linux-X64, macOS-X64, and win-X64. > > Marked as reviewed by dholmes (Reviewer). @dholmes-ora - Thanks for the lightning fast review! 
------------- PR: https://git.openjdk.java.net/jdk/pull/5404 From dcubed at openjdk.java.net Wed Sep 8 00:37:27 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 8 Sep 2021 00:37:27 GMT Subject: Integrated: 8273462: ProblemList vmTestbase/vm/mlvm/anonloader/stress/oome/heap/Test.java in -Xcomp mode In-Reply-To: References: Message-ID: On Wed, 8 Sep 2021 00:25:37 GMT, Daniel D. Daugherty wrote: > A trivial fix to ProblemList vmTestbase/vm/mlvm/anonloader/stress/oome/heap/Test.java in -Xcomp mode. > > There are currently 17 sightings (so far) on linux-X64, macOS-X64, and win-X64. This pull request has now been integrated. Changeset: 8884d2f8 Author: Daniel D. Daugherty URL: https://git.openjdk.java.net/jdk/commit/8884d2f854fafdc5f775fce557053d072e4a882c Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8273462: ProblemList vmTestbase/vm/mlvm/anonloader/stress/oome/heap/Test.java in -Xcomp mode Reviewed-by: dholmes ------------- PR: https://git.openjdk.java.net/jdk/pull/5404 From zgu at openjdk.java.net Wed Sep 8 02:09:38 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 8 Sep 2021 02:09:38 GMT Subject: RFR: 8273454: C2: Transform (-a)*(-b) into a*b [v2] In-Reply-To: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> References: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> Message-ID: > The transformation reduces the number of instructions in generated code.
> > ### x86_64: > > Before: > ``` > 0x00007fb92c78b3ac: neg %esi > 0x00007fb92c78b3ae: neg %edx > 0x00007fb92c78b3b0: mov %esi,%eax > 0x00007fb92c78b3b2: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0} > ; - TestSub::runSub at 4 (line 9) > > After: > > ; - TestSub::runSub at -1 (line 9) > 0x00007fc8c05b74ac: mov %esi,%eax > 0x00007fc8c05b74ae: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0} > ; - TestSub::runSub at 4 (line 9) > > > > ### AArch64: > Before: > > 0x0000ffff814b4a70: neg w11, w1 > 0x0000ffff814b4a74: mneg w0, w2, w11 ;*imul {reexecute=0 rethrow=0 return_oop=0} > ; - TestSub::runSub at 4 (line 9) > > > After: > > 0x0000ffff794a67f0: mul w0, w1, w2 ;*imul {reexecute=0 rethrow=0 return_oop=0} > ; - TestSub::runSub at 4 (line 9) Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: Fix test ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5403/files - new: https://git.openjdk.java.net/jdk/pull/5403/files/b0761b1e..c55a3273 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5403&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5403&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/5403.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5403/head:pull/5403 PR: https://git.openjdk.java.net/jdk/pull/5403 From eliu at openjdk.java.net Wed Sep 8 04:26:04 2021 From: eliu at openjdk.java.net (Eric Liu) Date: Wed, 8 Sep 2021 04:26:04 GMT Subject: RFR: 8273454: C2: Transform (-a)*(-b) into a*b [v2] In-Reply-To: References: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> Message-ID: On Wed, 8 Sep 2021 02:09:38 GMT, Zhengyu Gu wrote: >> The transformation reduce instructions in generated code. 
>> >> ### x86_64: >> >> Before: >> ``` >> 0x00007fb92c78b3ac: neg %esi >> 0x00007fb92c78b3ae: neg %edx >> 0x00007fb92c78b3b0: mov %esi,%eax >> 0x00007fb92c78b3b2: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) >> >> After: >> >> ; - TestSub::runSub at -1 (line 9) >> 0x00007fc8c05b74ac: mov %esi,%eax >> 0x00007fc8c05b74ae: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) >> >> >> >> ### AArch64: >> Before: >> >> 0x0000ffff814b4a70: neg w11, w1 >> 0x0000ffff814b4a74: mneg w0, w2, w11 ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) >> >> >> After: >> >> 0x0000ffff794a67f0: mul w0, w1, w2 ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Fix test src/hotspot/share/opto/mulnode.cpp line 211: > 209: Node* n21 = in2->in(1); > 210: if (phase->type(n11)->higher_equal(TypeInt::ZERO) && > 211: phase->type(n21)->higher_equal(TypeInt::ZERO)) { I was thinking if it's a good idea to move these code into MulNode, as they were actually much the same with MulLNode. ------------- PR: https://git.openjdk.java.net/jdk/pull/5403 From yyang at openjdk.java.net Wed Sep 8 05:47:42 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Wed, 8 Sep 2021 05:47:42 GMT Subject: RFR: 8273021: C2: Improve Add and Xor ideal optimizations [v2] In-Reply-To: <6ETrh2hQ4xodDeDjGT3NCjENHbOj0fp2EJsfCDAVINE=.94f15263-2919-4762-a8a9-1d6339f7be96@github.com> References: <6ETrh2hQ4xodDeDjGT3NCjENHbOj0fp2EJsfCDAVINE=.94f15263-2919-4762-a8a9-1d6339f7be96@github.com> Message-ID: > Greetings. This patch adds the following identical equations for Add and Xor node, respectively, which probably drives further optimizations. 
> > > ~(x-1) => -x > ~x + 1 => -x > > > > Verified by generated opto assembly, maybe an IR verification test can be added later. > > ============================= C2-compiled nmethod ============================== > ----------------------- MetaData before Compile_id = 1 ------------------------ > {method} > - this oop: 0x00007fe29f003518 > - method holder: 'compiler/c2/TestAddXorIdeal' > - constants: 0x00007fe29f003248 constant pool [38] {0x00007fe29f003248} for 'compiler/c2/TestAddXorIdeal' cache=0x00007fe29f0038f0 > - access: 0x81000009 public static > - name: 'test1' > - signature: '(I)I' > - max stack: 3 > - max locals: 1 > - size of params: 1 > - method size: 13 > - vtable index: -2 > - i2i entry: 0x00007fe2fd23fc00 > - adapters: AHE at 0x00007fe3082160d0: 0xa i2c: 0x00007fe2fd34c660 c2i: 0x00007fe2fd34c719 c2iUV: 0x00007fe2fd34c6e3 c2iNCI: 0x00007fe2fd34c756 > - compiled entry 0x00007fe2fd34c719 > - code size: 6 > - code start: 0x00007fe29f003508 > - code end (excl): 0x00007fe29f00350e > - method data: 0x00007fe29f0070f8 > - checked ex length: 0 > - linenumber start: 0x00007fe29f00350e > - localvar length: 0 > > ------------------------ OptoAssembly for Compile_id = 1 ----------------------- > # > # int ( int ) > # > #r018 rsi : parm 0: int > # -- Old rsp -- Framesize: 32 -- > #r591 rsp+28: in_preserve > #r590 rsp+24: return address > #r589 rsp+20: in_preserve > #r588 rsp+16: saved fp register > #r587 rsp+12: pad2, stack alignment > #r586 rsp+ 8: pad2, stack alignment > #r585 rsp+ 4: Fixed slot 1 > #r584 rsp+ 0: Fixed slot 0 > # > 000 N1: # out( B1 ) <- in( B1 ) Freq: 1 > > 000 B1: # out( N1 ) <- BLOCK HEAD IS JUNK Freq: 1 > 000 # stack bang (96 bytes) > pushq rbp # Save rbp > subq rsp, #16 # Create frame > > 00c movl RAX, RSI # spill > 00e negl RAX # int > 010 addq rsp, 16 # Destroy frame > popq rbp > cmpq rsp, poll_offset[r15_thread] > ja #safepoint_stub # Safepoint: poll for GC > > 022 ret > > 
-------------------------------------------------------------------------------- > > ============================= C2-compiled nmethod ============================== > ----------------------- MetaData before Compile_id = 3 ------------------------ > {method} > - this oop: 0x00007fe29f003668 > - method holder: 'compiler/c2/TestAddXorIdeal' > - constants: 0x00007fe29f003248 constant pool [38] {0x00007fe29f003248} for 'compiler/c2/TestAddXorIdeal' cache=0x00007fe29f0038f0 > - access: 0x81000009 public static > - name: 'test3' > - signature: '(J)J' > - max stack: 5 > - max locals: 2 > - size of params: 2 > - method size: 13 > - vtable index: -2 > - i2i entry: 0x00007fe2fd23fc00 > - adapters: AHE at 0x00007fe30829c8f0: 0xbe i2c: 0x00007fe2fd2d88e0 c2i: 0x00007fe2fd2d89c0 c2iUV: 0x00007fe2fd2d898a c2iNCI: 0x00007fe2fd2d89fd > - compiled entry 0x00007fe2fd2d89c0 > - code size: 8 > - code start: 0x00007fe29f003658 > - code end (excl): 0x00007fe29f003660 > - method data: 0x00007fe29f007408 > - checked ex length: 0 > - linenumber start: 0x00007fe29f003660 > - localvar length: 0 > > ------------------------ OptoAssembly for Compile_id = 3 ----------------------- > # > # long/half ( long, half ) > # > #r018 rsi:rsi : parm 0: long > # -- Old rsp -- Framesize: 32 -- > #r591 rsp+28: in_preserve > #r590 rsp+24: return address > #r589 rsp+20: in_preserve > #r588 rsp+16: saved fp register > #r587 rsp+12: pad2, stack alignment > #r586 rsp+ 8: pad2, stack alignment > #r585 rsp+ 4: Fixed slot 1 > #r584 rsp+ 0: Fixed slot 0 > # > 000 N1: # out( B1 ) <- in( B1 ) Freq: 1 > > 000 B1: # out( N1 ) <- BLOCK HEAD IS JUNK Freq: 1 > 000 # stack bang (96 bytes) > pushq rbp # Save rbp > subq rsp, #16 # Create frame > > 00c movq RAX, RSI # spill > 00f negq RAX # long > 012 addq rsp, 16 # Destroy frame > popq rbp > cmpq rsp, poll_offset[r15_thread] > ja #safepoint_stub # Safepoint: poll for GC > > 024 ret > > -------------------------------------------------------------------------------- > > 
============================= C2-compiled nmethod ============================== > ----------------------- MetaData before Compile_id = 2 ------------------------ > {method} > - this oop: 0x00007fe29f0035c0 > - method holder: 'compiler/c2/TestAddXorIdeal' > - constants: 0x00007fe29f003248 constant pool [38] {0x00007fe29f003248} for 'compiler/c2/TestAddXorIdeal' cache=0x00007fe29f0038f0 > - access: 0x81000009 public static > - name: 'test2' > - signature: '(I)I' > - max stack: 3 > - max locals: 1 > - size of params: 1 > - method size: 13 > - vtable index: -2 > - i2i entry: 0x00007fe2fd23fc00 > - adapters: AHE at 0x00007fe3082160d0: 0xa i2c: 0x00007fe2fd34c660 c2i: 0x00007fe2fd34c719 c2iUV: 0x00007fe2fd34c6e3 c2iNCI: 0x00007fe2fd34c756 > - compiled entry 0x00007fe2fd34c719 > - code size: 6 > - code start: 0x00007fe29f0035b0 > - code end (excl): 0x00007fe29f0035b6 > - method data: 0x00007fe29f007280 > - checked ex length: 0 > - linenumber start: 0x00007fe29f0035b6 > - localvar length: 0 > > ------------------------ OptoAssembly for Compile_id = 2 ----------------------- > # > # int ( int ) > # > #r018 rsi : parm 0: int > # -- Old rsp -- Framesize: 32 -- > #r591 rsp+28: in_preserve > #r590 rsp+24: return address > #r589 rsp+20: in_preserve > #r588 rsp+16: saved fp register > #r587 rsp+12: pad2, stack alignment > #r586 rsp+ 8: pad2, stack alignment > #r585 rsp+ 4: Fixed slot 1 > #r584 rsp+ 0: Fixed slot 0 > # > 000 N1: # out( B1 ) <- in( B1 ) Freq: 1 > > 000 B1: # out( N1 ) <- BLOCK HEAD IS JUNK Freq: 1 > 000 # stack bang (96 bytes) > pushq rbp # Save rbp > subq rsp, #16 # Create frame > > 00c movl RAX, RSI # spill > 00e negl RAX # int > 010 addq rsp, 16 # Destroy frame > popq rbp > cmpq rsp, poll_offset[r15_thread] > ja #safepoint_stub # Safepoint: poll for GC > > 022 ret > > -------------------------------------------------------------------------------- > > ============================= C2-compiled nmethod ============================== > ----------------------- 
MetaData before Compile_id = 4 ------------------------ > {method} > - this oop: 0x00007fe29f003710 > - method holder: 'compiler/c2/TestAddXorIdeal' > - constants: 0x00007fe29f003248 constant pool [38] {0x00007fe29f003248} for 'compiler/c2/TestAddXorIdeal' cache=0x00007fe29f0038f0 > - access: 0x81000009 public static > - name: 'test4' > - signature: '(J)J' > - max stack: 5 > - max locals: 2 > - size of params: 2 > - method size: 13 > - vtable index: -2 > - i2i entry: 0x00007fe2fd23fc00 > - adapters: AHE at 0x00007fe30829c8f0: 0xbe i2c: 0x00007fe2fd2d88e0 c2i: 0x00007fe2fd2d89c0 c2iUV: 0x00007fe2fd2d898a c2iNCI: 0x00007fe2fd2d89fd > - compiled entry 0x00007fe2fd2d89c0 > - code size: 8 > - code start: 0x00007fe29f003700 > - code end (excl): 0x00007fe29f003708 > - method data: 0x00007fe29f0075a0 > - checked ex length: 0 > - linenumber start: 0x00007fe29f003708 > - localvar length: 0 > > ------------------------ OptoAssembly for Compile_id = 4 ----------------------- > # > # long/half ( long, half ) > # > #r018 rsi:rsi : parm 0: long > # -- Old rsp -- Framesize: 32 -- > #r591 rsp+28: in_preserve > #r590 rsp+24: return address > #r589 rsp+20: in_preserve > #r588 rsp+16: saved fp register > #r587 rsp+12: pad2, stack alignment > #r586 rsp+ 8: pad2, stack alignment > #r585 rsp+ 4: Fixed slot 1 > #r584 rsp+ 0: Fixed slot 0 > # > 000 N1: # out( B1 ) <- in( B1 ) Freq: 1 > > 000 B1: # out( N1 ) <- BLOCK HEAD IS JUNK Freq: 1 > 000 # stack bang (96 bytes) > pushq rbp # Save rbp > subq rsp, #16 # Create frame > > 00c movq RAX, RSI # spill > 00f negq RAX # long > 012 addq rsp, 16 # Destroy frame > popq rbp > cmpq rsp, poll_offset[r15_thread] > ja #safepoint_stub # Safepoint: poll for GC > > 024 ret Yi Yang has updated the pull request incrementally with one additional commit since the last revision: constatn is ensured in the second operand due to commute; combine conditional checks; random num in test ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5266/files 
- new: https://git.openjdk.java.net/jdk/pull/5266/files/2f6ceb3d..ff3087af Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5266&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5266&range=00-01 Stats: 80 lines in 2 files changed: 30 ins; 30 del; 20 mod Patch: https://git.openjdk.java.net/jdk/pull/5266.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5266/head:pull/5266 PR: https://git.openjdk.java.net/jdk/pull/5266 From yyang at openjdk.java.net Wed Sep 8 05:56:42 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Wed, 8 Sep 2021 05:56:42 GMT Subject: RFR: 8273021: C2: Improve Add and Xor ideal optimizations [v3] In-Reply-To: <6ETrh2hQ4xodDeDjGT3NCjENHbOj0fp2EJsfCDAVINE=.94f15263-2919-4762-a8a9-1d6339f7be96@github.com> References: <6ETrh2hQ4xodDeDjGT3NCjENHbOj0fp2EJsfCDAVINE=.94f15263-2919-4762-a8a9-1d6339f7be96@github.com> Message-ID: > Greetings. This patch adds the following identical equations for Add and Xor node, respectively, which probably drives further optimizations. > > > ~(x-1) => -x > ~x + 1 => -x > > > > Verified by generated opto assembly, maybe an IR verification test can be added later. Yi Yang has updated the pull request incrementally with two additional commits since the last revision: - dontinline - more random ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5266/files - new: https://git.openjdk.java.net/jdk/pull/5266/files/ff3087af..e8a930b7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5266&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5266&range=01-02 Stats: 12 lines in 1 file changed: 8 ins; 2
del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/5266.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5266/head:pull/5266 PR: https://git.openjdk.java.net/jdk/pull/5266 From yyang at openjdk.java.net Wed Sep 8 05:56:47 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Wed, 8 Sep 2021 05:56:47 GMT Subject: RFR: 8273021: C2: Improve Add and Xor ideal optimizations [v3] In-Reply-To: <9IxTNWnMkcusxvqUHgdP9_6PhXtDi7BPAOdca3wQoeE=.363e4477-1322-4f06-ac43-e0a712034fd7@github.com> References: <6ETrh2hQ4xodDeDjGT3NCjENHbOj0fp2EJsfCDAVINE=.94f15263-2919-4762-a8a9-1d6339f7be96@github.com> <9IxTNWnMkcusxvqUHgdP9_6PhXtDi7BPAOdca3wQoeE=.363e4477-1322-4f06-ac43-e0a712034fd7@github.com> Message-ID: On Mon, 6 Sep 2021 04:24:19 GMT, Eric Liu wrote: >> Yi Yang has updated the pull request incrementally with two additional commits since the last revision: >> >> - dontinline >> - more random > > src/hotspot/share/opto/addnode.cpp line 1011: > >> 1009: // be -1^(x+(-1)). >> 1010: if (op1 == Op_AddI && phase->type(in2) == TypeInt::MINUS_1) { >> 1011: if (phase->type(in1->in(2)) == TypeInt::MINUS_1) { > > These two conditions could be combined. Yes, I've combined these conditions. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5266 From yyang at openjdk.java.net Wed Sep 8 05:56:53 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Wed, 8 Sep 2021 05:56:53 GMT Subject: RFR: 8273021: C2: Improve Add and Xor ideal optimizations [v3] In-Reply-To: References: <6ETrh2hQ4xodDeDjGT3NCjENHbOj0fp2EJsfCDAVINE=.94f15263-2919-4762-a8a9-1d6339f7be96@github.com> Message-ID: <8ghTqN0oVwSI_oxWyf_hRBG5_FoO37qAGCMTU03bMJM=.35cf0199-da23-4eec-b779-8f21cc52440b@github.com> On Thu, 2 Sep 2021 14:08:49 GMT, Tobias Hartmann wrote: >> Yi Yang has updated the pull request incrementally with two additional commits since the last revision: >> >> - dontinline >> - more random > > src/hotspot/share/opto/addnode.cpp line 1014: > >> 1012: return new SubINode(phase->makecon(TypeInt::ZERO), in1->in(1)); >> 1013: } >> 1014: } else if (op2 == Op_AddI && phase->type(in1) == TypeInt::MINUS_1) { > > Why do you need to check both inputs for constant -1? Shouldn't `AddNode::Ideal` canonicalize the inputs and ensure that constants are moved to the second input? > > https://github.com/openjdk/jdk/blob/599d07c0db9c85e4dae35d1c54a63407d32eaedd/src/hotspot/share/opto/addnode.hpp#L52-L54 Indeed, `commute` already moves loads and constants to the right. Changed. > test/hotspot/jtreg/compiler/c2/TestAddXorIdeal.java line 30: > >> 28: * @summary C2: Improve Add and Xor ideal optimizations >> 29: * @library /test/lib >> 30: * @run main/othervm -XX:-Inline -XX:-TieredCompilation -XX:TieredStopAtLevel=4 -XX:CompileCommand=compileonly,compiler.c2.TestAddXorIdeal::* compiler.c2.TestAddXorIdeal > > What about `-XX:CompileCommand=dontinline,compiler.c2.TestAddXorIdeal::test*` instead of disabling all inlining and limiting compilation? Changed. The magic numbers have been replaced with random numbers.
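The two identities under review, `~(x-1) => -x` and `~x + 1 => -x`, can also be checked at the Java level, independently of C2. The following is only a minimal sketch and not the `TestAddXorIdeal` test from the PR: the class name `NegIdentitySketch` and its helper methods are made up for illustration. It compares both source shapes against plain negation for edge values and for random inputs, in the spirit of the "random number" change mentioned above.

```java
import java.util.Random;

// Hypothetical sketch; not part of the PR.
class NegIdentitySketch {
    // The two int shapes that the patch rewrites to a negation.
    static int notOfDecrementI(int x) { return ~(x - 1); }
    static int notPlusOneI(int x)     { return ~x + 1; }

    // The same shapes for long.
    static long notOfDecrementL(long x) { return ~(x - 1L); }
    static long notPlusOneL(long x)     { return ~x + 1L; }

    // Both shapes must agree with -x, including on overflow (two's complement).
    static boolean holdsForInt(int x) {
        return notOfDecrementI(x) == -x && notPlusOneI(x) == -x;
    }

    static boolean holdsForLong(long x) {
        return notOfDecrementL(x) == -x && notPlusOneL(x) == -x;
    }

    public static void main(String[] args) {
        int[] intEdges = {0, 1, -1, Integer.MIN_VALUE, Integer.MAX_VALUE};
        long[] longEdges = {0L, 1L, -1L, Long.MIN_VALUE, Long.MAX_VALUE};
        for (int x : intEdges) {
            if (!holdsForInt(x)) throw new AssertionError("int identity failed for " + x);
        }
        for (long x : longEdges) {
            if (!holdsForLong(x)) throw new AssertionError("long identity failed for " + x);
        }
        Random random = new Random();
        for (int i = 0; i < 10_000; i++) {
            int xi = random.nextInt();
            long xl = random.nextLong();
            if (!holdsForInt(xi)) throw new AssertionError("int identity failed for " + xi);
            if (!holdsForLong(xl)) throw new AssertionError("long identity failed for " + xl);
        }
        System.out.println("identities hold");
    }
}
```

This only confirms that the rewrite is value-preserving; whether C2 actually emits a single negation still has to be checked on the generated code, as in the opto assembly quoted in the description.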
------------- PR: https://git.openjdk.java.net/jdk/pull/5266 From aph at openjdk.java.net Wed Sep 8 10:12:09 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 8 Sep 2021 10:12:09 GMT Subject: Integrated: 8270533: AArch64: size_fits_all_mem_uses should return false if its output is a CAS In-Reply-To: <5nBEpKu_kOod49K8Pe3comi_Bbyt0KLPmJD285Lv9bE=.d9fa7dc7-8b30-4b55-91d3-4d626cc7077c@github.com> References: <5nBEpKu_kOod49K8Pe3comi_Bbyt0KLPmJD285Lv9bE=.d9fa7dc7-8b30-4b55-91d3-4d626cc7077c@github.com> Message-ID: On Thu, 15 Jul 2021 09:31:49 GMT, Andrew Haley wrote: > `size_fits_all_mem_uses(AddPNode)` is used to determine if an add with shift expression can be used as an input to a MemNode. However, it is not correct when one of its outputs is a CAS. This causes a crash when a single AddPNode feeds a CAS and an ordinary MemNode. > > This bug is probably latent in the current HotSpot because `pd_clone_address_expressions()` clones AddPNodes that are used as inputs to MemNodes, so `size_fits_all_mem_uses()` never sees a CAS as one of its outputs. The matcher for a CompareAndSwapX node doesn't call `size_fits_all_mem_uses()`, so this can only happen when the same expression feeds both an ordinary MemNode and a CompareAndSwapX. > > This bug has been back-ported to JDK 8u, so it must be fixed there. Even though the bug is latent in current HotSpot, it should be fixed. This pull request has now been integrated. 
Changeset: 6750c34c Author: Andrew Haley URL: https://git.openjdk.java.net/jdk/commit/6750c34c92b5f28bba4a88ac798b800fce380d32 Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod 8270533: AArch64: size_fits_all_mem_uses should return false if its output is a CAS Reviewed-by: adinn, ngasson ------------- PR: https://git.openjdk.java.net/jdk/pull/4790 From zgu at openjdk.java.net Wed Sep 8 12:36:07 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 8 Sep 2021 12:36:07 GMT Subject: RFR: 8273454: C2: Transform (-a)*(-b) into a*b [v2] In-Reply-To: References: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> Message-ID: On Wed, 8 Sep 2021 04:23:19 GMT, Eric Liu wrote: >> Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix test > > src/hotspot/share/opto/mulnode.cpp line 211: > >> 209: Node* n21 = in2->in(1); >> 210: if (phase->type(n11)->higher_equal(TypeInt::ZERO) && >> 211: phase->type(n21)->higher_equal(TypeInt::ZERO)) { > > I was thinking if it's a good idea to move these code into MulNode, as they were actually much the same with MulLNode. I wonder about that too; the same goes for the rest of the MulINode/MulLNode::Ideal() code (and many other places). I am not sure how to work around the different types; any suggestions?
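For readers following the `(-a)*(-b)` into `a*b` thread, the rewrite is safe for both `int` and `long` because Java integer arithmetic wraps modulo 2^32 or 2^64, where the identity holds even when a negation overflows (for example `-Integer.MIN_VALUE`). The sketch below is a plain Java check of that claim, not the jtreg test from the PR; `NegMulSketch` and its method names are hypothetical.

```java
// Hypothetical sketch; not part of the PR.
class NegMulSketch {
    // The int shape before the transformation: two negations feeding a multiply.
    static int mulNegatedI(int a, int b) { return (-a) * (-b); }

    // The long shape before the transformation.
    static long mulNegatedL(long a, long b) { return (-a) * (-b); }

    // The transformed shape is simply a * b; compare the two.
    static boolean sameAsPlainI(int a, int b)    { return mulNegatedI(a, b) == a * b; }
    static boolean sameAsPlainL(long a, long b)  { return mulNegatedL(a, b) == a * b; }

    public static void main(String[] args) {
        int[] ints = {0, 1, -1, 3, -7, Integer.MIN_VALUE, Integer.MAX_VALUE};
        long[] longs = {0L, 1L, -1L, 3L, -7L, Long.MIN_VALUE, Long.MAX_VALUE};
        for (int a : ints) {
            for (int b : ints) {
                if (!sameAsPlainI(a, b)) throw new AssertionError(a + " * " + b);
            }
        }
        for (long a : longs) {
            for (long b : longs) {
                if (!sameAsPlainL(a, b)) throw new AssertionError(a + " * " + b);
            }
        }
        System.out.println("(-a)*(-b) == a*b for all sampled pairs");
    }
}
```

The int and long helpers are duplicated here on purpose; it mirrors, at toy scale, the MulINode/MulLNode duplication discussed above, and says nothing about how that duplication should be resolved in HotSpot.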
------------- PR: https://git.openjdk.java.net/jdk/pull/5403 From github.com+85193727+jtfuller111 at openjdk.java.net Wed Sep 8 14:42:11 2021 From: github.com+85193727+jtfuller111 at openjdk.java.net (jtfuller111) Date: Wed, 8 Sep 2021 14:42:11 GMT Subject: Integrated: 8265443: IGV: disambiguate groups by emiting additional properties In-Reply-To: <1hUULnH2D3clrTvSru2UaInP_r5H4mCxv7aIiRMVds0=.8226b8ec-d7e0-419d-8a77-07d0134b7959@github.com> References: <1hUULnH2D3clrTvSru2UaInP_r5H4mCxv7aIiRMVds0=.8226b8ec-d7e0-419d-8a77-07d0134b7959@github.com> Message-ID: On Fri, 9 Jul 2021 14:20:54 GMT, jtfuller111 wrote: > Added compilation id and whether the compilation is OSR or not to the IdealGraphPrinter's output. Are there any other properties that should also be emitted? > > Here is a screenshot of how the properties appear in IGV: > ![image](https://user-images.githubusercontent.com/85193727/125358968-9fe19680-e337-11eb-8d00-eb744c4f3e30.png) This pull request has now been integrated. Changeset: f2f8136c Author: jtfuller111 Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/f2f8136cc9f5a3f554f704024748a09cb80bd037 Stats: 10 lines in 2 files changed: 10 ins; 0 del; 0 mod 8265443: IGV: disambiguate groups by emiting additional properties Reviewed-by: thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/4743 From whuang at openjdk.java.net Wed Sep 8 14:43:11 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Wed, 8 Sep 2021 14:43:11 GMT Subject: Integrated: 8272413: Incorrect num of element count calculation for vector cast In-Reply-To: References: Message-ID: On Wed, 18 Aug 2021 08:31:47 GMT, Wang Huang wrote: > Dear all, > The closed JDK-8265244 has been split into two issues: JDK-8268966 and this issue. In this issue, I will fix the mid-end comparison. > This patch is easy to understand. It is split from https://github.com/openjdk/jdk/pull/3507.
I only fix the mid-end problem because the back-end problem has been fixed in JDK-8268966 by @theRealELiu. > Thank you for your review. > > Yours, > WANG Huang This pull request has now been integrated. Changeset: 7e662e7b Author: Wang Huang Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/7e662e7b9d7ea5113f568418e0acac4234ebfb88 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod 8272413: Incorrect num of element count calculation for vector cast Co-authored-by: Wang Huang Co-authored-by: Miu Zhuojun Co-authored-by: Wu Yan Reviewed-by: eliu, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/5160 From thartmann at openjdk.java.net Wed Sep 8 14:49:23 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 8 Sep 2021 14:49:23 GMT Subject: RFR: 8273409: Receiver type narrowed by CCP does not always trigger post-parse call devirtualization Message-ID: <01WVxWw0yNGT1ffo-xv4naivyaqIL-oZKSQJB9lwK-E=.2102ab41-53ac-4d6d-9828-7a20f6d09f13@github.com> While working on [JDK-8273323](https://bugs.openjdk.java.net/browse/JDK-8273323), I noticed that even if CCP is able to narrow the receiver type of a virtual call to an exact type, post-parse call devirtualization is not always triggered. The problem is that CCP only adds nodes to the IGVN worklist if their types have been improved: https://github.com/openjdk/jdk/blob/4023646ed1bcb821b1d18f7e5104f04995e8171d/src/hotspot/share/opto/phaseX.cpp#L1961-L1965 And in turn, IGVN only adds user nodes to the worklist if the type of the parent node could be further improved. As a result, a `CallNode` is not always added to the worklist if the type of its receiver node has been improved. Often, we are lucky and macro expansion or another optimization phase after CCP adds the `CallNode` to the worklist. However, as `testDynamicCallWithCCP` shows, that's not always the case.
The fix is to explicitly add `CallStaticJavaNode` and `CallDynamicJavaNode` to the worklist after CCP to give call devirtualization a chance to run: https://github.com/openjdk/jdk/blob/4023646ed1bcb821b1d18f7e5104f04995e8171d/src/hotspot/share/opto/callnode.cpp#L1054-L1057 https://github.com/openjdk/jdk/blob/4023646ed1bcb821b1d18f7e5104f04995e8171d/src/hotspot/share/opto/callnode.cpp#L1137-L1140 I also considered moving the optimization from `Ideal` to `Value` (which in contrast to `Ideal` is already executed **during** CCP) but we can't trust receiver types during CCP (because CCP goes from optimistic types to pessimistic ones). Also, the current implementation relies on setting the `_generator` field, which is not possible from `Value` because it is declared `const`: https://github.com/openjdk/jdk/blob/4023646ed1bcb821b1d18f7e5104f04995e8171d/src/hotspot/share/opto/callnode.cpp#L1169-L1172 All the newly added tests fail IR verification if post-parse call devirtualization is disabled (`-XX:-IncrementalInlineVirtual -XX:-IncrementalInlineMH`). Without the fix, `testDynamicCallWithCCP` always fails. I filed [JDK-8273496](https://bugs.openjdk.java.net/browse/JDK-8273496) to improve how CCP handles this.
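To make the scheduling problem described above concrete, here is a toy model in plain Java. It is only a sketch under stated assumptions: `Node`, `worklistAfterCcp`, `visitedByIgvn` and `exampleGraph` are hypothetical names invented for illustration, not HotSpot code, and the model ignores everything about C2 except the rule "a node is revisited only if something put it on the worklist".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class CcpWorklistSketch {
    /** Hypothetical, heavily simplified stand-in for a C2 IR node. */
    static final class Node {
        final String name;
        final boolean isJavaCall;        // models a Java call node
        final boolean typeImprovedByCcp; // did CCP compute a better type for this node?

        Node(String name, boolean isJavaCall, boolean typeImprovedByCcp) {
            this.name = name;
            this.isJavaCall = isJavaCall;
            this.typeImprovedByCcp = typeImprovedByCcp;
        }
    }

    /** Models the end of CCP: decide which nodes the following IGVN pass will revisit. */
    static Deque<Node> worklistAfterCcp(List<Node> graph, boolean applyFix) {
        Deque<Node> worklist = new ArrayDeque<>();
        for (Node n : graph) {
            if (n.typeImprovedByCcp) {
                worklist.add(n); // existing rule: the node's own type improved
            } else if (applyFix && n.isJavaCall) {
                worklist.add(n); // modelled fix: always give Java calls another look
            }
        }
        return worklist;
    }

    /** Models IGVN: only nodes taken from the worklist get their idealization step. */
    static Set<String> visitedByIgvn(Deque<Node> worklist) {
        Set<String> visited = new LinkedHashSet<>();
        while (!worklist.isEmpty()) {
            // Users are not pushed here: in this toy the types are already final
            // after CCP, so no node's type changes again during IGVN.
            visited.add(worklist.poll().name);
        }
        return visited;
    }

    /** A receiver whose type CCP narrowed, feeding a call whose own type did not change. */
    static List<Node> exampleGraph() {
        List<Node> graph = new ArrayList<>();
        graph.add(new Node("receiver", false, true));
        graph.add(new Node("call", true, false));
        return graph;
    }

    public static void main(String[] args) {
        System.out.println(visitedByIgvn(worklistAfterCcp(exampleGraph(), false))); // [receiver]
        System.out.println(visitedByIgvn(worklistAfterCcp(exampleGraph(), true)));  // [receiver, call]
    }
}
```

In the first run the call is never revisited, so nothing would get the chance to devirtualize it; in the second run it is. The real change is in the C2 sources linked above and is not reproduced here.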
Thanks, Tobias ------------- Commit messages: - 8273409: Receiver type narrowed by CCP does not always trigger post-parse call devirtualization Changes: https://git.openjdk.java.net/jdk/pull/5418/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5418&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273409 Stats: 205 lines in 3 files changed: 204 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5418.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5418/head:pull/5418 PR: https://git.openjdk.java.net/jdk/pull/5418 From thartmann at openjdk.java.net Wed Sep 8 15:00:48 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 8 Sep 2021 15:00:48 GMT Subject: RFR: 8273409: Receiver type narrowed by CCP does not always trigger post-parse call devirtualization [v2] In-Reply-To: <01WVxWw0yNGT1ffo-xv4naivyaqIL-oZKSQJB9lwK-E=.2102ab41-53ac-4d6d-9828-7a20f6d09f13@github.com> References: <01WVxWw0yNGT1ffo-xv4naivyaqIL-oZKSQJB9lwK-E=.2102ab41-53ac-4d6d-9828-7a20f6d09f13@github.com> Message-ID: <8GJsFmEoa_X9E-PuKFi-ZkypxNJ8qfmfIocUuosnJG0=.31cc7320-c2d1-4501-b407-b71d1128d58b@github.com> > While working on [JDK-8273323](https://bugs.openjdk.java.net/browse/JDK-8273323), I noticed that even if CCP is able to narrow the receiver type of a virtual call to an exact type, post-parse call devirtualization is not always triggered. The problem is that CCP only adds nodes to the IGVN worklist if their types have been improved: > https://github.com/openjdk/jdk/blob/4023646ed1bcb821b1d18f7e5104f04995e8171d/src/hotspot/share/opto/phaseX.cpp#L1961-L1965 > > And in turn, IGVN only adds user nodes to the worklist if the type of the parent node could be further improved. As a result, a `CallNode` is not always added to the worklist if the type of its receiver node has been improved. Often, we are lucky and macro expansion or another optimization phase after CCP adds the `CallNode` to the worklist. 
However, as `testDynamicCallWithCCP` shows, that's not always the case. > > The fix is to explicitly add `CallStaticJavaNode` and `CallDynamicJava` to the worklist after CCP to give call devirtualization a chance to run: > https://github.com/openjdk/jdk/blob/4023646ed1bcb821b1d18f7e5104f04995e8171d/src/hotspot/share/opto/callnode.cpp#L1054-L1057 > https://github.com/openjdk/jdk/blob/4023646ed1bcb821b1d18f7e5104f04995e8171d/src/hotspot/share/opto/callnode.cpp#L1137-L1140 > > I also considered moving the optimization from `Ideal` to `Value` (which in contrast to `Ideal` is already executed **during** CCP) but we can't trust receiver types during CCP (because CCP goes from optimistic types to pessimistic ones). Also, the current implementation relies on setting the `_generator` field which is not possible from `Value` which is declared `const`: > https://github.com/openjdk/jdk/blob/4023646ed1bcb821b1d18f7e5104f04995e8171d/src/hotspot/share/opto/callnode.cpp#L1169-L1172 > > All the newly added tests fail IR verification if post-parse call devirtualization is disabled (`-XX:-IncrementalInlineVirtual -XX:-IncrementalInlineMH`). Without the fix, `testDynamicCallWithCCP` always fails. > > I filed [JDK-8273496](https://bugs.openjdk.java.net/browse/JDK-8273496) to improve how CCP handles this. 
> > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Added missing IgnoreUnrecognizedVMOptions ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5418/files - new: https://git.openjdk.java.net/jdk/pull/5418/files/577d7bee..b3709171 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5418&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5418&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5418.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5418/head:pull/5418 PR: https://git.openjdk.java.net/jdk/pull/5418 From vlivanov at openjdk.java.net Wed Sep 8 15:21:08 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 8 Sep 2021 15:21:08 GMT Subject: RFR: 8273409: Receiver type narrowed by CCP does not always trigger post-parse call devirtualization [v2] In-Reply-To: <8GJsFmEoa_X9E-PuKFi-ZkypxNJ8qfmfIocUuosnJG0=.31cc7320-c2d1-4501-b407-b71d1128d58b@github.com> References: <01WVxWw0yNGT1ffo-xv4naivyaqIL-oZKSQJB9lwK-E=.2102ab41-53ac-4d6d-9828-7a20f6d09f13@github.com> <8GJsFmEoa_X9E-PuKFi-ZkypxNJ8qfmfIocUuosnJG0=.31cc7320-c2d1-4501-b407-b71d1128d58b@github.com> Message-ID: On Wed, 8 Sep 2021 15:00:48 GMT, Tobias Hartmann wrote: > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Added missing IgnoreUnrecognizedVMOptions Looks good. ------------- Marked as reviewed by vlivanov (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/5418 From neliasso at openjdk.java.net Wed Sep 8 15:29:08 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Wed, 8 Sep 2021 15:29:08 GMT Subject: RFR: 8273409: Receiver type narrowed by CCP does not always trigger post-parse call devirtualization [v2] In-Reply-To: <8GJsFmEoa_X9E-PuKFi-ZkypxNJ8qfmfIocUuosnJG0=.31cc7320-c2d1-4501-b407-b71d1128d58b@github.com> References: <01WVxWw0yNGT1ffo-xv4naivyaqIL-oZKSQJB9lwK-E=.2102ab41-53ac-4d6d-9828-7a20f6d09f13@github.com> <8GJsFmEoa_X9E-PuKFi-ZkypxNJ8qfmfIocUuosnJG0=.31cc7320-c2d1-4501-b407-b71d1128d58b@github.com> Message-ID: On Wed, 8 Sep 2021 15:00:48 GMT, Tobias Hartmann wrote: > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Added missing IgnoreUnrecognizedVMOptions Looks good! ------------- Marked as reviewed by neliasso (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/5418 From thartmann at openjdk.java.net Wed Sep 8 15:29:09 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 8 Sep 2021 15:29:09 GMT Subject: RFR: 8273409: Receiver type narrowed by CCP does not always trigger post-parse call devirtualization [v2] In-Reply-To: <8GJsFmEoa_X9E-PuKFi-ZkypxNJ8qfmfIocUuosnJG0=.31cc7320-c2d1-4501-b407-b71d1128d58b@github.com> References: <01WVxWw0yNGT1ffo-xv4naivyaqIL-oZKSQJB9lwK-E=.2102ab41-53ac-4d6d-9828-7a20f6d09f13@github.com> <8GJsFmEoa_X9E-PuKFi-ZkypxNJ8qfmfIocUuosnJG0=.31cc7320-c2d1-4501-b407-b71d1128d58b@github.com> Message-ID: On Wed, 8 Sep 2021 15:00:48 GMT, Tobias Hartmann wrote: > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Added missing IgnoreUnrecognizedVMOptions Thanks, Vladimir!
------------- PR: https://git.openjdk.java.net/jdk/pull/5418 From thartmann at openjdk.java.net Wed Sep 8 15:36:07 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 8 Sep 2021 15:36:07 GMT Subject: RFR: 8273409: Receiver type narrowed by CCP does not always trigger post-parse call devirtualization [v2] In-Reply-To: <8GJsFmEoa_X9E-PuKFi-ZkypxNJ8qfmfIocUuosnJG0=.31cc7320-c2d1-4501-b407-b71d1128d58b@github.com> References: <01WVxWw0yNGT1ffo-xv4naivyaqIL-oZKSQJB9lwK-E=.2102ab41-53ac-4d6d-9828-7a20f6d09f13@github.com> <8GJsFmEoa_X9E-PuKFi-ZkypxNJ8qfmfIocUuosnJG0=.31cc7320-c2d1-4501-b407-b71d1128d58b@github.com> Message-ID: On Wed, 8 Sep 2021 15:00:48 GMT, Tobias Hartmann wrote: > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Added missing IgnoreUnrecognizedVMOptions Thanks, Nils! ------------- PR: https://git.openjdk.java.net/jdk/pull/5418 From vlivanov at openjdk.java.net Wed Sep 8 16:22:18 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 8 Sep 2021 16:22:18 GMT Subject: RFR: 8273359: CI: ciInstanceKlass::get_canonical_holder() doesn't respect instance size In-Reply-To: References: Message-ID: On Tue, 7 Sep 2021 17:37:40 GMT, Vladimir Ivanov wrote: > `Compile::flatten_alias_type()` relies on `ciInstanceKlass::get_canonical_holder()` to canonicalise holder class.
> When a declared field is not found for a fixed offset (it can happen for unsafe accesses), the next thing `ciInstanceKlass::get_canonical_holder()` does is ascend the class hierarchy, looking for a most specific class without instance fields declared. But it completely ignores the instance size, so it can report a class as canonical while its size is smaller than the offset. This makes the address look out-of-bounds, which breaks the idempotence property of address type flattening, because out-of-bounds field address types are flattened to `TypeOopPtr::BOTTOM`. > > The proposed fix stops the ascent when the superclass size shrinks below `offset`. > > Testing: hs-tier1 - hs-tier4 Thanks for the review, Vladimir. ------------- PR: https://git.openjdk.java.net/jdk/pull/5395 From vlivanov at openjdk.java.net Wed Sep 8 16:22:20 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Wed, 8 Sep 2021 16:22:20 GMT Subject: Integrated: 8273359: CI: ciInstanceKlass::get_canonical_holder() doesn't respect instance size In-Reply-To: References: Message-ID: On Tue, 7 Sep 2021 17:37:40 GMT, Vladimir Ivanov wrote: > `Compile::flatten_alias_type()` relies on `ciInstanceKlass::get_canonical_holder()` to canonicalise holder class. > When a declared field is not found for a fixed offset (it can happen for unsafe accesses), the next thing `ciInstanceKlass::get_canonical_holder()` does is ascend the class hierarchy, looking for a most specific class without instance fields declared. But it completely ignores the instance size, so it can report a class as canonical while its size is smaller than the offset. This makes the address look out-of-bounds, which breaks the idempotence property of address type flattening, because out-of-bounds field address types are flattened to `TypeOopPtr::BOTTOM`. > > The proposed fix stops the ascent when the superclass size shrinks below `offset`. > > Testing: hs-tier1 - hs-tier4 This pull request has now been integrated.
Changeset: f7e9f56e Author: Vladimir Ivanov URL: https://git.openjdk.java.net/jdk/commit/f7e9f56e235dc50daae0a85c9790d5b04c9c60f0 Stats: 68 lines in 5 files changed: 61 ins; 0 del; 7 mod 8273359: CI: ciInstanceKlass::get_canonical_holder() doesn't respect instance size Reviewed-by: kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/5395 From sviswanathan at openjdk.java.net Wed Sep 8 20:20:32 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Wed, 8 Sep 2021 20:20:32 GMT Subject: RFR: 8273512: Fix the copyright header of x86 macroAssembler files Message-ID: <89boYXfzPR50KNk_ODjVPoaNZpRqnhib7ubqDgw8ftY=.8d25a4fe-59b8-4c7e-9e35-7ad1b9049df4@github.com> Fix the copyright header of x86 macroAssembler files to match others. ------------- Commit messages: - 8273512: Fix the copyright header of x86 macroAssembler files Changes: https://git.openjdk.java.net/jdk/pull/5424/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5424&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273512 Stats: 45 lines in 11 files changed: 22 ins; 0 del; 23 mod Patch: https://git.openjdk.java.net/jdk/pull/5424.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5424/head:pull/5424 PR: https://git.openjdk.java.net/jdk/pull/5424 From dholmes at openjdk.java.net Wed Sep 8 23:47:04 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 8 Sep 2021 23:47:04 GMT Subject: RFR: 8273512: Fix the copyright header of x86 macroAssembler files In-Reply-To: <89boYXfzPR50KNk_ODjVPoaNZpRqnhib7ubqDgw8ftY=.8d25a4fe-59b8-4c7e-9e35-7ad1b9049df4@github.com> References: <89boYXfzPR50KNk_ODjVPoaNZpRqnhib7ubqDgw8ftY=.8d25a4fe-59b8-4c7e-9e35-7ad1b9049df4@github.com> Message-ID: On Wed, 8 Sep 2021 20:09:10 GMT, Sandhya Viswanathan wrote: > Fix the copyright header of x86 macroAssembler files to match others. Hi Sandhya, Hotspot files do not have the "Classpath exception". Thanks, David ------------- Changes requested by dholmes (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5424 From sviswanathan at openjdk.java.net Thu Sep 9 00:10:20 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Thu, 9 Sep 2021 00:10:20 GMT Subject: RFR: 8273512: Fix the copyright header of x86 macroAssembler files [v2] In-Reply-To: <89boYXfzPR50KNk_ODjVPoaNZpRqnhib7ubqDgw8ftY=.8d25a4fe-59b8-4c7e-9e35-7ad1b9049df4@github.com> References: <89boYXfzPR50KNk_ODjVPoaNZpRqnhib7ubqDgw8ftY=.8d25a4fe-59b8-4c7e-9e35-7ad1b9049df4@github.com> Message-ID: > Fix the copyright header of x86 macroAssembler files to match others. Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: implement review comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5424/files - new: https://git.openjdk.java.net/jdk/pull/5424/files/236b4db0..4e7f94d7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5424&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5424&range=00-01 Stats: 33 lines in 11 files changed: 0 ins; 22 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/5424.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5424/head:pull/5424 PR: https://git.openjdk.java.net/jdk/pull/5424 From dholmes at openjdk.java.net Thu Sep 9 00:38:14 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 9 Sep 2021 00:38:14 GMT Subject: RFR: 8273516: ProblemList compiler/c2/Test7179138_1.java in -Xcomp mode on win-X64 In-Reply-To: References: Message-ID: On Thu, 9 Sep 2021 00:26:15 GMT, Daniel D. Daugherty wrote: > A trivial fix to ProblemList compiler/c2/Test7179138_1.java in -Xcomp mode on win-X64. Marked as reviewed by dholmes (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/5429 From dcubed at openjdk.java.net Thu Sep 9 00:38:14 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 9 Sep 2021 00:38:14 GMT Subject: RFR: 8273516: ProblemList compiler/c2/Test7179138_1.java in -Xcomp mode on win-X64 Message-ID: A trivial fix to ProblemList compiler/c2/Test7179138_1.java in -Xcomp mode on win-X64. ------------- Commit messages: - 8273516: ProblemList compiler/c2/Test7179138_1.java in -Xcomp mode on win-X64 Changes: https://git.openjdk.java.net/jdk/pull/5429/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5429&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273516 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5429.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5429/head:pull/5429 PR: https://git.openjdk.java.net/jdk/pull/5429 From sviswanathan at openjdk.java.net Thu Sep 9 00:40:01 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Thu, 9 Sep 2021 00:40:01 GMT Subject: RFR: 8273512: Fix the copyright header of x86 macroAssembler files [v2] In-Reply-To: References: <89boYXfzPR50KNk_ODjVPoaNZpRqnhib7ubqDgw8ftY=.8d25a4fe-59b8-4c7e-9e35-7ad1b9049df4@github.com> Message-ID: On Wed, 8 Sep 2021 23:44:27 GMT, David Holmes wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> implement review comments > > Hi Sandhya, > > Hotspot files do not have the "Classpath exception". > > Thanks, > David Thanks @dholmes-ora. I have removed the classpath exception. Let me know if the patch looks good to you now. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5424 From dcubed at openjdk.java.net Thu Sep 9 00:42:06 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 9 Sep 2021 00:42:06 GMT Subject: Integrated: 8273516: ProblemList compiler/c2/Test7179138_1.java in -Xcomp mode on win-X64 In-Reply-To: References: Message-ID: <2JTmwcx91SrWjJc-S1NazwivueHTWLEv7pO6sNjg88o=.ac449202-9826-4c4a-b013-4f4222711072@github.com> On Thu, 9 Sep 2021 00:26:15 GMT, Daniel D. Daugherty wrote: > A trivial fix to ProblemList compiler/c2/Test7179138_1.java in -Xcomp mode on win-X64. This pull request has now been integrated. Changeset: 12f0b771 Author: Daniel D. Daugherty URL: https://git.openjdk.java.net/jdk/commit/12f0b771791614b8a41fc2c62d34481f911109b0 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8273516: ProblemList compiler/c2/Test7179138_1.java in -Xcomp mode on win-X64 Reviewed-by: dholmes ------------- PR: https://git.openjdk.java.net/jdk/pull/5429 From dcubed at openjdk.java.net Thu Sep 9 00:42:06 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 9 Sep 2021 00:42:06 GMT Subject: RFR: 8273516: ProblemList compiler/c2/Test7179138_1.java in -Xcomp mode on win-X64 In-Reply-To: References: Message-ID: On Thu, 9 Sep 2021 00:33:31 GMT, David Holmes wrote: >> A trivial fix to ProblemList compiler/c2/Test7179138_1.java in -Xcomp mode on win-X64. > > Marked as reviewed by dholmes (Reviewer). @dholmes-ora - Thanks for the fast review! 
------------- PR: https://git.openjdk.java.net/jdk/pull/5429 From dholmes at openjdk.java.net Thu Sep 9 01:26:01 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 9 Sep 2021 01:26:01 GMT Subject: RFR: 8273512: Fix the copyright header of x86 macroAssembler files [v2] In-Reply-To: References: <89boYXfzPR50KNk_ODjVPoaNZpRqnhib7ubqDgw8ftY=.8d25a4fe-59b8-4c7e-9e35-7ad1b9049df4@github.com> Message-ID: On Thu, 9 Sep 2021 00:10:20 GMT, Sandhya Viswanathan wrote: >> Fix the copyright header of x86 macroAssembler files to match others. > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > implement review comments Hi Sandhya, Under the assumption that this is indeed the right way to apply Intel copyrights I can approve this PR. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5424 From wuyan at openjdk.java.net Thu Sep 9 02:12:03 2021 From: wuyan at openjdk.java.net (Wu Yan) Date: Thu, 9 Sep 2021 02:12:03 GMT Subject: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v7] In-Reply-To: References: Message-ID: On Tue, 7 Sep 2021 01:38:02 GMT, Nick Gasson wrote: > Please check the Windows tier1 failure: https://github.com/Wanghuang-Huawei/jdk/runs/3459332995 > > Seems unlikely that it's anything to do with this patch so you may just want to re-run it or merge from master. OK, the rerun of the presubmit tests shows that all tests passed.
The result is here: https://github.com/Wanghuang-Huawei/jdk/actions/runs/1181122290 ------------- PR: https://git.openjdk.java.net/jdk/pull/4722 From eliu at openjdk.java.net Thu Sep 9 02:47:56 2021 From: eliu at openjdk.java.net (Eric Liu) Date: Thu, 9 Sep 2021 02:47:56 GMT Subject: RFR: 8273454: C2: Transform (-a)*(-b) into a*b [v2] In-Reply-To: References: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> Message-ID: On Wed, 8 Sep 2021 12:33:09 GMT, Zhengyu Gu wrote: >> src/hotspot/share/opto/mulnode.cpp line 211: >> >>> 209: Node* n21 = in2->in(1); >>> 210: if (phase->type(n11)->higher_equal(TypeInt::ZERO) && >>> 211: phase->type(n21)->higher_equal(TypeInt::ZERO)) { >> >> I was wondering whether it's a good idea to move this code into MulNode, as it is much the same as in MulLNode. > > I wonder that too, and the same goes for the rest of the MulINode/MulLNode::Ideal() code (and many other places). I am not sure how to work around the different types, any suggestions? Just a rough sketch, but it works: https://gist.github.com/theRealELiu/328d62157975b1f20e3626b3ef747eb4 Too much abstraction makes the code hard to read. One needs to check the concrete class to see exactly what the code is; e.g., in my patch, add_id() may be TypeInt::ZERO or TypeLong::ZERO, or even TypeD::ZERO. So I'm not sure it's a good idea. Are there any guidelines on this issue: should we try to abstract such code, or put readability first?
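For readers following the thread, the arithmetic behind the `(-a)*(-b)` to `a*b` rewrite can be checked in plain Java. This is only an illustrative sketch, not the C2 code under review: the class and method names (`NegMulIdentity`, `mulNegInt`, `mulNegLong`) are made up here. It relies on the fact that Java integer negation and multiplication wrap around in two's complement, so the identity also holds for `MIN_VALUE` and for overflowing products, and it holds in the same form for `int` and `long`, which is why the two `Ideal()` implementations look so alike.

```java
public class NegMulIdentity {
    /** Left-hand side of the identity for int operands. */
    static int mulNegInt(int a, int b) {
        return (-a) * (-b);
    }

    /** Left-hand side of the identity for long operands. */
    static long mulNegLong(long a, long b) {
        return (-a) * (-b);
    }

    public static void main(String[] args) {
        int[] ints = {0, 1, -1, 7, -13, Integer.MAX_VALUE, Integer.MIN_VALUE};
        long[] longs = {0L, 1L, -1L, 7L, -13L, Long.MAX_VALUE, Long.MIN_VALUE};

        // Compare (-a)*(-b) with a*b for every sampled pair, including edge cases.
        for (int a : ints) {
            for (int b : ints) {
                if (mulNegInt(a, b) != a * b) {
                    throw new AssertionError("int mismatch for " + a + ", " + b);
                }
            }
        }
        for (long a : longs) {
            for (long b : longs) {
                if (mulNegLong(a, b) != a * b) {
                    throw new AssertionError("long mismatch for " + a + ", " + b);
                }
            }
        }
        System.out.println("identity holds for all sampled pairs");
    }
}
```

Sampling pairs is not a proof, of course; the general argument is that negation is multiplication by -1 modulo 2^32 (or 2^64), and (-1)*(-1) is 1.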
@TobiHartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/5403 From eliu at openjdk.java.net Thu Sep 9 02:54:01 2021 From: eliu at openjdk.java.net (Eric Liu) Date: Thu, 9 Sep 2021 02:54:01 GMT Subject: RFR: 8273021: C2: Improve Add and Xor ideal optimizations [v3] In-Reply-To: References: <6ETrh2hQ4xodDeDjGT3NCjENHbOj0fp2EJsfCDAVINE=.94f15263-2919-4762-a8a9-1d6339f7be96@github.com> Message-ID: <_n8PMbtb0l9S_86v0G9KbJ5d3mEw_D5EHg0YQFixDl0=.0d82e8f5-6388-408a-a180-38f27d1ce774@github.com> On Wed, 8 Sep 2021 05:56:42 GMT, Yi Yang wrote: >> Greetings. This patch adds the following identical equations for Add and Xor node, respectively, which probably drives further optimizations. >> >> >> ~(x-1) => -x >> ~x + 1 => -x >> >> >> >> Verified by generated opto assembly, maybe an IR verification test can be added later. >> >> Compiled method (c2) 71 1 compiler.c2.TestAddXorIdeal::test1 (6 bytes) >> 0x00007f9e11514800: sub $0x18,%rsp >> 0x00007f9e11514807: mov %rbp,0x10(%rsp) ;*synchronization entry >> ; - compiler.c2.TestAddXorIdeal::test1 at -1 (line 39) >> 0x00007f9e1151480c: mov %esi,%eax >> 0x00007f9e1151480e: neg %eax ;*iadd {reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test1 at 4 (line 39) >> 0x00007f9e11514810: add $0x10,%rsp >> 0x00007f9e11514814: pop %rbp >> 0x00007f9e11514815: cmp 0x338(%r15),%rsp ; {poll_return} >> 0x00007f9e1151481c: ja 0x00007f9e11514823 >> 0x00007f9e11514822: retq >> >> Compiled method (c2) 73 2 compiler.c2.TestAddXorIdeal::test2 (6 bytes) >> 0x00007f9e11512480: sub $0x18,%rsp >> 0x00007f9e11512487: mov %rbp,0x10(%rsp) ;*synchronization entry >> ; - compiler.c2.TestAddXorIdeal::test2 at -1 (line 43) >> 0x00007f9e1151248c: mov %esi,%eax >> 0x00007f9e1151248e: neg %eax ;*ixor {reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test2 at 4 (line 43) >> 0x00007f9e11512490: add $0x10,%rsp >> 0x00007f9e11512494: pop %rbp >> 0x00007f9e11512495: cmp 0x338(%r15),%rsp ; {poll_return} >> 
0x00007f9e1151249c: ja 0x00007f9e115124a3 >> 0x00007f9e115124a2: retq >> >> Compiled method (c2) 72 3 compiler.c2.TestAddXorIdeal::test3 (8 bytes) >> 0x00007f9e11514b00: sub $0x18,%rsp >> 0x00007f9e11514b07: mov %rbp,0x10(%rsp) ;*synchronization entry >> ; - compiler.c2.TestAddXorIdeal::test3 at -1 (line 47) >> 0x00007f9e11514b0c: mov %rsi,%rax >> 0x00007f9e11514b0f: neg %rax ;*ladd {reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test3 at 6 (line 47) >> 0x00007f9e11514b12: add $0x10,%rsp >> 0x00007f9e11514b16: pop %rbp >> 0x00007f9e11514b17: cmp 0x338(%r15),%rsp ; {poll_return} >> 0x00007f9e11514b1e: ja 0x00007f9e11514b25 >> 0x00007f9e11514b24: retq >> >> Compiled method (c2) 72 4 compiler.c2.TestAddXorIdeal::test4 (8 bytes) >> 0x00007f9e11514500: sub $0x18,%rsp >> 0x00007f9e11514507: mov %rbp,0x10(%rsp) ;*synchronization entry >> ; - compiler.c2.TestAddXorIdeal::test4 at -1 (line 51) >> 0x00007f9e1151450c: mov %rsi,%rax >> 0x00007f9e1151450f: neg %rax ;*lxor {reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test4 at 6 (line 51) >> 0x00007f9e11514512: add $0x10,%rsp >> 0x00007f9e11514516: pop %rbp >> 0x00007f9e11514517: cmp 0x338(%r15),%rsp ; {poll_return} >> 0x00007f9e1151451e: ja 0x00007f9e11514525 >> 0x00007f9e11514524: retq >> >> Compiled method (c2) 72 5 compiler.c2.TestAddXorIdeal::test5 (6 bytes) >> 0x00007f9e11518500: sub $0x18,%rsp >> 0x00007f9e11518507: mov %rbp,0x10(%rsp) ;*synchronization entry >> ; - compiler.c2.TestAddXorIdeal::test5 at -1 (line 55) >> 0x00007f9e1151850c: mov %esi,%eax >> 0x00007f9e1151850e: neg %eax ;*iadd {reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test5 at 4 (line 55) >> 0x00007f9e11518510: add $0x10,%rsp >> 0x00007f9e11518514: pop %rbp >> 0x00007f9e11518515: cmp 0x338(%r15),%rsp ; {poll_return} >> 0x00007f9e1151851c: ja 0x00007f9e11518523 >> 0x00007f9e11518522: retq >> >> Compiled method (c2) 74 6 compiler.c2.TestAddXorIdeal::test6 (6 bytes) >> 
0x00007f9e11512180: sub $0x18,%rsp >> 0x00007f9e11512187: mov %rbp,0x10(%rsp) ;*synchronization entry >> ; - compiler.c2.TestAddXorIdeal::test6 at -1 (line 59) >> 0x00007f9e1151218c: mov %esi,%eax >> 0x00007f9e1151218e: neg %eax ;*ixor {reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test6 at 4 (line 59) >> 0x00007f9e11512190: add $0x10,%rsp >> 0x00007f9e11512194: pop %rbp >> 0x00007f9e11512195: cmp 0x338(%r15),%rsp ; {poll_return} >> 0x00007f9e1151219c: ja 0x00007f9e115121a3 >> 0x00007f9e115121a2: retq >> >> Compiled method (c2) 74 7 compiler.c2.TestAddXorIdeal::test7 (8 bytes) >> 0x00007f9e11511e80: sub $0x18,%rsp >> 0x00007f9e11511e87: mov %rbp,0x10(%rsp) ;*synchronization entry >> ; - compiler.c2.TestAddXorIdeal::test7 at -1 (line 63) >> 0x00007f9e11511e8c: mov %rsi,%rax >> 0x00007f9e11511e8f: neg %rax ;*ladd {reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test7 at 6 (line 63) >> 0x00007f9e11511e92: add $0x10,%rsp >> 0x00007f9e11511e96: pop %rbp >> 0x00007f9e11511e97: cmp 0x338(%r15),%rsp ; {poll_return} >> 0x00007f9e11511e9e: ja 0x00007f9e11511ea5 >> 0x00007f9e11511ea4: retq >> >> >> Compiled method (c2) 73 8 compiler.c2.TestAddXorIdeal::test8 (10 bytes) >> 0x00007f9e11512780: sub $0x18,%rsp >> 0x00007f9e11512787: mov %rbp,0x10(%rsp) ;*synchronization entry >> ; - compiler.c2.TestAddXorIdeal::test8 at -1 (line 67) >> 0x00007f9e1151278c: mov %rsi,%rax >> 0x00007f9e1151278f: neg %rax ;*lxor {reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test8 at 8 (line 67) >> 0x00007f9e11512792: add $0x10,%rsp >> 0x00007f9e11512796: pop %rbp >> 0x00007f9e11512797: cmp 0x338(%r15),%rsp ; {poll_return} >> 0x00007f9e1151279e: ja 0x00007f9e115127a5 >> 0x00007f9e115127a4: retq > > Yi Yang has updated the pull request incrementally with two additional commits since the last revision: > > - dontinline > - more random LGTM ------------- PR: https://git.openjdk.java.net/jdk/pull/5266 From ngasson at 
openjdk.java.net Thu Sep 9 03:10:00 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Thu, 9 Sep 2021 03:10:00 GMT Subject: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v7] In-Reply-To: References: Message-ID: On Mon, 30 Aug 2021 06:26:01 GMT, Wang Huang wrote: >> Dear all, >> Can you do me a favor to review this patch. This patch use `ldp` to implement String.compareTo. >> >> * We add a JMH test case >> * Here is the result of this test case >>
>> Benchmark                      | (size) | Mode | Cnt | Score  | Error   | Units
>> -------------------------------|--------|------|-----|--------|---------|------
>> StringCompare.compareLL        | 64     | avgt | 5   | 7.992  | ± 0.005 | us/op
>> StringCompare.compareLL        | 72     | avgt | 5   | 15.029 | ± 0.006 | us/op
>> StringCompare.compareLL        | 80     | avgt | 5   | 14.655 | ± 0.011 | us/op
>> StringCompare.compareLL        | 91     | avgt | 5   | 16.363 | ± 0.12  | us/op
>> StringCompare.compareLL        | 101    | avgt | 5   | 16.966 | ± 0.007 | us/op
>> StringCompare.compareLL        | 121    | avgt | 5   | 19.276 | ± 0.006 | us/op
>> StringCompare.compareLL        | 181    | avgt | 5   | 19.002 | ± 0.417 | us/op
>> StringCompare.compareLL        | 256    | avgt | 5   | 24.707 | ± 0.041 | us/op
>> StringCompare.compareLLWithLdp | 64     | avgt | 5   | 8.001  | ± 0.121 | us/op
>> StringCompare.compareLLWithLdp | 72     | avgt | 5   | 11.573 | ± 0.003 | us/op
>> StringCompare.compareLLWithLdp | 80     | avgt | 5   | 6.861  | ± 0.004 | us/op
>> StringCompare.compareLLWithLdp | 91     | avgt | 5   | 12.774 | ± 0.201 | us/op
>> StringCompare.compareLLWithLdp | 101    | avgt | 5   | 8.691  | ± 0.004 | us/op
>> StringCompare.compareLLWithLdp | 121    | avgt | 5   | 11.091 | ± 1.342 | us/op
>> StringCompare.compareLLWithLdp | 181    | avgt | 5   | 14.64  | ± 0.581 | us/op
>> StringCompare.compareLLWithLdp | 256    | avgt | 5   | 25.879 | ± 1.775 | us/op
>> StringCompare.compareUU        | 64     | avgt | 5   | 13.476 | ± 0.01  | us/op
>> StringCompare.compareUU        | 72     | avgt | 5   | 15.078 | ± 0.006 | us/op
>> StringCompare.compareUU        | 80     | avgt | 5   | 23.512 | ± 0.011 | us/op
>> StringCompare.compareUU        | 91     | avgt | 5   | 24.284 | ± 0.008 | us/op
>> StringCompare.compareUU        | 101    | avgt | 5   | 20.707 | ± 0.017 | us/op
>> StringCompare.compareUU        | 121    | avgt | 5   | 29.302 | ± 0.011 | us/op
>> StringCompare.compareUU        | 181    | avgt | 5   | 39.31  | ± 0.016 | us/op
>> StringCompare.compareUU        | 256    | avgt | 5   | 54.592 | ± 0.392 | us/op
>> StringCompare.compareUUWithLdp | 64     | avgt | 5   | 16.389 | ± 0.008 | us/op
>> StringCompare.compareUUWithLdp | 72     | avgt | 5   | 10.71  | ± 0.158 | us/op
>> StringCompare.compareUUWithLdp | 80     | avgt | 5   | 11.488 | ± 0.024 | us/op
>> StringCompare.compareUUWithLdp | 91     | avgt | 5   | 13.412 | ± 0.006 | us/op
>> StringCompare.compareUUWithLdp | 101    | avgt | 5   | 16.245 | ± 0.434 | us/op
>> StringCompare.compareUUWithLdp | 121    | avgt | 5   | 16.597 | ± 0.016 | us/op
>> StringCompare.compareUUWithLdp | 181    | avgt | 5   | 27.373 | ± 0.017 | us/op
>> StringCompare.compareUUWithLdp | 256    | avgt | 5   | 41.74  | ± 3.5   | us/op
>> >> From this table, we can see that in most cases, our patch is better than old one. >> >> Thank you for your review. Any suggestions are welcome. > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > fix windows build failed Marked as reviewed by ngasson (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/4722 From thartmann at openjdk.java.net Thu Sep 9 06:20:06 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 9 Sep 2021 06:20:06 GMT Subject: RFR: 8273512: Fix the copyright header of x86 macroAssembler files [v2] In-Reply-To: References: <89boYXfzPR50KNk_ODjVPoaNZpRqnhib7ubqDgw8ftY=.8d25a4fe-59b8-4c7e-9e35-7ad1b9049df4@github.com> Message-ID: On Thu, 9 Sep 2021 00:10:20 GMT, Sandhya Viswanathan wrote: >> Fix the copyright header of x86 macroAssembler files to match others. > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > implement review comments What about this one? 
https://github.com/openjdk/jdk/blob/0417fcf13f7f2159499d325f2b3ace49d2643557/src/hotspot/cpu/aarch64/macroAssembler_aarch64_log.cpp#L2 Other files look good and consistent with the Intel copyright in `src/jdk.incubator.vector/linux/native/libsvml/*`. ------------- Changes requested by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5424 From thartmann at openjdk.java.net Thu Sep 9 06:20:17 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 9 Sep 2021 06:20:17 GMT Subject: RFR: 8273498: compiler/c2/Test7179138_1.java timed out Message-ID: <7aVtxqcslVMM50Jg-7jfYJD9sJIamcjY_orMrpTnZh0=.51629a47-dbc6-4d61-a700-17b69feda425@github.com> The test times out with `-Xcomp` because it sets `-XX:RepeatCompilation=100` to give `-XX:+StressIGVN` a chance to reproduce the issue. I propose to simply remove the `RepeatCompilation` flag, given that tests are usually executed many times. That's also consistent with how we do it with other tests that use stress flags (for example, `TestDeadLoopSplitIfLoop.java`). Thanks, Tobias ------------- Commit messages: - 8273498: compiler/c2/Test7179138_1.java timed out Changes: https://git.openjdk.java.net/jdk/pull/5431/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5431&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273498 Stats: 3 lines in 2 files changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5431.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5431/head:pull/5431 PR: https://git.openjdk.java.net/jdk/pull/5431 From thartmann at openjdk.java.net Thu Sep 9 06:46:10 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 9 Sep 2021 06:46:10 GMT Subject: RFR: 8273021: C2: Improve Add and Xor ideal optimizations [v3] In-Reply-To: References: <6ETrh2hQ4xodDeDjGT3NCjENHbOj0fp2EJsfCDAVINE=.94f15263-2919-4762-a8a9-1d6339f7be96@github.com> Message-ID: On Wed, 8 Sep 2021 05:56:42 GMT, Yi Yang wrote: >> Greetings. 
This patch adds the following identical equations for Add and Xor node, respectively, which probably drives further optimizations. >> >> >> ~(x-1) => -x >> ~x + 1 => -x >> >> >> >> Verified by generated opto assembly, maybe an IR verification test can be added later. > > Yi Yang has updated the pull request incrementally with two additional commits since the last revision: > > - dontinline > - more random Changes requested by thartmann (Reviewer). test/hotspot/jtreg/compiler/c2/TestAddXorIdeal.java line 28: > 26: * @test > 27: * @bug 8273021 > 28: * @summary C2: Improve Add and Xor ideal optimizations The test needs ` * @key randomness` test/hotspot/jtreg/compiler/c2/TestAddXorIdeal.java line 30: > 28: * @summary C2: Improve Add and Xor ideal optimizations > 29: * @library /test/lib > 30: * @run main/othervm -XX:-TieredCompilation -XX:TieredStopAtLevel=4 `TieredStopAtLevel` has no effect if Tiered Compilation is turned off. You can remove it. test/hotspot/jtreg/compiler/c2/TestAddXorIdeal.java line 75: > 73: > 74: public static void main(String... 
args) { > 75: Random random = new Random(); You should use `Utils.getRandomInstance()` from `import jdk.test.lib.Utils` to ensure that the seed is printed for reproducibility. You can check other tests for an example. test/hotspot/jtreg/compiler/c2/TestAddXorIdeal.java line 77: > 75: Random random = new Random(); > 76: int n = 0; > 77: long n1 = 0; Should be declared in the loop. test/hotspot/jtreg/compiler/c2/TestAddXorIdeal.java line 80: > 78: for (int i = -5_000; i < 5_000; i++) { > 79: n = random.nextInt(); > 80: Asserts.assertTrue(test1(i + n) == -(i + n)); Now that you are using random numbers, can't you simply check `Asserts.assertTrue(test1(n) == -n)`? And just loop for a fixed number of iterations. ------------- PR: https://git.openjdk.java.net/jdk/pull/5266 From thartmann at openjdk.java.net Thu Sep 9 07:34:04 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 9 Sep 2021 07:34:04 GMT Subject: RFR: 8273454: C2: Transform (-a)*(-b) into a*b [v2] In-Reply-To: References: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> Message-ID: On Thu, 9 Sep 2021 02:45:09 GMT, Eric Liu wrote: >> I wonder that too, so is the rest of MulINode/MulLNode::Ideal() code (and many other places). I am not sure how to workaround the different types, any suggestions? > > Just a dogfood, but it works. https://gist.github.com/theRealELiu/328d62157975b1f20e3626b3ef747eb4 > > Too much abstraction makes the code hard to read. One needs to check the concrete class to identify what the code exactly is, E.g. In my patch, add_id() may be TypeInt::ZERO or TypeLong::Zero, even TypeD::ZERO. So I'm not sure if it's a good idea. Is there any guidelines to this issue, try to abstract them or make the readability in the first place? @TobiHartmann Yes, I would also prefer to move the optimization into `MulNode::Ideal`. 
@theRealELiu's patch is good but can be further improved by modifying the node inputs instead of returning a new node (similar to the other optimizations in `MulNode::Ideal`). ------------- PR: https://git.openjdk.java.net/jdk/pull/5403 From thartmann at openjdk.java.net Thu Sep 9 07:34:04 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 9 Sep 2021 07:34:04 GMT Subject: RFR: 8273454: C2: Transform (-a)*(-b) into a*b [v2] In-Reply-To: References: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> Message-ID: On Thu, 9 Sep 2021 07:17:28 GMT, Tobias Hartmann wrote: >> Just a dogfood, but it works. https://gist.github.com/theRealELiu/328d62157975b1f20e3626b3ef747eb4 >> >> Too much abstraction makes the code hard to read. One needs to check the concrete class to identify what the code exactly is, E.g. In my patch, add_id() may be TypeInt::ZERO or TypeLong::Zero, even TypeD::ZERO. So I'm not sure if it's a good idea. Is there any guidelines to this issue, try to abstract them or make the readability in the first place? @TobiHartmann > > Yes, I would also prefer to move the optimization into `MulNode::Ideal`. @theRealELiu's patch is good but can be further improved by modifying the node inputs instead of returning a new node (similar to the other optimizations in `MulNode::Ideal`). Also, `Type::is_zero_type` can be used to detect 0 and instead of checking the opcodes, `Node::is_Sub` should be used. ------------- PR: https://git.openjdk.java.net/jdk/pull/5403 From thartmann at openjdk.java.net Thu Sep 9 07:34:04 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 9 Sep 2021 07:34:04 GMT Subject: RFR: 8273454: C2: Transform (-a)*(-b) into a*b [v2] In-Reply-To: References: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> Message-ID: On Wed, 8 Sep 2021 02:09:38 GMT, Zhengyu Gu wrote: >> The transformation reduce instructions in generated code. 
>> >> ### x86_64: >> >> Before: >> ``` >> 0x00007fb92c78b3ac: neg %esi >> 0x00007fb92c78b3ae: neg %edx >> 0x00007fb92c78b3b0: mov %esi,%eax >> 0x00007fb92c78b3b2: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) >> >> After: >> >> ; - TestSub::runSub at -1 (line 9) >> 0x00007fc8c05b74ac: mov %esi,%eax >> 0x00007fc8c05b74ae: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) >> >> >> >> ### AArch64: >> Before: >> >> 0x0000ffff814b4a70: neg w11, w1 >> 0x0000ffff814b4a74: mneg w0, w2, w11 ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) >> >> >> After: >> >> 0x0000ffff794a67f0: mul w0, w1, w2 ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Fix test Changes requested by thartmann (Reviewer). test/hotspot/jtreg/compiler/integerArithmetic/TestNegMultiply.java line 26: > 24: /** > 25: * @test > 26: * @bug 8270366 The bug number is incorrect. test/hotspot/jtreg/compiler/integerArithmetic/TestNegMultiply.java line 52: > 50: int result = intTest(intParams[index][0], intParams[index][1]); > 51: for (int i = 0; i < 20_000; i++) { > 52: if (result != intTest(intParams[index][0], intParams[index][1])) { After some warmup iterations, `intTest` will be C2 compiled and you are then comparing outputs of the same compiled method. I.e., if there's a bug in the C2 optimization, the test might not catch it. What you should do instead, is to compare the output of the C2 compiled method to the expected value (which is `a * b` in this case). You should also prevent inlining of `intTest`. The test you added with JDK-8270366 has the same problem. 
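
A minimal sketch of the test shape suggested above, for illustration only. It compares the result of the method under test against an independently computed `a * b` instead of against an earlier result of the same method. The class name `NegMultiplySketch` and the helper `countMismatches` are hypothetical and not taken from the PR. Plain `java.util.Random` with a printed seed stands in for `Utils.getRandomInstance()` so that the sketch runs without the jtreg test library, and inlining control (for example via a `-XX:CompileCommand=dontinline,...` option on the `@run` line) is assumed rather than shown.

```java
import java.util.Random;

class NegMultiplySketch {
    // Method under test: C2 is expected to simplify (-a) * (-b) to a * b.
    // Assumption: in a real jtreg test this method is kept from being inlined.
    static int intTest(int a, int b) {
        return (-a) * (-b);
    }

    static long longTest(long a, long b) {
        return (-a) * (-b);
    }

    // Runs both methods on random inputs and counts results that differ
    // from the expected value a * b (two's complement wrap-around included).
    static int countMismatches(long seed, int iterations) {
        Random random = new Random(seed);
        int mismatches = 0;
        for (int i = 0; i < iterations; i++) {
            int a = random.nextInt();
            int b = random.nextInt();
            if (intTest(a, b) != a * b) {
                mismatches++;
            }
            long la = random.nextLong();
            long lb = random.nextLong();
            if (longTest(la, lb) != la * lb) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Print the seed so that a failing run can be reproduced.
        long seed = new Random().nextLong();
        System.out.println("seed = " + seed);
        int mismatches = countMismatches(seed, 20_000);
        if (mismatches != 0) {
            throw new AssertionError(mismatches + " mismatches, seed = " + seed);
        }
    }
}
```

The loop count of 20,000 mirrors the warmup loop in the quoted test; whether that is enough to trigger C2 compilation depends on the flags the test is run with.
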
test/hotspot/jtreg/compiler/integerArithmetic/TestNegMultiply.java line 59: > 57: } > 58: > 59: private static final long[][] longParams = { Similar to https://git.openjdk.java.net/jdk/pull/5266, I would prefer random values for better coverage. ------------- PR: https://git.openjdk.java.net/jdk/pull/5403 From thartmann at openjdk.java.net Thu Sep 9 07:36:59 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 9 Sep 2021 07:36:59 GMT Subject: RFR: 8267265: Use new IR Test Framework to create tests for C2 IGV transformations [v4] In-Reply-To: References: Message-ID: On Wed, 1 Sep 2021 00:19:18 GMT, John Tortugo wrote: >> Great, thanks! Btw, you can merge and now use `RunInfo.getRandom().XX()` for a handy access to random values (if needed) as the PR for JDK-8272567 was integrated in the meantime. > > Hi, again @chhagedorn. I added some `custom run tests` to tests that seemed more "complex". Please let me know if there are others that you think I should add. @JohnTortugo just FYI, @chhagedorn is currently on vacation but will be back mid next week. ------------- PR: https://git.openjdk.java.net/jdk/pull/5135 From roland at openjdk.java.net Thu Sep 9 08:05:07 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 9 Sep 2021 08:05:07 GMT Subject: RFR: 8273498: compiler/c2/Test7179138_1.java timed out In-Reply-To: <7aVtxqcslVMM50Jg-7jfYJD9sJIamcjY_orMrpTnZh0=.51629a47-dbc6-4d61-a700-17b69feda425@github.com> References: <7aVtxqcslVMM50Jg-7jfYJD9sJIamcjY_orMrpTnZh0=.51629a47-dbc6-4d61-a700-17b69feda425@github.com> Message-ID: On Thu, 9 Sep 2021 06:11:34 GMT, Tobias Hartmann wrote: > The test times out with `-Xcomp` because it sets `-XX:RepeatCompilation=100` to give `-XX:+StressIGVN` a chance to reproduce the issue. I propose to simply remove the `RepeatCompilation` flag, given that tests are usually executed many times. 
That's also consistent with how we do it with other tests that use stress flags (for example, `TestDeadLoopSplitIfLoop.java`). > > Thanks, > Tobias Looks good to me. Thanks for taking care of that failure. ------------- Marked as reviewed by roland (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5431 From thartmann at openjdk.java.net Thu Sep 9 08:13:05 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 9 Sep 2021 08:13:05 GMT Subject: RFR: 8273498: compiler/c2/Test7179138_1.java timed out In-Reply-To: <7aVtxqcslVMM50Jg-7jfYJD9sJIamcjY_orMrpTnZh0=.51629a47-dbc6-4d61-a700-17b69feda425@github.com> References: <7aVtxqcslVMM50Jg-7jfYJD9sJIamcjY_orMrpTnZh0=.51629a47-dbc6-4d61-a700-17b69feda425@github.com> Message-ID: On Thu, 9 Sep 2021 06:11:34 GMT, Tobias Hartmann wrote: > The test times out with `-Xcomp` because it sets `-XX:RepeatCompilation=100` to give `-XX:+StressIGVN` a chance to reproduce the issue. I propose to simply remove the `RepeatCompilation` flag, given that tests are usually executed many times. That's also consistent with how we do it with other tests that use stress flags (for example, `TestDeadLoopSplitIfLoop.java`). > > Thanks, > Tobias Thanks, Roland. ------------- PR: https://git.openjdk.java.net/jdk/pull/5431 From mdoerr at openjdk.java.net Thu Sep 9 09:37:23 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 9 Sep 2021 09:37:23 GMT Subject: RFR: 8273539: [PPC64] gtest build error after JDK-8264207 Message-ID: `macroAssembler.inline.hpp` needs to be included to fix the build. Definition of e.g. `nop()` is not visible otherwise. See JBS bug for build error message. 
------------- Commit messages: - 8273539: [PPC64] gtest build error after JDK-8264207 Changes: https://git.openjdk.java.net/jdk/pull/5438/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5438&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273539 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5438.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5438/head:pull/5438 PR: https://git.openjdk.java.net/jdk/pull/5438 From shade at openjdk.java.net Thu Sep 9 09:49:02 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 9 Sep 2021 09:49:02 GMT Subject: RFR: 8273539: [PPC64] gtest build error after JDK-8264207 In-Reply-To: References: Message-ID: <6jnrTY9NNZ6a0V1qoEdl0yvecaaP4vrTnHXuaoIdaoI=.eeea3eca-2a96-46d9-a1f6-aa563119bd98@github.com> On Thu, 9 Sep 2021 09:28:01 GMT, Martin Doerr wrote: > `macroAssembler.inline.hpp` needs to be included to fix the build. Definition of e.g. `nop()` is not visible otherwise. See JBS bug for build error message. Looks good and trivial. (My CIs are also failing for two days now, but I only noticed now...) ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5438 From stuefe at openjdk.java.net Thu Sep 9 09:49:02 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 9 Sep 2021 09:49:02 GMT Subject: RFR: 8273539: [PPC64] gtest build error after JDK-8264207 In-Reply-To: References: Message-ID: On Thu, 9 Sep 2021 09:28:01 GMT, Martin Doerr wrote: > `macroAssembler.inline.hpp` needs to be included to fix the build. Definition of e.g. `nop()` is not visible otherwise. See JBS bug for build error message. Good and trivial. ------------- Marked as reviewed by stuefe (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5438 From mdoerr at openjdk.java.net Thu Sep 9 10:51:07 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 9 Sep 2021 10:51:07 GMT Subject: RFR: 8273539: [PPC64] gtest build error after JDK-8264207 In-Reply-To: References: Message-ID: On Thu, 9 Sep 2021 09:28:01 GMT, Martin Doerr wrote: > `macroAssembler.inline.hpp` needs to be included to fix the build. Definition of e.g. `nop()` is not visible otherwise. See JBS bug for build error message. Thanks for the reviews! I'm just integrating it to get the build working again. ------------- PR: https://git.openjdk.java.net/jdk/pull/5438 From mdoerr at openjdk.java.net Thu Sep 9 10:51:07 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 9 Sep 2021 10:51:07 GMT Subject: Integrated: 8273539: [PPC64] gtest build error after JDK-8264207 In-Reply-To: References: Message-ID: On Thu, 9 Sep 2021 09:28:01 GMT, Martin Doerr wrote: > `macroAssembler.inline.hpp` needs to be included to fix the build. Definition of e.g. `nop()` is not visible otherwise. See JBS bug for build error message. This pull request has now been integrated. Changeset: f6cc1732 Author: Martin Doerr URL: https://git.openjdk.java.net/jdk/commit/f6cc1732f47672cea413fa842c4f106c1314c626 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8273539: [PPC64] gtest build error after JDK-8264207 Reviewed-by: shade, stuefe ------------- PR: https://git.openjdk.java.net/jdk/pull/5438 From zgu at openjdk.java.net Thu Sep 9 12:19:09 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 9 Sep 2021 12:19:09 GMT Subject: RFR: 8273454: C2: Transform (-a)*(-b) into a*b [v2] In-Reply-To: References: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> Message-ID: On Thu, 9 Sep 2021 07:19:13 GMT, Tobias Hartmann wrote: >> Yes, I would also prefer to move the optimization into `MulNode::Ideal`. 
@theRealELiu's patch is good but can be further improved by modifying the node inputs instead of returning a new node (similar to the other optimizations in `MulNode::Ideal`). > > Also, `Type::is_zero_type` can be used to detect 0 and instead of checking the opcodes, `Node::is_Sub` should be used. Nice! Thanks, I will make changes accordingly. ------------- PR: https://git.openjdk.java.net/jdk/pull/5403 From zgu at openjdk.java.net Thu Sep 9 13:25:38 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 9 Sep 2021 13:25:38 GMT Subject: RFR: 8273454: C2: Transform (-a)*(-b) into a*b [v3] In-Reply-To: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> References: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> Message-ID: > The transformation reduces the number of instructions in the generated code. > > ### x86_64: > > Before: > > 0x00007fb92c78b3ac: neg %esi > 0x00007fb92c78b3ae: neg %edx > 0x00007fb92c78b3b0: mov %esi,%eax > 0x00007fb92c78b3b2: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0} > ; - TestSub::runSub at 4 (line 9) > > After: > > ; - TestSub::runSub at -1 (line 9) > 0x00007fc8c05b74ac: mov %esi,%eax > 0x00007fc8c05b74ae: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0} > ; - TestSub::runSub at 4 (line 9) > > > > ### AArch64: > Before: > > 0x0000ffff814b4a70: neg w11, w1 > 0x0000ffff814b4a74: mneg w0, w2, w11 ;*imul {reexecute=0 rethrow=0 return_oop=0} > ; - TestSub::runSub at 4 (line 9) > > > After: > > 0x0000ffff794a67f0: mul w0, w1, w2 ;*imul {reexecute=0 rethrow=0 return_oop=0} > ; - TestSub::runSub at 4 (line 9) Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: - Spacing - Merge branch 'master' into JDK-8273454-neg-mul - @theRealELiu and @TobiHartmann's comments - Fix test - Merge branch 'master' into JDK-8273454-neg-mul - v1 - v0 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5403/files - new: https://git.openjdk.java.net/jdk/pull/5403/files/c55a3273..8f7f241f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5403&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5403&range=01-02 Stats: 2110 lines in 180 files changed: 1171 ins; 296 del; 643 mod Patch: https://git.openjdk.java.net/jdk/pull/5403.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5403/head:pull/5403 PR: https://git.openjdk.java.net/jdk/pull/5403 From thartmann at openjdk.java.net Thu Sep 9 13:47:12 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 9 Sep 2021 13:47:12 GMT Subject: Integrated: 8273409: Receiver type narrowed by CCP does not always trigger post-parse call devirtualization In-Reply-To: <01WVxWw0yNGT1ffo-xv4naivyaqIL-oZKSQJB9lwK-E=.2102ab41-53ac-4d6d-9828-7a20f6d09f13@github.com> References: <01WVxWw0yNGT1ffo-xv4naivyaqIL-oZKSQJB9lwK-E=.2102ab41-53ac-4d6d-9828-7a20f6d09f13@github.com> Message-ID: On Wed, 8 Sep 2021 14:41:54 GMT, Tobias Hartmann wrote: > While working on [JDK-8273323](https://bugs.openjdk.java.net/browse/JDK-8273323), I noticed that even if CCP is able to narrow the receiver type of a virtual call to an exact type, post-parse call devirtualization is not always triggered. The problem is that CCP only adds nodes to the IGVN worklist if their types have been improved: > https://github.com/openjdk/jdk/blob/4023646ed1bcb821b1d18f7e5104f04995e8171d/src/hotspot/share/opto/phaseX.cpp#L1961-L1965 > > And in turn, IGVN only adds user nodes to the worklist if the type of the parent node could be further improved. 
As a result, a `CallNode` is not always added to the worklist if the type of its receiver node has been improved. Often, we are lucky and macro expansion or another optimization phase after CCP adds the `CallNode` to the worklist. However, as `testDynamicCallWithCCP` shows, that's not always the case. > > The fix is to explicitly add `CallStaticJavaNode` and `CallDynamicJavaNode` to the worklist after CCP to give call devirtualization a chance to run: > https://github.com/openjdk/jdk/blob/4023646ed1bcb821b1d18f7e5104f04995e8171d/src/hotspot/share/opto/callnode.cpp#L1054-L1057 > https://github.com/openjdk/jdk/blob/4023646ed1bcb821b1d18f7e5104f04995e8171d/src/hotspot/share/opto/callnode.cpp#L1137-L1140 > > I also considered moving the optimization from `Ideal` to `Value` (which in contrast to `Ideal` is already executed **during** CCP) but we can't trust receiver types during CCP (because CCP goes from optimistic types to pessimistic ones). Also, the current implementation relies on setting the `_generator` field, which is not possible from `Value` because it is declared `const`: > https://github.com/openjdk/jdk/blob/4023646ed1bcb821b1d18f7e5104f04995e8171d/src/hotspot/share/opto/callnode.cpp#L1169-L1172 > > All the newly added tests fail IR verification if post-parse call devirtualization is disabled (`-XX:-IncrementalInlineVirtual -XX:-IncrementalInlineMH`). Without the fix, `testDynamicCallWithCCP` always fails. > > I filed [JDK-8273496](https://bugs.openjdk.java.net/browse/JDK-8273496) to improve how CCP handles this. > > Thanks, > Tobias This pull request has now been integrated. 
Changeset: 4866eaa9 Author: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/4866eaa997b2ee2a47bdcd0d96202f220fb2774d Stats: 205 lines in 3 files changed: 204 ins; 0 del; 1 mod 8273409: Receiver type narrowed by CCP does not always trigger post-parse call devirtualization Reviewed-by: vlivanov, neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/5418 From thartmann at openjdk.java.net Thu Sep 9 13:48:09 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 9 Sep 2021 13:48:09 GMT Subject: RFR: 8273454: C2: Transform (-a)*(-b) into a*b [v3] In-Reply-To: References: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> Message-ID: On Thu, 9 Sep 2021 13:25:38 GMT, Zhengyu Gu wrote: >> The transformation reduce instructions in generated code. >> >> ### x86_64: >> >> Before: >> ``` >> 0x00007fb92c78b3ac: neg %esi >> 0x00007fb92c78b3ae: neg %edx >> 0x00007fb92c78b3b0: mov %esi,%eax >> 0x00007fb92c78b3b2: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) >> >> After: >> >> ; - TestSub::runSub at -1 (line 9) >> 0x00007fc8c05b74ac: mov %esi,%eax >> 0x00007fc8c05b74ae: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) >> >> >> >> ### AArch64: >> Before: >> >> 0x0000ffff814b4a70: neg w11, w1 >> 0x0000ffff814b4a74: mneg w0, w2, w11 ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) >> >> >> After: >> >> 0x0000ffff794a67f0: mul w0, w1, w2 ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) > > Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: > > - Spacing > - Merge branch 'master' into JDK-8273454-neg-mul > - @theRealELiu and @TobiHartmann's comments > - Fix test > - Merge branch 'master' into JDK-8273454-neg-mul > - v1 > - v0 Changes requested by thartmann (Reviewer). src/hotspot/share/opto/mulnode.cpp line 74: > 72: if (phase->type(n11)->is_zero_type() && > 73: phase->type(n21)->is_zero_type()) { > 74: return make(in1->in(2), in2->in(2)); Why do you need to create a new node? Can't you simply update the inputs like the code below does? ------------- PR: https://git.openjdk.java.net/jdk/pull/5403 From thartmann at openjdk.java.net Thu Sep 9 13:58:07 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 9 Sep 2021 13:58:07 GMT Subject: Integrated: 8273498: compiler/c2/Test7179138_1.java timed out In-Reply-To: <7aVtxqcslVMM50Jg-7jfYJD9sJIamcjY_orMrpTnZh0=.51629a47-dbc6-4d61-a700-17b69feda425@github.com> References: <7aVtxqcslVMM50Jg-7jfYJD9sJIamcjY_orMrpTnZh0=.51629a47-dbc6-4d61-a700-17b69feda425@github.com> Message-ID: <7U1dFvTQ_v5PQHiMxp2Cic3e2kx8q3FWs_qP3wcizhU=.9b3da438-d40b-41c9-a7c9-9e5bd1127ba3@github.com> On Thu, 9 Sep 2021 06:11:34 GMT, Tobias Hartmann wrote: > The test times out with `-Xcomp` because it sets `-XX:RepeatCompilation=100` to give `-XX:+StressIGVN` a chance to reproduce the issue. I propose to simply remove the `RepeatCompilation`flag, given that tests are usually executed many times. That's also consistent with how we do it with other tests that use stress flags (for example, `TestDeadLoopSplitIfLoop.java`). > > Thanks, > Tobias This pull request has now been integrated. 
Changeset: c81690d7 Author: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/c81690d7166c32caff6ef3a55fe9b157049e2b56 Stats: 3 lines in 2 files changed: 0 ins; 2 del; 1 mod 8273498: compiler/c2/Test7179138_1.java timed out Reviewed-by: roland ------------- PR: https://git.openjdk.java.net/jdk/pull/5431 From zgu at openjdk.java.net Thu Sep 9 15:02:32 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 9 Sep 2021 15:02:32 GMT Subject: RFR: 8273454: C2: Transform (-a)*(-b) into a*b [v4] In-Reply-To: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> References: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> Message-ID: > The transformation reduce instructions in generated code. > > ### x86_64: > > Before: > ``` > 0x00007fb92c78b3ac: neg %esi > 0x00007fb92c78b3ae: neg %edx > 0x00007fb92c78b3b0: mov %esi,%eax > 0x00007fb92c78b3b2: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0} > ; - TestSub::runSub at 4 (line 9) > > After: > > ; - TestSub::runSub at -1 (line 9) > 0x00007fc8c05b74ac: mov %esi,%eax > 0x00007fc8c05b74ae: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0} > ; - TestSub::runSub at 4 (line 9) > > > > ### AArch64: > Before: > > 0x0000ffff814b4a70: neg w11, w1 > 0x0000ffff814b4a74: mneg w0, w2, w11 ;*imul {reexecute=0 rethrow=0 return_oop=0} > ; - TestSub::runSub at 4 (line 9) > > > After: > > 0x0000ffff794a67f0: mul w0, w1, w2 ;*imul {reexecute=0 rethrow=0 return_oop=0} > ; - TestSub::runSub at 4 (line 9) Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: Fix node in place instead of creating new node ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5403/files - new: https://git.openjdk.java.net/jdk/pull/5403/files/8f7f241f..71aa6ac4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5403&range=03 - incr: 
https://webrevs.openjdk.java.net/?repo=jdk&pr=5403&range=02-03 Stats: 14 lines in 2 files changed: 7 ins; 6 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5403.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5403/head:pull/5403 PR: https://git.openjdk.java.net/jdk/pull/5403 From zgu at openjdk.java.net Thu Sep 9 15:02:37 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 9 Sep 2021 15:02:37 GMT Subject: RFR: 8273454: C2: Transform (-a)*(-b) into a*b [v3] In-Reply-To: References: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> Message-ID: On Thu, 9 Sep 2021 13:45:24 GMT, Tobias Hartmann wrote: >> Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: >> >> - Spacing >> - Merge branch 'master' into JDK-8273454-neg-mul >> - @theRealELiu and @TobiHartmann's comments >> - Fix test >> - Merge branch 'master' into JDK-8273454-neg-mul >> - v1 >> - v0 > > src/hotspot/share/opto/mulnode.cpp line 74: > >> 72: if (phase->type(n11)->is_zero_type() && >> 73: phase->type(n21)->is_zero_type()) { >> 74: return make(in1->in(2), in2->in(2)); > > Why do you need to create a new node? Can't you simply update the inputs like the code below does? Sorry, I missed your early comment. Fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/5403 From github.com+71546117+tobiasholenstein at openjdk.java.net Thu Sep 9 15:23:12 2021 From: github.com+71546117+tobiasholenstein at openjdk.java.net (Tobias Holenstein) Date: Thu, 9 Sep 2021 15:23:12 GMT Subject: RFR: JDK-8272698: LoadNode::pin is unused Message-ID: Removed dead code and tested on Tier 1. 
Thanks, Tobias ------------- Commit messages: - JDK-8272698: LoadNode::pin is unused Changes: https://git.openjdk.java.net/jdk/pull/5408/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5408&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272698 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5408.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5408/head:pull/5408 PR: https://git.openjdk.java.net/jdk/pull/5408 From roland at openjdk.java.net Thu Sep 9 15:29:08 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 9 Sep 2021 15:29:08 GMT Subject: RFR: JDK-8272698: LoadNode::pin is unused In-Reply-To: References: Message-ID: On Wed, 8 Sep 2021 08:46:02 GMT, Tobias Holenstein wrote: > Removed dead code and tested on Tier 1. > > Thanks, > Tobias Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5408 From github.com+71546117+tobiasholenstein at openjdk.java.net Thu Sep 9 15:23:16 2021 From: github.com+71546117+tobiasholenstein at openjdk.java.net (Tobias Holenstein) Date: Thu, 9 Sep 2021 15:23:16 GMT Subject: RFR: JDK-8264517: C2: make MachCallNode::return_value_is_used() only available for x86 Message-ID: Added #ifndef _LP64 guard to the MachCallNode::return_value_is_used() method in src/hotspot/share/opto/machnode.* because it is only used for 32-bit x86. 
Thanks, Tobias ------------- Commit messages: - JDK-8272771: removed dead returns_long() - JDK-8264517: C2: make MachCallNode::return_value_is_used() only available for x86 Changes: https://git.openjdk.java.net/jdk/pull/5435/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5435&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8264517 Stats: 4 lines in 2 files changed: 2 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5435.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5435/head:pull/5435 PR: https://git.openjdk.java.net/jdk/pull/5435 From sviswanathan at openjdk.java.net Thu Sep 9 17:38:23 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Thu, 9 Sep 2021 17:38:23 GMT Subject: RFR: 8273512: Fix the copyright header of x86 macroAssembler files [v3] In-Reply-To: <89boYXfzPR50KNk_ODjVPoaNZpRqnhib7ubqDgw8ftY=.8d25a4fe-59b8-4c7e-9e35-7ad1b9049df4@github.com> References: <89boYXfzPR50KNk_ODjVPoaNZpRqnhib7ubqDgw8ftY=.8d25a4fe-59b8-4c7e-9e35-7ad1b9049df4@github.com> Message-ID: > Fix the copyright header of x86 macroAssembler files to match others. 
Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Implement review comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5424/files - new: https://git.openjdk.java.net/jdk/pull/5424/files/4e7f94d7..9e1664e5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5424&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5424&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5424.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5424/head:pull/5424 PR: https://git.openjdk.java.net/jdk/pull/5424 From sviswanathan at openjdk.java.net Thu Sep 9 17:38:24 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Thu, 9 Sep 2021 17:38:24 GMT Subject: RFR: 8273512: Fix the copyright header of x86 macroAssembler files [v2] In-Reply-To: References: <89boYXfzPR50KNk_ODjVPoaNZpRqnhib7ubqDgw8ftY=.8d25a4fe-59b8-4c7e-9e35-7ad1b9049df4@github.com> Message-ID: On Thu, 9 Sep 2021 06:17:22 GMT, Tobias Hartmann wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> implement review comments > > What about this one? > https://github.com/openjdk/jdk/blob/0417fcf13f7f2159499d325f2b3ace49d2643557/src/hotspot/cpu/aarch64/macroAssembler_aarch64_log.cpp#L2 > > Other files look good and consistent with the Intel copyright in `src/jdk.incubator.vector/linux/native/libsvml/*`. Thanks a lot @TobiHartmann. I have corrected that line as well. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5424 From github.com+2249648+johntortugo at openjdk.java.net Thu Sep 9 20:53:06 2021 From: github.com+2249648+johntortugo at openjdk.java.net (John Tortugo) Date: Thu, 9 Sep 2021 20:53:06 GMT Subject: RFR: 8267265: Use new IR Test Framework to create tests for C2 IGV transformations [v4] In-Reply-To: References: Message-ID: On Thu, 9 Sep 2021 07:33:47 GMT, Tobias Hartmann wrote: >> Hi, again @chhagedorn. I added some `custom run tests` to tests that seemed more "complex". Please let me know if there are others that you think I should add. > > @JohnTortugo just FYI, @chhagedorn is currently on vacation but will be back mid next week. Thank you @TobiHartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/5135 From kvn at openjdk.java.net Thu Sep 9 21:44:03 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 9 Sep 2021 21:44:03 GMT Subject: RFR: JDK-8264517: C2: make MachCallNode::return_value_is_used() only available for x86 In-Reply-To: References: Message-ID: On Thu, 9 Sep 2021 07:27:09 GMT, Tobias Holenstein wrote: > Added #ifndef _LP64 guard to the MachCallNode::return_value_is_used() method in src/hotspot/share/opto/machnode.* because it is only used for 32-bit x86. > > Thanks, > Tobias Good ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5435 From eliu at openjdk.java.net Fri Sep 10 01:13:05 2021 From: eliu at openjdk.java.net (Eric Liu) Date: Fri, 10 Sep 2021 01:13:05 GMT Subject: RFR: 8272968: AArch64: Remove redundant matching rules for commutative ops In-Reply-To: References: Message-ID: On Tue, 31 Aug 2021 06:10:10 GMT, Faye Gao wrote: > Match rules for commutative operations mnegI/mnegL/smnegL might > become redundant after function matchrule_clone_and_swap(), > and hence can be reduced. 
>
> In the adlc part, while parsing the contents of an instruction
> definition, function instr_parse always checks
> for commutative operations with subtree operands, creates
> clones and swaps operands via function matchrule_clone_and_swap.
> It means that another operand-swapped and partially
> symmetrical match rule is generated automatically for
> these commutative operations.
>
> The pattern to construct mnegI, mnegL or smnegL consists of
> a subtraction from zero and then a multiplication. In function
> count_commutative_op, both mulI and mulL are recognized as
> commutative opcodes. Therefore, we need only one match rule
> to specify that a multiplication consists of a number and
> a subtraction from zero for these three instructions, and the
> extra one can be deleted.
>
> Take mnegL as an example.
>
> Without my patch, four match rules are ultimately created for
> instruction selection.
>
> Two of them are created by ad files:
>
> Match Rule 1:
> dst = MulL (SubL zero src1) src2
> ===>
> dst = mnegl src1 src2
>
> Match Rule 2:
> dst = MulL src1 (SubL zero src2)
> ===>
> dst = mnegl src1 src2
>
> The other two are automatically generated by function
> matchrule_clone_and_swap based on the two rules above:
>
> Match Rule 3 (generated by match rule 1):
> dst = MulL src2 (SubL zero src1)
> ===>
> dst = mnegl src1 src2
>
> Match Rule 4 (generated by match rule 2):
> dst = MulL (SubL zero src2) src1
> ===>
> dst = mnegl src1 src2
>
> As mnegl is commutative, Rule 3 is equivalent to
> Rule 2, and Rule 1 is equivalent to Rule 4. Also, if we only
> keep the original Match Rule 1, as shown above, Rule 3 will
> be generated automatically later. In this way, Rule 2 and Rule 4
> are redundant and hence Rule 2 can be eliminated.
>
> With my patch, Rule 2 is removed and Rule 4 is no longer generated either.
> Only Rule 1 and 3 are kept in the final rule chain.
In my local release > build, as redundant code got removed, the size of libjvm.so decreased > from 23.30 MB to 23.27 MB, with a reduction of 33.11 KB(around 0.14%). LGTM ------------- PR: https://git.openjdk.java.net/jdk/pull/5311 From yyang at openjdk.java.net Fri Sep 10 02:27:38 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Fri, 10 Sep 2021 02:27:38 GMT Subject: RFR: 8273021: C2: Improve Add and Xor ideal optimizations [v4] In-Reply-To: <6ETrh2hQ4xodDeDjGT3NCjENHbOj0fp2EJsfCDAVINE=.94f15263-2919-4762-a8a9-1d6339f7be96@github.com> References: <6ETrh2hQ4xodDeDjGT3NCjENHbOj0fp2EJsfCDAVINE=.94f15263-2919-4762-a8a9-1d6339f7be96@github.com> Message-ID: > Greetings. This patch adds the following identical equations for Add and Xor node, respectively, which probably drives further optimizations. > > > ~(x-1) => -x > ~x + 1 => -x > > > > Verified by generated opto assembly, maybe an IR verification test can be added later. > > Compiled method (c2) 71 1 compiler.c2.TestAddXorIdeal::test1 (6 bytes) > 0x00007f9e11514800: sub $0x18,%rsp > 0x00007f9e11514807: mov %rbp,0x10(%rsp) ;*synchronization entry > ; - compiler.c2.TestAddXorIdeal::test1 at -1 (line 39) > 0x00007f9e1151480c: mov %esi,%eax > 0x00007f9e1151480e: neg %eax ;*iadd {reexecute=0 rethrow=0 return_oop=0} > ; - compiler.c2.TestAddXorIdeal::test1 at 4 (line 39) > 0x00007f9e11514810: add $0x10,%rsp > 0x00007f9e11514814: pop %rbp > 0x00007f9e11514815: cmp 0x338(%r15),%rsp ; {poll_return} > 0x00007f9e1151481c: ja 0x00007f9e11514823 > 0x00007f9e11514822: retq > > Compiled method (c2) 73 2 compiler.c2.TestAddXorIdeal::test2 (6 bytes) > 0x00007f9e11512480: sub $0x18,%rsp > 0x00007f9e11512487: mov %rbp,0x10(%rsp) ;*synchronization entry > ; - compiler.c2.TestAddXorIdeal::test2 at -1 (line 43) > 0x00007f9e1151248c: mov %esi,%eax > 0x00007f9e1151248e: neg %eax ;*ixor {reexecute=0 rethrow=0 return_oop=0} > ; - compiler.c2.TestAddXorIdeal::test2 at 4 (line 43) > 0x00007f9e11512490: add $0x10,%rsp > 0x00007f9e11512494: 
pop %rbp > 0x00007f9e11512495: cmp 0x338(%r15),%rsp ; {poll_return} > 0x00007f9e1151249c: ja 0x00007f9e115124a3 > 0x00007f9e115124a2: retq > > Compiled method (c2) 72 3 compiler.c2.TestAddXorIdeal::test3 (8 bytes) > 0x00007f9e11514b00: sub $0x18,%rsp > 0x00007f9e11514b07: mov %rbp,0x10(%rsp) ;*synchronization entry > ; - compiler.c2.TestAddXorIdeal::test3 at -1 (line 47) > 0x00007f9e11514b0c: mov %rsi,%rax > 0x00007f9e11514b0f: neg %rax ;*ladd {reexecute=0 rethrow=0 return_oop=0} > ; - compiler.c2.TestAddXorIdeal::test3 at 6 (line 47) > 0x00007f9e11514b12: add $0x10,%rsp > 0x00007f9e11514b16: pop %rbp > 0x00007f9e11514b17: cmp 0x338(%r15),%rsp ; {poll_return} > 0x00007f9e11514b1e: ja 0x00007f9e11514b25 > 0x00007f9e11514b24: retq > > Compiled method (c2) 72 4 compiler.c2.TestAddXorIdeal::test4 (8 bytes) > 0x00007f9e11514500: sub $0x18,%rsp > 0x00007f9e11514507: mov %rbp,0x10(%rsp) ;*synchronization entry > ; - compiler.c2.TestAddXorIdeal::test4 at -1 (line 51) > 0x00007f9e1151450c: mov %rsi,%rax > 0x00007f9e1151450f: neg %rax ;*lxor {reexecute=0 rethrow=0 return_oop=0} > ; - compiler.c2.TestAddXorIdeal::test4 at 6 (line 51) > 0x00007f9e11514512: add $0x10,%rsp > 0x00007f9e11514516: pop %rbp > 0x00007f9e11514517: cmp 0x338(%r15),%rsp ; {poll_return} > 0x00007f9e1151451e: ja 0x00007f9e11514525 > 0x00007f9e11514524: retq > > Compiled method (c2) 72 5 compiler.c2.TestAddXorIdeal::test5 (6 bytes) > 0x00007f9e11518500: sub $0x18,%rsp > 0x00007f9e11518507: mov %rbp,0x10(%rsp) ;*synchronization entry > ; - compiler.c2.TestAddXorIdeal::test5 at -1 (line 55) > 0x00007f9e1151850c: mov %esi,%eax > 0x00007f9e1151850e: neg %eax ;*iadd {reexecute=0 rethrow=0 return_oop=0} > ; - compiler.c2.TestAddXorIdeal::test5 at 4 (line 55) > 0x00007f9e11518510: add $0x10,%rsp > 0x00007f9e11518514: pop %rbp > 0x00007f9e11518515: cmp 0x338(%r15),%rsp ; {poll_return} > 0x00007f9e1151851c: ja 0x00007f9e11518523 > 0x00007f9e11518522: retq > > Compiled method (c2) 74 6 
compiler.c2.TestAddXorIdeal::test6 (6 bytes) > 0x00007f9e11512180: sub $0x18,%rsp > 0x00007f9e11512187: mov %rbp,0x10(%rsp) ;*synchronization entry > ; - compiler.c2.TestAddXorIdeal::test6 at -1 (line 59) > 0x00007f9e1151218c: mov %esi,%eax > 0x00007f9e1151218e: neg %eax ;*ixor {reexecute=0 rethrow=0 return_oop=0} > ; - compiler.c2.TestAddXorIdeal::test6 at 4 (line 59) > 0x00007f9e11512190: add $0x10,%rsp > 0x00007f9e11512194: pop %rbp > 0x00007f9e11512195: cmp 0x338(%r15),%rsp ; {poll_return} > 0x00007f9e1151219c: ja 0x00007f9e115121a3 > 0x00007f9e115121a2: retq > > Compiled method (c2) 74 7 compiler.c2.TestAddXorIdeal::test7 (8 bytes) > 0x00007f9e11511e80: sub $0x18,%rsp > 0x00007f9e11511e87: mov %rbp,0x10(%rsp) ;*synchronization entry > ; - compiler.c2.TestAddXorIdeal::test7 at -1 (line 63) > 0x00007f9e11511e8c: mov %rsi,%rax > 0x00007f9e11511e8f: neg %rax ;*ladd {reexecute=0 rethrow=0 return_oop=0} > ; - compiler.c2.TestAddXorIdeal::test7 at 6 (line 63) > 0x00007f9e11511e92: add $0x10,%rsp > 0x00007f9e11511e96: pop %rbp > 0x00007f9e11511e97: cmp 0x338(%r15),%rsp ; {poll_return} > 0x00007f9e11511e9e: ja 0x00007f9e11511ea5 > 0x00007f9e11511ea4: retq > > > Compiled method (c2) 73 8 compiler.c2.TestAddXorIdeal::test8 (10 bytes) > 0x00007f9e11512780: sub $0x18,%rsp > 0x00007f9e11512787: mov %rbp,0x10(%rsp) ;*synchronization entry > ; - compiler.c2.TestAddXorIdeal::test8 at -1 (line 67) > 0x00007f9e1151278c: mov %rsi,%rax > 0x00007f9e1151278f: neg %rax ;*lxor {reexecute=0 rethrow=0 return_oop=0} > ; - compiler.c2.TestAddXorIdeal::test8 at 8 (line 67) > 0x00007f9e11512792: add $0x10,%rsp > 0x00007f9e11512796: pop %rbp > 0x00007f9e11512797: cmp 0x338(%r15),%rsp ; {poll_return} > 0x00007f9e1151279e: ja 0x00007f9e115127a5 > 0x00007f9e115127a4: retq Yi Yang has updated the pull request incrementally with one additional commit since the last revision: review from Tobias ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5266/files - new: 
https://git.openjdk.java.net/jdk/pull/5266/files/e8a930b7..aee3a8b0

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5266&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5266&range=02-03

  Stats: 16 lines in 1 file changed: 4 ins; 2 del; 10 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5266.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5266/head:pull/5266

PR: https://git.openjdk.java.net/jdk/pull/5266

From yyang at openjdk.java.net Fri Sep 10 02:27:42 2021
From: yyang at openjdk.java.net (Yi Yang)
Date: Fri, 10 Sep 2021 02:27:42 GMT
Subject: RFR: 8273021: C2: Improve Add and Xor ideal optimizations [v3]
In-Reply-To:
References: <6ETrh2hQ4xodDeDjGT3NCjENHbOj0fp2EJsfCDAVINE=.94f15263-2919-4762-a8a9-1d6339f7be96@github.com>
Message-ID: <_FycSeoxadQzhabRtXwx4047Uk6LfIZctQHzYdtuK0c=.6cd67694-69dc-4678-9541-7d68dcd4cbe1@github.com>

On Thu, 9 Sep 2021 06:24:23 GMT, Tobias Hartmann wrote:

>> Yi Yang has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - dontinline
>> - more random
>
> test/hotspot/jtreg/compiler/c2/TestAddXorIdeal.java line 75:
>
>> 73:
>> 74: public static void main(String... args) {
>> 75: Random random = new Random();
>
> You should use `Utils.getRandomInstance()` from `import jdk.test.lib.Utils` to ensure that the seed is printed for reproducibility. You can check other tests for an example.

That does make sense. Changed.

> test/hotspot/jtreg/compiler/c2/TestAddXorIdeal.java line 77:
>
>> 75: Random random = new Random();
>> 76: int n = 0;
>> 77: long n1 = 0;
>
> Should be declared in the loop.

Do you mean declared within the loop body? I've changed it, but it looks like a matter of preference.
------------- PR: https://git.openjdk.java.net/jdk/pull/5266 From eliu at openjdk.java.net Fri Sep 10 02:59:02 2021 From: eliu at openjdk.java.net (Eric Liu) Date: Fri, 10 Sep 2021 02:59:02 GMT Subject: RFR: 8273454: C2: Transform (-a)*(-b) into a*b [v4] In-Reply-To: References: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> Message-ID: <-EYsHDdFMKiY7SVbmhnVJib76SspeVEaziPrs0OESVY=.e9b3a07e-5606-4fc6-8652-5e457476eeff@github.com> On Thu, 9 Sep 2021 15:02:32 GMT, Zhengyu Gu wrote: >> The transformation reduce instructions in generated code. >> >> ### x86_64: >> >> Before: >> ``` >> 0x00007fb92c78b3ac: neg %esi >> 0x00007fb92c78b3ae: neg %edx >> 0x00007fb92c78b3b0: mov %esi,%eax >> 0x00007fb92c78b3b2: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) >> >> After: >> >> ; - TestSub::runSub at -1 (line 9) >> 0x00007fc8c05b74ac: mov %esi,%eax >> 0x00007fc8c05b74ae: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) >> >> >> >> ### AArch64: >> Before: >> >> 0x0000ffff814b4a70: neg w11, w1 >> 0x0000ffff814b4a74: mneg w0, w2, w11 ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) >> >> >> After: >> >> 0x0000ffff794a67f0: mul w0, w1, w2 ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Fix node in place instead of creating new node test/hotspot/jtreg/compiler/integerArithmetic/TestNegMultiply.java line 44: > 42: > 43: private static void testInt(int a, int b) { > 44: int expected = (-a) * (-b); Are you sure about this is the expected value? As the method has been invoked 2000 times, I think it would be compiled by c2. 
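One way to sidestep the concern above, offered only as a hedged sketch and not as the test in the PR: compute the expected value through a path that does not contain the `(-a) * (-b)` shape at all, for example arbitrary-precision arithmetic truncated back to the operand width. The class and method names below (`NegMulReference`, `negMulInt`, `referenceInt`, and so on) are hypothetical and chosen for illustration; the sketch also confirms that the identity holds under wraparound, including for `MIN_VALUE`.

```java
import java.math.BigInteger;

public class NegMulReference {
    // Shape under discussion: the product of two negated operands.
    static int negMulInt(int a, int b) { return (-a) * (-b); }
    static long negMulLong(long a, long b) { return (-a) * (-b); }

    // Reference values: exact arithmetic, then truncation to the low 32/64 bits,
    // which matches Java's wraparound semantics for int and long.
    static int referenceInt(int a, int b) {
        return BigInteger.valueOf(a).negate()
                .multiply(BigInteger.valueOf(b).negate())
                .intValue();
    }

    static long referenceLong(long a, long b) {
        return BigInteger.valueOf(a).negate()
                .multiply(BigInteger.valueOf(b).negate())
                .longValue();
    }

    public static void main(String[] args) {
        int[] ints = {0, 1, -1, 7, -13, Integer.MAX_VALUE, Integer.MIN_VALUE};
        long[] longs = {0L, 1L, -1L, 7L, -13L, Long.MAX_VALUE, Long.MIN_VALUE};

        for (int a : ints) {
            for (int b : ints) {
                int expected = referenceInt(a, b);
                // Both the negated form and the plain product must agree with the reference.
                if (negMulInt(a, b) != expected || a * b != expected) {
                    throw new RuntimeException("int mismatch for " + a + ", " + b);
                }
            }
        }
        for (long a : longs) {
            for (long b : longs) {
                long expected = referenceLong(a, b);
                if (negMulLong(a, b) != expected || a * b != expected) {
                    throw new RuntimeException("long mismatch for " + a + ", " + b);
                }
            }
        }
        System.out.println("all identities hold");
    }
}
```

The design choice is that the reference and the method under test share no arithmetic shape, so a miscompilation of one is unlikely to be mirrored in the other.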
test/hotspot/jtreg/compiler/integerArithmetic/TestNegMultiply.java line 63:

> 61: }
> 62: }
> 63: }

How about calculating the expected value outside the iteration to avoid it being compiled?

```java
private static void testLong() {
    for (int i = 0; i < 20_000; i++) {
        long a = random.nextLong();
        long b = random.nextLong();
        long expected = (-a) * (-b);
        if (expected != test(a, b)) {
            throw new RuntimeException("Incorrect result.");
        }
    }
}
```

-------------

PR: https://git.openjdk.java.net/jdk/pull/5403

From duke at openjdk.java.net Fri Sep 10 04:34:04 2021
From: duke at openjdk.java.net (duke)
Date: Fri, 10 Sep 2021 04:34:04 GMT
Subject: Withdrawn: 8270446: Remove the MemBarNode that has been optimized from the current block
In-Reply-To: <4PeDwwxv63XQd4JggltcKVfxK-t5bK-bx1VWnGNXxTA=.bd808948-8fc7-4973-95fb-f26aa422b979@github.com>
References: <4PeDwwxv63XQd4JggltcKVfxK-t5bK-bx1VWnGNXxTA=.bd808948-8fc7-4973-95fb-f26aa422b979@github.com>
Message-ID:

On Wed, 14 Jul 2021 07:35:25 GMT, SUN Guoyun wrote:

> Hi all,
> aarch64 and mips64 also had an optimization to merge two memory barrier instructions. But when I use the args -XX:+UnlockDiagnosticVMOptions -XX:+PrintOptoAssembly, I see some information like the following:
>
> 1a0 membar_release
> dmb ish
> 1a4 membar_release
> dmb ish
> 1a4 spill R19 -> R0 # spill size = 64
>
>
> Here, "1a4 membar_release" is actually optimized out, so I think it would be better to display it like this:
>
>
> 1a0 membar_release
> dmb ish
> 1a4 spill R19 -> R0 # spill size = 64
>
> Please review this trivial change.
>
> Thanks,
> Sun Guoyun

This pull request has been closed without being integrated.
------------- PR: https://git.openjdk.java.net/jdk/pull/4774 From thartmann at openjdk.java.net Fri Sep 10 05:57:07 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 10 Sep 2021 05:57:07 GMT Subject: RFR: JDK-8272698: LoadNode::pin is unused In-Reply-To: References: Message-ID: On Wed, 8 Sep 2021 08:46:02 GMT, Tobias Holenstein wrote: > Removed dead code and tested on Tier 1. > > Thanks, > Tobias Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5408 From thartmann at openjdk.java.net Fri Sep 10 05:57:07 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 10 Sep 2021 05:57:07 GMT Subject: RFR: 8273512: Fix the copyright header of x86 macroAssembler files [v3] In-Reply-To: References: <89boYXfzPR50KNk_ODjVPoaNZpRqnhib7ubqDgw8ftY=.8d25a4fe-59b8-4c7e-9e35-7ad1b9049df4@github.com> Message-ID: On Thu, 9 Sep 2021 17:38:23 GMT, Sandhya Viswanathan wrote: >> Fix the copyright header of x86 macroAssembler files to match others. > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Implement review comments Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5424 From thartmann at openjdk.java.net Fri Sep 10 05:59:00 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 10 Sep 2021 05:59:00 GMT Subject: RFR: JDK-8264517: C2: make MachCallNode::return_value_is_used() only available for x86 In-Reply-To: References: Message-ID: On Thu, 9 Sep 2021 07:27:09 GMT, Tobias Holenstein wrote: > Added #ifndef _LP64 guard to the MachCallNode::return_value_is_used() method in src/hotspot/share/opto/machnode.* because it is only used for 32-bit x86. > > Thanks, > Tobias Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5435 From thartmann at openjdk.java.net Fri Sep 10 06:18:06 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 10 Sep 2021 06:18:06 GMT Subject: RFR: 8273021: C2: Improve Add and Xor ideal optimizations [v4] In-Reply-To: References: <6ETrh2hQ4xodDeDjGT3NCjENHbOj0fp2EJsfCDAVINE=.94f15263-2919-4762-a8a9-1d6339f7be96@github.com> Message-ID: On Fri, 10 Sep 2021 02:27:38 GMT, Yi Yang wrote: >> Greetings. This patch adds the following identical equations for Add and Xor node, respectively, which probably drives further optimizations. >> >> >> ~(x-1) => -x >> ~x + 1 => -x >> >> >> >> Verified by generated opto assembly, maybe an IR verification test can be added later. >> >> Compiled method (c2) 71 1 compiler.c2.TestAddXorIdeal::test1 (6 bytes) >> 0x00007f9e11514800: sub $0x18,%rsp >> 0x00007f9e11514807: mov %rbp,0x10(%rsp) ;*synchronization entry >> ; - compiler.c2.TestAddXorIdeal::test1 at -1 (line 39) >> 0x00007f9e1151480c: mov %esi,%eax >> 0x00007f9e1151480e: neg %eax ;*iadd {reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test1 at 4 (line 39) >> 0x00007f9e11514810: add $0x10,%rsp >> 0x00007f9e11514814: pop %rbp >> 0x00007f9e11514815: cmp 0x338(%r15),%rsp ; {poll_return} >> 0x00007f9e1151481c: ja 0x00007f9e11514823 >> 0x00007f9e11514822: retq >> >> Compiled method (c2) 73 2 compiler.c2.TestAddXorIdeal::test2 (6 bytes) >> 0x00007f9e11512480: sub $0x18,%rsp >> 0x00007f9e11512487: mov %rbp,0x10(%rsp) ;*synchronization entry >> ; - compiler.c2.TestAddXorIdeal::test2 at -1 (line 43) >> 0x00007f9e1151248c: mov %esi,%eax >> 0x00007f9e1151248e: neg %eax ;*ixor {reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test2 at 4 (line 43) >> 0x00007f9e11512490: add $0x10,%rsp >> 0x00007f9e11512494: pop %rbp >> 0x00007f9e11512495: cmp 0x338(%r15),%rsp ; {poll_return} >> 0x00007f9e1151249c: ja 0x00007f9e115124a3 >> 0x00007f9e115124a2: retq >> >> Compiled method (c2) 72 3 
compiler.c2.TestAddXorIdeal::test3 (8 bytes) >> 0x00007f9e11514b00: sub $0x18,%rsp >> 0x00007f9e11514b07: mov %rbp,0x10(%rsp) ;*synchronization entry >> ; - compiler.c2.TestAddXorIdeal::test3 at -1 (line 47) >> 0x00007f9e11514b0c: mov %rsi,%rax >> 0x00007f9e11514b0f: neg %rax ;*ladd {reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test3 at 6 (line 47) >> 0x00007f9e11514b12: add $0x10,%rsp >> 0x00007f9e11514b16: pop %rbp >> 0x00007f9e11514b17: cmp 0x338(%r15),%rsp ; {poll_return} >> 0x00007f9e11514b1e: ja 0x00007f9e11514b25 >> 0x00007f9e11514b24: retq >> >> Compiled method (c2) 72 4 compiler.c2.TestAddXorIdeal::test4 (8 bytes) >> 0x00007f9e11514500: sub $0x18,%rsp >> 0x00007f9e11514507: mov %rbp,0x10(%rsp) ;*synchronization entry >> ; - compiler.c2.TestAddXorIdeal::test4 at -1 (line 51) >> 0x00007f9e1151450c: mov %rsi,%rax >> 0x00007f9e1151450f: neg %rax ;*lxor {reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test4 at 6 (line 51) >> 0x00007f9e11514512: add $0x10,%rsp >> 0x00007f9e11514516: pop %rbp >> 0x00007f9e11514517: cmp 0x338(%r15),%rsp ; {poll_return} >> 0x00007f9e1151451e: ja 0x00007f9e11514525 >> 0x00007f9e11514524: retq >> >> Compiled method (c2) 72 5 compiler.c2.TestAddXorIdeal::test5 (6 bytes) >> 0x00007f9e11518500: sub $0x18,%rsp >> 0x00007f9e11518507: mov %rbp,0x10(%rsp) ;*synchronization entry >> ; - compiler.c2.TestAddXorIdeal::test5 at -1 (line 55) >> 0x00007f9e1151850c: mov %esi,%eax >> 0x00007f9e1151850e: neg %eax ;*iadd {reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test5 at 4 (line 55) >> 0x00007f9e11518510: add $0x10,%rsp >> 0x00007f9e11518514: pop %rbp >> 0x00007f9e11518515: cmp 0x338(%r15),%rsp ; {poll_return} >> 0x00007f9e1151851c: ja 0x00007f9e11518523 >> 0x00007f9e11518522: retq >> >> Compiled method (c2) 74 6 compiler.c2.TestAddXorIdeal::test6 (6 bytes) >> 0x00007f9e11512180: sub $0x18,%rsp >> 0x00007f9e11512187: mov %rbp,0x10(%rsp) ;*synchronization entry >> ; - 
compiler.c2.TestAddXorIdeal::test6 at -1 (line 59) >> 0x00007f9e1151218c: mov %esi,%eax >> 0x00007f9e1151218e: neg %eax ;*ixor {reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test6 at 4 (line 59) >> 0x00007f9e11512190: add $0x10,%rsp >> 0x00007f9e11512194: pop %rbp >> 0x00007f9e11512195: cmp 0x338(%r15),%rsp ; {poll_return} >> 0x00007f9e1151219c: ja 0x00007f9e115121a3 >> 0x00007f9e115121a2: retq >> >> Compiled method (c2) 74 7 compiler.c2.TestAddXorIdeal::test7 (8 bytes) >> 0x00007f9e11511e80: sub $0x18,%rsp >> 0x00007f9e11511e87: mov %rbp,0x10(%rsp) ;*synchronization entry >> ; - compiler.c2.TestAddXorIdeal::test7 at -1 (line 63) >> 0x00007f9e11511e8c: mov %rsi,%rax >> 0x00007f9e11511e8f: neg %rax ;*ladd {reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test7 at 6 (line 63) >> 0x00007f9e11511e92: add $0x10,%rsp >> 0x00007f9e11511e96: pop %rbp >> 0x00007f9e11511e97: cmp 0x338(%r15),%rsp ; {poll_return} >> 0x00007f9e11511e9e: ja 0x00007f9e11511ea5 >> 0x00007f9e11511ea4: retq >> >> >> Compiled method (c2) 73 8 compiler.c2.TestAddXorIdeal::test8 (10 bytes) >> 0x00007f9e11512780: sub $0x18,%rsp >> 0x00007f9e11512787: mov %rbp,0x10(%rsp) ;*synchronization entry >> ; - compiler.c2.TestAddXorIdeal::test8 at -1 (line 67) >> 0x00007f9e1151278c: mov %rsi,%rax >> 0x00007f9e1151278f: neg %rax ;*lxor {reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test8 at 8 (line 67) >> 0x00007f9e11512792: add $0x10,%rsp >> 0x00007f9e11512796: pop %rbp >> 0x00007f9e11512797: cmp 0x338(%r15),%rsp ; {poll_return} >> 0x00007f9e1151279e: ja 0x00007f9e115127a5 >> 0x00007f9e115127a4: retq > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > review from Tobias Marked as reviewed by thartmann (Reviewer). 
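
For reference, both rewrites quoted above follow from the two's-complement identity `~x = -x - 1`. A short derivation, assuming fixed-width wraparound arithmetic (n = 32 for `int`, n = 64 for `long`); since everything is taken modulo 2^n, it also covers the minimum value, where `-x` wraps back to `x`:

```latex
\begin{align*}
{\sim}x       &\equiv -x - 1              \pmod{2^n} \\
{\sim}x + 1   &\equiv (-x - 1) + 1 = -x   \pmod{2^n} \\
{\sim}(x - 1) &\equiv -(x - 1) - 1 = -x   \pmod{2^n}
\end{align*}
```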
-------------

PR: https://git.openjdk.java.net/jdk/pull/5266

From thartmann at openjdk.java.net Fri Sep 10 06:18:06 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Fri, 10 Sep 2021 06:18:06 GMT
Subject: RFR: 8273021: C2: Improve Add and Xor ideal optimizations [v3]
In-Reply-To: <_FycSeoxadQzhabRtXwx4047Uk6LfIZctQHzYdtuK0c=.6cd67694-69dc-4678-9541-7d68dcd4cbe1@github.com>
References: <6ETrh2hQ4xodDeDjGT3NCjENHbOj0fp2EJsfCDAVINE=.94f15263-2919-4762-a8a9-1d6339f7be96@github.com>
 <_FycSeoxadQzhabRtXwx4047Uk6LfIZctQHzYdtuK0c=.6cd67694-69dc-4678-9541-7d68dcd4cbe1@github.com>
Message-ID: <_TA7_ZELB_i87MXd-MOomaOlYB6CHH8BGYM37jDUN9Y=.976b7c7e-2db5-4215-b7e2-d4eb8aeca780@github.com>

On Fri, 10 Sep 2021 02:21:59 GMT, Yi Yang wrote:

>> test/hotspot/jtreg/compiler/c2/TestAddXorIdeal.java line 77:
>>
>>> 75:         Random random = new Random();
>>> 76:         int n = 0;
>>> 77:         long n1 = 0;
>>
>> Should be declared in the loop.
>
> Do you mean declared within the loop body? I've changed it, but it looks like a matter of preference.

Yes, in Java, local variables should be declared as close as possible to the point where they are first used (see, for example, [Google's Java Style Guide](https://google.github.io/styleguide/javaguide.html#s4.8.2-variable-declarations)). The declaration does not affect performance.

Here's how I would write the loop to improve readability:

    for (int j = 0; j < 50_000; j++) {
        int i = random.nextInt();
        long l = random.nextLong();
        Asserts.assertTrue(test1(i) == -i);
        Asserts.assertTrue(test2(i) == -i);
        Asserts.assertTrue(test3(l) == -l);
        ...

Summary: No need to use a negative initial value for the loop induction variable (as it is not even used), increase the number of iterations to ensure C2 compilation (you are running without `-Xbatch`), use the same random numbers per loop iteration, use more intuitive variable names.

But these are just code style details, feel free to ignore.
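
A minimal, self-contained sketch of the loop shape suggested above. The class name, the four helper methods and the `check` helper are hypothetical stand-ins: the real `test1`..`test8` bodies in `TestAddXorIdeal.java` are not shown in this thread, and the real test uses `Asserts.assertTrue` from the JDK test library rather than a local helper.

```java
import java.util.Random;

// Sketch only: names below are illustrative, not taken from the actual test.
class NegationIdentityCheck {
    // ~x + 1 => -x
    static int negViaAddNot(int x) { return ~x + 1; }
    static long negViaAddNot(long x) { return ~x + 1; }

    // ~(x-1) => -x
    static int negViaNotSub(int x) { return ~(x - 1); }
    static long negViaNotSub(long x) { return ~(x - 1); }

    // Local replacement for Asserts.assertTrue so the sketch has no dependencies.
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        Random random = new Random();
        // Induction variable starts at 0 and is only a counter; the random
        // inputs are declared inside the loop body, next to their first use.
        for (int j = 0; j < 50_000; j++) {
            int i = random.nextInt();
            long l = random.nextLong();
            check(negViaAddNot(i) == -i, "~x + 1 (int) failed for " + i);
            check(negViaNotSub(i) == -i, "~(x-1) (int) failed for " + i);
            check(negViaAddNot(l) == -l, "~x + 1 (long) failed for " + l);
            check(negViaNotSub(l) == -l, "~(x-1) (long) failed for " + l);
        }
        // Wraparound edge case: negating the minimum value yields itself.
        check(negViaAddNot(Integer.MIN_VALUE) == -Integer.MIN_VALUE, "int MIN_VALUE");
        check(negViaNotSub(Long.MIN_VALUE) == -Long.MIN_VALUE, "long MIN_VALUE");
    }
}
```

Whether 50_000 iterations actually trigger C2 compilation depends on the VM flags in use; the sketch only checks the arithmetic, not the generated code.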
-------------

PR: https://git.openjdk.java.net/jdk/pull/5266

From yyang at openjdk.java.net Fri Sep 10 07:36:00 2021
From: yyang at openjdk.java.net (Yi Yang)
Date: Fri, 10 Sep 2021 07:36:00 GMT
Subject: RFR: 8273021: C2: Improve Add and Xor ideal optimizations [v3]
In-Reply-To: <_TA7_ZELB_i87MXd-MOomaOlYB6CHH8BGYM37jDUN9Y=.976b7c7e-2db5-4215-b7e2-d4eb8aeca780@github.com>
References: <6ETrh2hQ4xodDeDjGT3NCjENHbOj0fp2EJsfCDAVINE=.94f15263-2919-4762-a8a9-1d6339f7be96@github.com>
 <_FycSeoxadQzhabRtXwx4047Uk6LfIZctQHzYdtuK0c=.6cd67694-69dc-4678-9541-7d68dcd4cbe1@github.com>
 <_TA7_ZELB_i87MXd-MOomaOlYB6CHH8BGYM37jDUN9Y=.976b7c7e-2db5-4215-b7e2-d4eb8aeca780@github.com>
Message-ID:

On Fri, 10 Sep 2021 06:14:57 GMT, Tobias Hartmann wrote:

>> Do you mean declared within the loop body? I've changed it, but it looks like a matter of preference.
>
> Yes, in Java, local variables should be declared as close as possible to the point where they are first used (see, for example, [Google's Java Style Guide](https://google.github.io/styleguide/javaguide.html#s4.8.2-variable-declarations)). The declaration does not affect performance.
>
> Here's how I would write the loop to improve readability:
>
>     for (int j = 0; j < 50_000; j++) {
>         int i = random.nextInt();
>         long l = random.nextLong();
>         Asserts.assertTrue(test1(i) == -i);
>         Asserts.assertTrue(test2(i) == -i);
>         Asserts.assertTrue(test3(l) == -l);
>         ...
>
> Summary: No need to use a negative initial value for the loop induction variable (as it is not even used), increase the number of iterations to ensure C2 compilation (you are running without `-Xbatch`), use the same random numbers per loop iteration, use more intuitive variable names.
>
> But these are just code style details, feel free to ignore.

Thanks for your patience. I agree with your comments about using 0 as the initial value and increasing the number of iterations.
:) I try to use `a` and `b` as variable names since `long l` looks like `long 1` ------------- PR: https://git.openjdk.java.net/jdk/pull/5266 From yyang at openjdk.java.net Fri Sep 10 07:41:36 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Fri, 10 Sep 2021 07:41:36 GMT Subject: RFR: 8273021: C2: Improve Add and Xor ideal optimizations [v5] In-Reply-To: <6ETrh2hQ4xodDeDjGT3NCjENHbOj0fp2EJsfCDAVINE=.94f15263-2919-4762-a8a9-1d6339f7be96@github.com> References: <6ETrh2hQ4xodDeDjGT3NCjENHbOj0fp2EJsfCDAVINE=.94f15263-2919-4762-a8a9-1d6339f7be96@github.com> Message-ID: > Greetings. This patch adds the following identical equations for Add and Xor node, respectively, which probably drives further optimizations. > > > ~(x-1) => -x > ~x + 1 => -x > > > > Verified by generated opto assembly, maybe an IR verification test can be added later. > > Compiled method (c2) 71 1 compiler.c2.TestAddXorIdeal::test1 (6 bytes) > 0x00007f9e11514800: sub $0x18,%rsp > 0x00007f9e11514807: mov %rbp,0x10(%rsp) ;*synchronization entry > ; - compiler.c2.TestAddXorIdeal::test1 at -1 (line 39) > 0x00007f9e1151480c: mov %esi,%eax > 0x00007f9e1151480e: neg %eax ;*iadd {reexecute=0 rethrow=0 return_oop=0} > ; - compiler.c2.TestAddXorIdeal::test1 at 4 (line 39) > 0x00007f9e11514810: add $0x10,%rsp > 0x00007f9e11514814: pop %rbp > 0x00007f9e11514815: cmp 0x338(%r15),%rsp ; {poll_return} > 0x00007f9e1151481c: ja 0x00007f9e11514823 > 0x00007f9e11514822: retq > > Compiled method (c2) 73 2 compiler.c2.TestAddXorIdeal::test2 (6 bytes) > 0x00007f9e11512480: sub $0x18,%rsp > 0x00007f9e11512487: mov %rbp,0x10(%rsp) ;*synchronization entry > ; - compiler.c2.TestAddXorIdeal::test2 at -1 (line 43) > 0x00007f9e1151248c: mov %esi,%eax > 0x00007f9e1151248e: neg %eax ;*ixor {reexecute=0 rethrow=0 return_oop=0} > ; - compiler.c2.TestAddXorIdeal::test2 at 4 (line 43) > 0x00007f9e11512490: add $0x10,%rsp > 0x00007f9e11512494: pop %rbp > 0x00007f9e11512495: cmp 0x338(%r15),%rsp ; {poll_return} > 0x00007f9e1151249c: 
ja 0x00007f9e115124a3 > 0x00007f9e115124a2: retq > > Compiled method (c2) 72 3 compiler.c2.TestAddXorIdeal::test3 (8 bytes) > 0x00007f9e11514b00: sub $0x18,%rsp > 0x00007f9e11514b07: mov %rbp,0x10(%rsp) ;*synchronization entry > ; - compiler.c2.TestAddXorIdeal::test3 at -1 (line 47) > 0x00007f9e11514b0c: mov %rsi,%rax > 0x00007f9e11514b0f: neg %rax ;*ladd {reexecute=0 rethrow=0 return_oop=0} > ; - compiler.c2.TestAddXorIdeal::test3 at 6 (line 47) > 0x00007f9e11514b12: add $0x10,%rsp > 0x00007f9e11514b16: pop %rbp > 0x00007f9e11514b17: cmp 0x338(%r15),%rsp ; {poll_return} > 0x00007f9e11514b1e: ja 0x00007f9e11514b25 > 0x00007f9e11514b24: retq > > Compiled method (c2) 72 4 compiler.c2.TestAddXorIdeal::test4 (8 bytes) > 0x00007f9e11514500: sub $0x18,%rsp > 0x00007f9e11514507: mov %rbp,0x10(%rsp) ;*synchronization entry > ; - compiler.c2.TestAddXorIdeal::test4 at -1 (line 51) > 0x00007f9e1151450c: mov %rsi,%rax > 0x00007f9e1151450f: neg %rax ;*lxor {reexecute=0 rethrow=0 return_oop=0} > ; - compiler.c2.TestAddXorIdeal::test4 at 6 (line 51) > 0x00007f9e11514512: add $0x10,%rsp > 0x00007f9e11514516: pop %rbp > 0x00007f9e11514517: cmp 0x338(%r15),%rsp ; {poll_return} > 0x00007f9e1151451e: ja 0x00007f9e11514525 > 0x00007f9e11514524: retq > > Compiled method (c2) 72 5 compiler.c2.TestAddXorIdeal::test5 (6 bytes) > 0x00007f9e11518500: sub $0x18,%rsp > 0x00007f9e11518507: mov %rbp,0x10(%rsp) ;*synchronization entry > ; - compiler.c2.TestAddXorIdeal::test5 at -1 (line 55) > 0x00007f9e1151850c: mov %esi,%eax > 0x00007f9e1151850e: neg %eax ;*iadd {reexecute=0 rethrow=0 return_oop=0} > ; - compiler.c2.TestAddXorIdeal::test5 at 4 (line 55) > 0x00007f9e11518510: add $0x10,%rsp > 0x00007f9e11518514: pop %rbp > 0x00007f9e11518515: cmp 0x338(%r15),%rsp ; {poll_return} > 0x00007f9e1151851c: ja 0x00007f9e11518523 > 0x00007f9e11518522: retq > > Compiled method (c2) 74 6 compiler.c2.TestAddXorIdeal::test6 (6 bytes) > 0x00007f9e11512180: sub $0x18,%rsp > 0x00007f9e11512187: mov 
%rbp,0x10(%rsp) ;*synchronization entry > ; - compiler.c2.TestAddXorIdeal::test6 at -1 (line 59) > 0x00007f9e1151218c: mov %esi,%eax > 0x00007f9e1151218e: neg %eax ;*ixor {reexecute=0 rethrow=0 return_oop=0} > ; - compiler.c2.TestAddXorIdeal::test6 at 4 (line 59) > 0x00007f9e11512190: add $0x10,%rsp > 0x00007f9e11512194: pop %rbp > 0x00007f9e11512195: cmp 0x338(%r15),%rsp ; {poll_return} > 0x00007f9e1151219c: ja 0x00007f9e115121a3 > 0x00007f9e115121a2: retq > > Compiled method (c2) 74 7 compiler.c2.TestAddXorIdeal::test7 (8 bytes) > 0x00007f9e11511e80: sub $0x18,%rsp > 0x00007f9e11511e87: mov %rbp,0x10(%rsp) ;*synchronization entry > ; - compiler.c2.TestAddXorIdeal::test7 at -1 (line 63) > 0x00007f9e11511e8c: mov %rsi,%rax > 0x00007f9e11511e8f: neg %rax ;*ladd {reexecute=0 rethrow=0 return_oop=0} > ; - compiler.c2.TestAddXorIdeal::test7 at 6 (line 63) > 0x00007f9e11511e92: add $0x10,%rsp > 0x00007f9e11511e96: pop %rbp > 0x00007f9e11511e97: cmp 0x338(%r15),%rsp ; {poll_return} > 0x00007f9e11511e9e: ja 0x00007f9e11511ea5 > 0x00007f9e11511ea4: retq > > > Compiled method (c2) 73 8 compiler.c2.TestAddXorIdeal::test8 (10 bytes) > 0x00007f9e11512780: sub $0x18,%rsp > 0x00007f9e11512787: mov %rbp,0x10(%rsp) ;*synchronization entry > ; - compiler.c2.TestAddXorIdeal::test8 at -1 (line 67) > 0x00007f9e1151278c: mov %rsi,%rax > 0x00007f9e1151278f: neg %rax ;*lxor {reexecute=0 rethrow=0 return_oop=0} > ; - compiler.c2.TestAddXorIdeal::test8 at 8 (line 67) > 0x00007f9e11512792: add $0x10,%rsp > 0x00007f9e11512796: pop %rbp > 0x00007f9e11512797: cmp 0x338(%r15),%rsp ; {poll_return} > 0x00007f9e1151279e: ja 0x00007f9e115127a5 > 0x00007f9e115127a4: retq Yi Yang has updated the pull request incrementally with one additional commit since the last revision: change ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5266/files - new: https://git.openjdk.java.net/jdk/pull/5266/files/aee3a8b0..cceb0613 Webrevs: - full: 
https://webrevs.openjdk.java.net/?repo=jdk&pr=5266&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5266&range=03-04

  Stats: 19 lines in 1 file changed: 0 ins; 8 del; 11 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5266.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5266/head:pull/5266

PR: https://git.openjdk.java.net/jdk/pull/5266

From github.com+71546117+tobiasholenstein at openjdk.java.net Fri Sep 10 07:46:05 2021
From: github.com+71546117+tobiasholenstein at openjdk.java.net (Tobias Holenstein)
Date: Fri, 10 Sep 2021 07:46:05 GMT
Subject: Integrated: JDK-8264517: C2: make MachCallNode::return_value_is_used() only available for x86
In-Reply-To:
References:
Message-ID:

On Thu, 9 Sep 2021 07:27:09 GMT, Tobias Holenstein wrote:

> Added #ifndef _LP64 guard to the MachCallNode::return_value_is_used() method in src/hotspot/share/opto/machnode.* because it is only used for 32-bit x86.
>
> Thanks,
> Tobias

This pull request has now been integrated.

Changeset: 792281d5
Author:    Tobias Holenstein
Committer: Tobias Hartmann
URL:       https://git.openjdk.java.net/jdk/commit/792281d559ca1f01493775fdfc2a6ed09b3b883d
Stats:     4 lines in 2 files changed: 2 ins; 1 del; 1 mod

8264517: C2: make MachCallNode::return_value_is_used() only available for x86

Reviewed-by: kvn, thartmann

-------------

PR: https://git.openjdk.java.net/jdk/pull/5435

From github.com+71546117+tobiasholenstein at openjdk.java.net Fri Sep 10 07:47:07 2021
From: github.com+71546117+tobiasholenstein at openjdk.java.net (Tobias Holenstein)
Date: Fri, 10 Sep 2021 07:47:07 GMT
Subject: Integrated: JDK-8272698: LoadNode::pin is unused
In-Reply-To:
References:
Message-ID:

On Wed, 8 Sep 2021 08:46:02 GMT, Tobias Holenstein wrote:

> Removed dead code and tested on Tier 1.
>
> Thanks,
> Tobias

This pull request has now been integrated.
Changeset: 2eaf374c
Author:    Tobias Holenstein
Committer: Tobias Hartmann
URL:       https://git.openjdk.java.net/jdk/commit/2eaf374c5ce08287311cfac8145f97bf839365a7
Stats:     1 line in 1 file changed: 0 ins; 1 del; 0 mod

8272698: LoadNode::pin is unused

Reviewed-by: roland, thartmann

-------------

PR: https://git.openjdk.java.net/jdk/pull/5408

From thartmann at openjdk.java.net Fri Sep 10 07:48:03 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Fri, 10 Sep 2021 07:48:03 GMT
Subject: RFR: 8273021: C2: Improve Add and Xor ideal optimizations [v5]
In-Reply-To:
References: <6ETrh2hQ4xodDeDjGT3NCjENHbOj0fp2EJsfCDAVINE=.94f15263-2919-4762-a8a9-1d6339f7be96@github.com>
Message-ID:

On Fri, 10 Sep 2021 07:41:36 GMT, Yi Yang wrote:

>> Greetings. This patch adds the following identical equations for Add and Xor node, respectively, which probably drives further optimizations.
>>
>>
>> ~(x-1) => -x
>> ~x + 1 => -x
>>
>>
>>
>> Verified by generated opto assembly, maybe an IR verification test can be added later.
>> >> Compiled method (c2) 71 1 compiler.c2.TestAddXorIdeal::test1 (6 bytes) >> 0x00007f9e11514800: sub $0x18,%rsp >> 0x00007f9e11514807: mov %rbp,0x10(%rsp) ;*synchronization entry >> ; - compiler.c2.TestAddXorIdeal::test1 at -1 (line 39) >> 0x00007f9e1151480c: mov %esi,%eax >> 0x00007f9e1151480e: neg %eax ;*iadd {reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test1 at 4 (line 39) >> 0x00007f9e11514810: add $0x10,%rsp >> 0x00007f9e11514814: pop %rbp >> 0x00007f9e11514815: cmp 0x338(%r15),%rsp ; {poll_return} >> 0x00007f9e1151481c: ja 0x00007f9e11514823 >> 0x00007f9e11514822: retq >> >> Compiled method (c2) 73 2 compiler.c2.TestAddXorIdeal::test2 (6 bytes) >> 0x00007f9e11512480: sub $0x18,%rsp >> 0x00007f9e11512487: mov %rbp,0x10(%rsp) ;*synchronization entry >> ; - compiler.c2.TestAddXorIdeal::test2 at -1 (line 43) >> 0x00007f9e1151248c: mov %esi,%eax >> 0x00007f9e1151248e: neg %eax ;*ixor {reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test2 at 4 (line 43) >> 0x00007f9e11512490: add $0x10,%rsp >> 0x00007f9e11512494: pop %rbp >> 0x00007f9e11512495: cmp 0x338(%r15),%rsp ; {poll_return} >> 0x00007f9e1151249c: ja 0x00007f9e115124a3 >> 0x00007f9e115124a2: retq >> >> Compiled method (c2) 72 3 compiler.c2.TestAddXorIdeal::test3 (8 bytes) >> 0x00007f9e11514b00: sub $0x18,%rsp >> 0x00007f9e11514b07: mov %rbp,0x10(%rsp) ;*synchronization entry >> ; - compiler.c2.TestAddXorIdeal::test3 at -1 (line 47) >> 0x00007f9e11514b0c: mov %rsi,%rax >> 0x00007f9e11514b0f: neg %rax ;*ladd {reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test3 at 6 (line 47) >> 0x00007f9e11514b12: add $0x10,%rsp >> 0x00007f9e11514b16: pop %rbp >> 0x00007f9e11514b17: cmp 0x338(%r15),%rsp ; {poll_return} >> 0x00007f9e11514b1e: ja 0x00007f9e11514b25 >> 0x00007f9e11514b24: retq >> >> Compiled method (c2) 72 4 compiler.c2.TestAddXorIdeal::test4 (8 bytes) >> 0x00007f9e11514500: sub $0x18,%rsp >> 0x00007f9e11514507: mov %rbp,0x10(%rsp) 
;*synchronization entry >> ; - compiler.c2.TestAddXorIdeal::test4 at -1 (line 51) >> 0x00007f9e1151450c: mov %rsi,%rax >> 0x00007f9e1151450f: neg %rax ;*lxor {reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test4 at 6 (line 51) >> 0x00007f9e11514512: add $0x10,%rsp >> 0x00007f9e11514516: pop %rbp >> 0x00007f9e11514517: cmp 0x338(%r15),%rsp ; {poll_return} >> 0x00007f9e1151451e: ja 0x00007f9e11514525 >> 0x00007f9e11514524: retq >> >> Compiled method (c2) 72 5 compiler.c2.TestAddXorIdeal::test5 (6 bytes) >> 0x00007f9e11518500: sub $0x18,%rsp >> 0x00007f9e11518507: mov %rbp,0x10(%rsp) ;*synchronization entry >> ; - compiler.c2.TestAddXorIdeal::test5 at -1 (line 55) >> 0x00007f9e1151850c: mov %esi,%eax >> 0x00007f9e1151850e: neg %eax ;*iadd {reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test5 at 4 (line 55) >> 0x00007f9e11518510: add $0x10,%rsp >> 0x00007f9e11518514: pop %rbp >> 0x00007f9e11518515: cmp 0x338(%r15),%rsp ; {poll_return} >> 0x00007f9e1151851c: ja 0x00007f9e11518523 >> 0x00007f9e11518522: retq >> >> Compiled method (c2) 74 6 compiler.c2.TestAddXorIdeal::test6 (6 bytes) >> 0x00007f9e11512180: sub $0x18,%rsp >> 0x00007f9e11512187: mov %rbp,0x10(%rsp) ;*synchronization entry >> ; - compiler.c2.TestAddXorIdeal::test6 at -1 (line 59) >> 0x00007f9e1151218c: mov %esi,%eax >> 0x00007f9e1151218e: neg %eax ;*ixor {reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test6 at 4 (line 59) >> 0x00007f9e11512190: add $0x10,%rsp >> 0x00007f9e11512194: pop %rbp >> 0x00007f9e11512195: cmp 0x338(%r15),%rsp ; {poll_return} >> 0x00007f9e1151219c: ja 0x00007f9e115121a3 >> 0x00007f9e115121a2: retq >> >> Compiled method (c2) 74 7 compiler.c2.TestAddXorIdeal::test7 (8 bytes) >> 0x00007f9e11511e80: sub $0x18,%rsp >> 0x00007f9e11511e87: mov %rbp,0x10(%rsp) ;*synchronization entry >> ; - compiler.c2.TestAddXorIdeal::test7 at -1 (line 63) >> 0x00007f9e11511e8c: mov %rsi,%rax >> 0x00007f9e11511e8f: neg %rax ;*ladd 
{reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test7 at 6 (line 63) >> 0x00007f9e11511e92: add $0x10,%rsp >> 0x00007f9e11511e96: pop %rbp >> 0x00007f9e11511e97: cmp 0x338(%r15),%rsp ; {poll_return} >> 0x00007f9e11511e9e: ja 0x00007f9e11511ea5 >> 0x00007f9e11511ea4: retq >> >> >> Compiled method (c2) 73 8 compiler.c2.TestAddXorIdeal::test8 (10 bytes) >> 0x00007f9e11512780: sub $0x18,%rsp >> 0x00007f9e11512787: mov %rbp,0x10(%rsp) ;*synchronization entry >> ; - compiler.c2.TestAddXorIdeal::test8 at -1 (line 67) >> 0x00007f9e1151278c: mov %rsi,%rax >> 0x00007f9e1151278f: neg %rax ;*lxor {reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test8 at 8 (line 67) >> 0x00007f9e11512792: add $0x10,%rsp >> 0x00007f9e11512796: pop %rbp >> 0x00007f9e11512797: cmp 0x338(%r15),%rsp ; {poll_return} >> 0x00007f9e1151279e: ja 0x00007f9e115127a5 >> 0x00007f9e115127a4: retq > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > change Thanks for making these changes. Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5266 From yyang at openjdk.java.net Fri Sep 10 08:55:08 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Fri, 10 Sep 2021 08:55:08 GMT Subject: RFR: 8273021: C2: Improve Add and Xor ideal optimizations [v5] In-Reply-To: References: <6ETrh2hQ4xodDeDjGT3NCjENHbOj0fp2EJsfCDAVINE=.94f15263-2919-4762-a8a9-1d6339f7be96@github.com> Message-ID: On Fri, 10 Sep 2021 07:45:01 GMT, Tobias Hartmann wrote: >> Yi Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> change > > Thanks for making these changes. Looks good. Thanks @TobiHartmann and @theRealELiu for reviews. 
-------------

PR: https://git.openjdk.java.net/jdk/pull/5266

From kcr at openjdk.java.net Fri Sep 10 10:49:24 2021
From: kcr at openjdk.java.net (Kevin Rushforth)
Date: Fri, 10 Sep 2021 10:49:24 GMT
Subject: [jdk17] RFR: 8269820: C2 PhaseIdealLoop::do_unroll get wrong opaque node
In-Reply-To:
References:
Message-ID:

On Mon, 19 Jul 2021 14:13:31 GMT, Hui Shi wrote:

> More discussion is in the previous PR (https://github.com/openjdk/jdk17/pull/208), which conflicts with the fix for JDK-8269752.
>
> Opaque1 in the main loop entry is expected to be used only as the compare node's second input. However, split if might clone Opaque1 and replace the original node with a phi of Opaque1 nodes. This now causes an assertion failure in CountedLoopNode::is_canonical_loop_entry.
>
> The fix is in BoolNode::Ideal: avoid switching the compare node's inputs when the second input is a phi of Opaque1.
>
> Test: Linux X64 tier1/2/3 release/fastdebug no regression.

The jdk17 repo is no longer open for any fixes. If this PR is to move forward, it will need to be redone as a PR for the mainline [openjdk/jdk](https://github.com/openjdk/jdk) repo.

-------------

PR: https://git.openjdk.java.net/jdk17/pull/255

From kcr at openjdk.java.net Fri Sep 10 10:49:24 2021
From: kcr at openjdk.java.net (Kevin Rushforth)
Date: Fri, 10 Sep 2021 10:49:24 GMT
Subject: [jdk17] Withdrawn: 8269820: C2 PhaseIdealLoop::do_unroll get wrong opaque node
In-Reply-To:
References:
Message-ID:

On Mon, 19 Jul 2021 14:13:31 GMT, Hui Shi wrote:

> More discussion is in the previous PR (https://github.com/openjdk/jdk17/pull/208), which conflicts with the fix for JDK-8269752.
>
> Opaque1 in the main loop entry is expected to be used only as the compare node's second input. However, split if might clone Opaque1 and replace the original node with a phi of Opaque1 nodes. This now causes an assertion failure in CountedLoopNode::is_canonical_loop_entry.
>
> The fix is in BoolNode::Ideal: avoid switching the compare node's inputs when the second input is a phi of Opaque1.
>
> Test: Linux X64 tier1/2/3 release/fastdebug no regression.
This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.java.net/jdk17/pull/255

From zgu at openjdk.java.net Fri Sep 10 12:26:08 2021
From: zgu at openjdk.java.net (Zhengyu Gu)
Date: Fri, 10 Sep 2021 12:26:08 GMT
Subject: RFR: 8273454: C2: Transform (-a)*(-b) into a*b [v4]
In-Reply-To: <-EYsHDdFMKiY7SVbmhnVJib76SspeVEaziPrs0OESVY=.e9b3a07e-5606-4fc6-8652-5e457476eeff@github.com>
References: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com>
 <-EYsHDdFMKiY7SVbmhnVJib76SspeVEaziPrs0OESVY=.e9b3a07e-5606-4fc6-8652-5e457476eeff@github.com>
Message-ID:

On Fri, 10 Sep 2021 02:37:55 GMT, Eric Liu wrote:

>> Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Fix node in place instead of creating new node
>
> test/hotspot/jtreg/compiler/integerArithmetic/TestNegMultiply.java line 44:
>
>> 42:
>> 43:     private static void testInt(int a, int b) {
>> 44:         int expected = (-a) * (-b);
>
> Are you sure this is the expected value? As the method has been invoked 2000 times, I think it would be compiled by C2.

The default CompileThreshold is 10K when tiered compilation is disabled, which is the case here, so there is no risk.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5403

From roland at openjdk.java.net Fri Sep 10 15:41:07 2021
From: roland at openjdk.java.net (Roland Westrelin)
Date: Fri, 10 Sep 2021 15:41:07 GMT
Subject: RFR: 8266550: C2: mirror TypeOopPtr/TypeInstPtr/TypeAryPtr with TypeKlassPtr/TypeInstKlassPtr/TypeAryKlassPtr [v8]
In-Reply-To:
References:
Message-ID:

> This is some refactoring in another attempt to fix JDK-6312651
> (Compiler should only use verified interface types for
> optimization). Rather than propose the patch from:
>
> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-May/033803.html
>
> as a single big patch, I've been working on splitting it. The plan is
> to have this and another refactoring patch that include no change to
> the way interfaces are handled as preparation. Only then, in a third
> patch, would interface support along the lines of the patch I proposed 2
> years ago be introduced.
>
> This patch changes the class hierarchy of types that C2 uses and
> introduces TypeInstKlassPtr/TypeAryKlassPtr that mirror
> TypeInstPtr/TypeAryPtr. The motivation for this is that a single:
>
>     ciKlass* _klass;
>
> is no longer sufficient to describe a type (a set of interfaces must
> also be carried around). That's not possible with TypeKlassPtr.
>
> The meet methods for TypeInstPtr/TypeInstKlassPtr and
> TypeAryPtr/TypeAryKlassPtr are largely similar. I moved the most
> complicated logic into helper methods:
>
>     meet_instptr() and meet_aryptr()
>
> (Thanks to Vladimir I for testing the refactoring patches)

Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits:

 - Merge branch 'master' into JDK-8266550
 - merge fix
 - Merge branch 'master' into JDK-8266550
 - review
 - Merge branch 'master' into JDK-8266550
 - review
 - Merge branch 'master' into JDK-8266550
 - IR framework fix
 - dump tweaks
 - Merge branch 'master' into JDK-8266550
 - ...
and 5 more: https://git.openjdk.java.net/jdk/compare/461a467f...e2077d0f

-------------

Changes: https://git.openjdk.java.net/jdk/pull/3880/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3880&range=07
  Stats: 1216 lines in 18 files changed: 769 ins; 234 del; 213 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3880.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3880/head:pull/3880

PR: https://git.openjdk.java.net/jdk/pull/3880

From roland at openjdk.java.net Fri Sep 10 15:41:11 2021
From: roland at openjdk.java.net (Roland Westrelin)
Date: Fri, 10 Sep 2021 15:41:11 GMT
Subject: RFR: 8266550: C2: mirror TypeOopPtr/TypeInstPtr/TypeAryPtr with TypeKlassPtr/TypeInstKlassPtr/TypeAryKlassPtr [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 8 Jun 2021 09:59:29 GMT, Vladimir Ivanov wrote:

>> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit:
>>
>> instklassptr/aryklassptr
>
> Another case is in `Compile::flatten_alias_type()`:
>
>   // Flatten all to bottom for now
>   switch( _AliasLevel ) {
>   ...
>   case 1: // Flatten to: oop, static, field or array
>     switch (tj->base()) {
>     case Type::RawPtr: tj = TypeRawPtr::BOTTOM; break;
>     case Type::AryPtr: // do not distinguish arrays at all
>     case Type::InstPtr: tj = TypeInstPtr::BOTTOM; break;
>     case Type::KlassPtr: tj = TypeInstKlassPtr::OBJECT; break;
>     case Type::AnyPtr: tj = TypePtr::BOTTOM; break; // caller checks it
>
>
> `Type::KlassPtr` should be accompanied by `Type::InstKlassPtr` and `Type::AryKlassPtr` cases, shouldn't it?

I had to tweak test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java after merging with master. @iwanowww @TobiHartmann are you still ok with the change?
------------- PR: https://git.openjdk.java.net/jdk/pull/3880 From sviswanathan at openjdk.java.net Fri Sep 10 15:43:51 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Fri, 10 Sep 2021 15:43:51 GMT Subject: Integrated: 8273512: Fix the copyright header of x86 macroAssembler files In-Reply-To: <89boYXfzPR50KNk_ODjVPoaNZpRqnhib7ubqDgw8ftY=.8d25a4fe-59b8-4c7e-9e35-7ad1b9049df4@github.com> References: <89boYXfzPR50KNk_ODjVPoaNZpRqnhib7ubqDgw8ftY=.8d25a4fe-59b8-4c7e-9e35-7ad1b9049df4@github.com> Message-ID: <3bfGDIHSM6TQo7-6cKKGqApPEuP-2wt5yPCvfZ-3YYU=.26868b23-b903-4b3f-b491-71764a094ffa@github.com> On Wed, 8 Sep 2021 20:09:10 GMT, Sandhya Viswanathan wrote: > Fix the copyright header of x86 macroAssembler files to match others. This pull request has now been integrated. Changeset: e58c12e6 Author: Sandhya Viswanathan URL: https://git.openjdk.java.net/jdk/commit/e58c12e61828485bfffbc9d1b865302b93a94158 Stats: 13 lines in 12 files changed: 0 ins; 0 del; 13 mod 8273512: Fix the copyright header of x86 macroAssembler files Reviewed-by: dholmes, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/5424 From kvn at openjdk.java.net Fri Sep 10 17:08:50 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 10 Sep 2021 17:08:50 GMT Subject: RFR: 8273021: C2: Improve Add and Xor ideal optimizations [v5] In-Reply-To: References: <6ETrh2hQ4xodDeDjGT3NCjENHbOj0fp2EJsfCDAVINE=.94f15263-2919-4762-a8a9-1d6339f7be96@github.com> Message-ID: On Fri, 10 Sep 2021 07:41:36 GMT, Yi Yang wrote: >> Greetings. This patch adds the following identical equations for Add and Xor node, respectively, which probably drives further optimizations. >> >> >> ~(x-1) => -x >> ~x + 1 => -x >> >> >> >> Verified by generated opto assembly, maybe an IR verification test can be added later. 
>> >> Compiled method (c2) 71 1 compiler.c2.TestAddXorIdeal::test1 (6 bytes) >> 0x00007f9e11514800: sub $0x18,%rsp >> 0x00007f9e11514807: mov %rbp,0x10(%rsp) ;*synchronization entry >> ; - compiler.c2.TestAddXorIdeal::test1 at -1 (line 39) >> 0x00007f9e1151480c: mov %esi,%eax >> 0x00007f9e1151480e: neg %eax ;*iadd {reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test1 at 4 (line 39) >> 0x00007f9e11514810: add $0x10,%rsp >> 0x00007f9e11514814: pop %rbp >> 0x00007f9e11514815: cmp 0x338(%r15),%rsp ; {poll_return} >> 0x00007f9e1151481c: ja 0x00007f9e11514823 >> 0x00007f9e11514822: retq >> >> Compiled method (c2) 73 2 compiler.c2.TestAddXorIdeal::test2 (6 bytes) >> 0x00007f9e11512480: sub $0x18,%rsp >> 0x00007f9e11512487: mov %rbp,0x10(%rsp) ;*synchronization entry >> ; - compiler.c2.TestAddXorIdeal::test2 at -1 (line 43) >> 0x00007f9e1151248c: mov %esi,%eax >> 0x00007f9e1151248e: neg %eax ;*ixor {reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test2 at 4 (line 43) >> 0x00007f9e11512490: add $0x10,%rsp >> 0x00007f9e11512494: pop %rbp >> 0x00007f9e11512495: cmp 0x338(%r15),%rsp ; {poll_return} >> 0x00007f9e1151249c: ja 0x00007f9e115124a3 >> 0x00007f9e115124a2: retq >> >> Compiled method (c2) 72 3 compiler.c2.TestAddXorIdeal::test3 (8 bytes) >> 0x00007f9e11514b00: sub $0x18,%rsp >> 0x00007f9e11514b07: mov %rbp,0x10(%rsp) ;*synchronization entry >> ; - compiler.c2.TestAddXorIdeal::test3 at -1 (line 47) >> 0x00007f9e11514b0c: mov %rsi,%rax >> 0x00007f9e11514b0f: neg %rax ;*ladd {reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test3 at 6 (line 47) >> 0x00007f9e11514b12: add $0x10,%rsp >> 0x00007f9e11514b16: pop %rbp >> 0x00007f9e11514b17: cmp 0x338(%r15),%rsp ; {poll_return} >> 0x00007f9e11514b1e: ja 0x00007f9e11514b25 >> 0x00007f9e11514b24: retq >> >> Compiled method (c2) 72 4 compiler.c2.TestAddXorIdeal::test4 (8 bytes) >> 0x00007f9e11514500: sub $0x18,%rsp >> 0x00007f9e11514507: mov %rbp,0x10(%rsp) 
;*synchronization entry >> ; - compiler.c2.TestAddXorIdeal::test4 at -1 (line 51) >> 0x00007f9e1151450c: mov %rsi,%rax >> 0x00007f9e1151450f: neg %rax ;*lxor {reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test4 at 6 (line 51) >> 0x00007f9e11514512: add $0x10,%rsp >> 0x00007f9e11514516: pop %rbp >> 0x00007f9e11514517: cmp 0x338(%r15),%rsp ; {poll_return} >> 0x00007f9e1151451e: ja 0x00007f9e11514525 >> 0x00007f9e11514524: retq >> >> Compiled method (c2) 72 5 compiler.c2.TestAddXorIdeal::test5 (6 bytes) >> 0x00007f9e11518500: sub $0x18,%rsp >> 0x00007f9e11518507: mov %rbp,0x10(%rsp) ;*synchronization entry >> ; - compiler.c2.TestAddXorIdeal::test5 at -1 (line 55) >> 0x00007f9e1151850c: mov %esi,%eax >> 0x00007f9e1151850e: neg %eax ;*iadd {reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test5 at 4 (line 55) >> 0x00007f9e11518510: add $0x10,%rsp >> 0x00007f9e11518514: pop %rbp >> 0x00007f9e11518515: cmp 0x338(%r15),%rsp ; {poll_return} >> 0x00007f9e1151851c: ja 0x00007f9e11518523 >> 0x00007f9e11518522: retq >> >> Compiled method (c2) 74 6 compiler.c2.TestAddXorIdeal::test6 (6 bytes) >> 0x00007f9e11512180: sub $0x18,%rsp >> 0x00007f9e11512187: mov %rbp,0x10(%rsp) ;*synchronization entry >> ; - compiler.c2.TestAddXorIdeal::test6 at -1 (line 59) >> 0x00007f9e1151218c: mov %esi,%eax >> 0x00007f9e1151218e: neg %eax ;*ixor {reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test6 at 4 (line 59) >> 0x00007f9e11512190: add $0x10,%rsp >> 0x00007f9e11512194: pop %rbp >> 0x00007f9e11512195: cmp 0x338(%r15),%rsp ; {poll_return} >> 0x00007f9e1151219c: ja 0x00007f9e115121a3 >> 0x00007f9e115121a2: retq >> >> Compiled method (c2) 74 7 compiler.c2.TestAddXorIdeal::test7 (8 bytes) >> 0x00007f9e11511e80: sub $0x18,%rsp >> 0x00007f9e11511e87: mov %rbp,0x10(%rsp) ;*synchronization entry >> ; - compiler.c2.TestAddXorIdeal::test7 at -1 (line 63) >> 0x00007f9e11511e8c: mov %rsi,%rax >> 0x00007f9e11511e8f: neg %rax ;*ladd 
{reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test7 at 6 (line 63) >> 0x00007f9e11511e92: add $0x10,%rsp >> 0x00007f9e11511e96: pop %rbp >> 0x00007f9e11511e97: cmp 0x338(%r15),%rsp ; {poll_return} >> 0x00007f9e11511e9e: ja 0x00007f9e11511ea5 >> 0x00007f9e11511ea4: retq >> >> >> Compiled method (c2) 73 8 compiler.c2.TestAddXorIdeal::test8 (10 bytes) >> 0x00007f9e11512780: sub $0x18,%rsp >> 0x00007f9e11512787: mov %rbp,0x10(%rsp) ;*synchronization entry >> ; - compiler.c2.TestAddXorIdeal::test8 at -1 (line 67) >> 0x00007f9e1151278c: mov %rsi,%rax >> 0x00007f9e1151278f: neg %rax ;*lxor {reexecute=0 rethrow=0 return_oop=0} >> ; - compiler.c2.TestAddXorIdeal::test8 at 8 (line 67) >> 0x00007f9e11512792: add $0x10,%rsp >> 0x00007f9e11512796: pop %rbp >> 0x00007f9e11512797: cmp 0x338(%r15),%rsp ; {poll_return} >> 0x00007f9e1151279e: ja 0x00007f9e115127a5 >> 0x00007f9e115127a4: retq > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > change Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5266 From eliu at openjdk.java.net Sat Sep 11 00:31:49 2021 From: eliu at openjdk.java.net (Eric Liu) Date: Sat, 11 Sep 2021 00:31:49 GMT Subject: RFR: 8273454: C2: Transform (-a)*(-b) into a*b [v4] In-Reply-To: References: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> Message-ID: On Thu, 9 Sep 2021 15:02:32 GMT, Zhengyu Gu wrote: >> The transformation reduce instructions in generated code. 
>> >> ### x86_64: >> >> Before: >> ``` >> 0x00007fb92c78b3ac: neg %esi >> 0x00007fb92c78b3ae: neg %edx >> 0x00007fb92c78b3b0: mov %esi,%eax >> 0x00007fb92c78b3b2: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) >> >> After: >> >> ; - TestSub::runSub at -1 (line 9) >> 0x00007fc8c05b74ac: mov %esi,%eax >> 0x00007fc8c05b74ae: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) >> >> >> >> ### AArch64: >> Before: >> >> 0x0000ffff814b4a70: neg w11, w1 >> 0x0000ffff814b4a74: mneg w0, w2, w11 ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) >> >> >> After: >> >> 0x0000ffff794a67f0: mul w0, w1, w2 ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Fix node in place instead of creating new node LGTM ------------- Marked as reviewed by eliu (Author). PR: https://git.openjdk.java.net/jdk/pull/5403 From jiefu at openjdk.java.net Sat Sep 11 06:25:02 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Sat, 11 Sep 2021 06:25:02 GMT Subject: RFR: 8273629: compiler/uncommontrap/TestDeoptOOM.java fails with release VMs Message-ID: Hi all, The test fails since VM option 'LogCompilation' is diagnostic and must be enabled via -XX:+UnlockDiagnosticVMOptions. So only `-XX:+UnlockDiagnosticVMOptions` is added to fix it. Thanks. 
Best regards, Jie ------------- Commit messages: - 8273629: compiler/uncommontrap/TestDeoptOOM.java fails with release VMs Changes: https://git.openjdk.java.net/jdk/pull/5479/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5479&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273629 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5479.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5479/head:pull/5479 PR: https://git.openjdk.java.net/jdk/pull/5479 From yyang at openjdk.java.net Mon Sep 13 02:13:56 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Mon, 13 Sep 2021 02:13:56 GMT Subject: Integrated: 8273021: C2: Improve Add and Xor ideal optimizations In-Reply-To: <6ETrh2hQ4xodDeDjGT3NCjENHbOj0fp2EJsfCDAVINE=.94f15263-2919-4762-a8a9-1d6339f7be96@github.com> References: <6ETrh2hQ4xodDeDjGT3NCjENHbOj0fp2EJsfCDAVINE=.94f15263-2919-4762-a8a9-1d6339f7be96@github.com> Message-ID: On Thu, 26 Aug 2021 09:19:41 GMT, Yi Yang wrote: > Greetings. This patch adds the following identical equations for Add and Xor node, respectively, which probably drives further optimizations. > > > ~(x-1) => -x > ~x + 1 => -x > > > > Verified by generated opto assembly, maybe an IR verification test can be added later. 
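The two identities quoted above, `~(x-1) => -x` and `~x + 1 => -x`, follow from two's-complement arithmetic, where `~x` equals `-x - 1`. A minimal Java sketch that spot-checks them for `int` and `long`, including the overflow corner cases, could look like this. The class and method names are hypothetical and are not taken from the patch or from `TestAddXorIdeal`; it checks only the arithmetic, not what C2 emits.

```java
public class NegIdentityCheck {
    // Xor-style form from the PR description: ~(x - 1) should equal -x.
    public static int xorForm(int x) { return ~(x - 1); }

    // Add-style form from the PR description: ~x + 1 should equal -x.
    public static int addForm(int x) { return ~x + 1; }

    // Same two forms for long operands.
    public static long xorFormLong(long x) { return ~(x - 1L); }
    public static long addFormLong(long x) { return ~x + 1L; }

    public static void main(String[] args) {
        int[] ints = {0, 1, -1, 42, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : ints) {
            if (xorForm(x) != -x || addForm(x) != -x) {
                throw new AssertionError("int identity failed for " + x);
            }
        }
        long[] longs = {0L, 1L, -1L, 42L, Long.MAX_VALUE, Long.MIN_VALUE};
        for (long x : longs) {
            if (xorFormLong(x) != -x || addFormLong(x) != -x) {
                throw new AssertionError("long identity failed for " + x);
            }
        }
        System.out.println("identities hold for all samples");
    }
}
```

Note that `Integer.MIN_VALUE` and `Long.MIN_VALUE` negate to themselves under wrapping arithmetic, and both forms reproduce that, so the rewrite needs no special case for them.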
> > Compiled method (c2) 71 1 compiler.c2.TestAddXorIdeal::test1 (6 bytes) > 0x00007f9e11514800: sub $0x18,%rsp > 0x00007f9e11514807: mov %rbp,0x10(%rsp) ;*synchronization entry > ; - compiler.c2.TestAddXorIdeal::test1 at -1 (line 39) > 0x00007f9e1151480c: mov %esi,%eax > 0x00007f9e1151480e: neg %eax ;*iadd {reexecute=0 rethrow=0 return_oop=0} > ; - compiler.c2.TestAddXorIdeal::test1 at 4 (line 39) > 0x00007f9e11514810: add $0x10,%rsp > 0x00007f9e11514814: pop %rbp > 0x00007f9e11514815: cmp 0x338(%r15),%rsp ; {poll_return} > 0x00007f9e1151481c: ja 0x00007f9e11514823 > 0x00007f9e11514822: retq > > Compiled method (c2) 73 2 compiler.c2.TestAddXorIdeal::test2 (6 bytes) > 0x00007f9e11512480: sub $0x18,%rsp > 0x00007f9e11512487: mov %rbp,0x10(%rsp) ;*synchronization entry > ; - compiler.c2.TestAddXorIdeal::test2 at -1 (line 43) > 0x00007f9e1151248c: mov %esi,%eax > 0x00007f9e1151248e: neg %eax ;*ixor {reexecute=0 rethrow=0 return_oop=0} > ; - compiler.c2.TestAddXorIdeal::test2 at 4 (line 43) > 0x00007f9e11512490: add $0x10,%rsp > 0x00007f9e11512494: pop %rbp > 0x00007f9e11512495: cmp 0x338(%r15),%rsp ; {poll_return} > 0x00007f9e1151249c: ja 0x00007f9e115124a3 > 0x00007f9e115124a2: retq > > Compiled method (c2) 72 3 compiler.c2.TestAddXorIdeal::test3 (8 bytes) > 0x00007f9e11514b00: sub $0x18,%rsp > 0x00007f9e11514b07: mov %rbp,0x10(%rsp) ;*synchronization entry > ; - compiler.c2.TestAddXorIdeal::test3 at -1 (line 47) > 0x00007f9e11514b0c: mov %rsi,%rax > 0x00007f9e11514b0f: neg %rax ;*ladd {reexecute=0 rethrow=0 return_oop=0} > ; - compiler.c2.TestAddXorIdeal::test3 at 6 (line 47) > 0x00007f9e11514b12: add $0x10,%rsp > 0x00007f9e11514b16: pop %rbp > 0x00007f9e11514b17: cmp 0x338(%r15),%rsp ; {poll_return} > 0x00007f9e11514b1e: ja 0x00007f9e11514b25 > 0x00007f9e11514b24: retq > > Compiled method (c2) 72 4 compiler.c2.TestAddXorIdeal::test4 (8 bytes) > 0x00007f9e11514500: sub $0x18,%rsp > 0x00007f9e11514507: mov %rbp,0x10(%rsp) ;*synchronization entry > ; - 
compiler.c2.TestAddXorIdeal::test4 at -1 (line 51) > 0x00007f9e1151450c: mov %rsi,%rax > 0x00007f9e1151450f: neg %rax ;*lxor {reexecute=0 rethrow=0 return_oop=0} > ; - compiler.c2.TestAddXorIdeal::test4 at 6 (line 51) > 0x00007f9e11514512: add $0x10,%rsp > 0x00007f9e11514516: pop %rbp > 0x00007f9e11514517: cmp 0x338(%r15),%rsp ; {poll_return} > 0x00007f9e1151451e: ja 0x00007f9e11514525 > 0x00007f9e11514524: retq > > Compiled method (c2) 72 5 compiler.c2.TestAddXorIdeal::test5 (6 bytes) > 0x00007f9e11518500: sub $0x18,%rsp > 0x00007f9e11518507: mov %rbp,0x10(%rsp) ;*synchronization entry > ; - compiler.c2.TestAddXorIdeal::test5 at -1 (line 55) > 0x00007f9e1151850c: mov %esi,%eax > 0x00007f9e1151850e: neg %eax ;*iadd {reexecute=0 rethrow=0 return_oop=0} > ; - compiler.c2.TestAddXorIdeal::test5 at 4 (line 55) > 0x00007f9e11518510: add $0x10,%rsp > 0x00007f9e11518514: pop %rbp > 0x00007f9e11518515: cmp 0x338(%r15),%rsp ; {poll_return} > 0x00007f9e1151851c: ja 0x00007f9e11518523 > 0x00007f9e11518522: retq > > Compiled method (c2) 74 6 compiler.c2.TestAddXorIdeal::test6 (6 bytes) > 0x00007f9e11512180: sub $0x18,%rsp > 0x00007f9e11512187: mov %rbp,0x10(%rsp) ;*synchronization entry > ; - compiler.c2.TestAddXorIdeal::test6 at -1 (line 59) > 0x00007f9e1151218c: mov %esi,%eax > 0x00007f9e1151218e: neg %eax ;*ixor {reexecute=0 rethrow=0 return_oop=0} > ; - compiler.c2.TestAddXorIdeal::test6 at 4 (line 59) > 0x00007f9e11512190: add $0x10,%rsp > 0x00007f9e11512194: pop %rbp > 0x00007f9e11512195: cmp 0x338(%r15),%rsp ; {poll_return} > 0x00007f9e1151219c: ja 0x00007f9e115121a3 > 0x00007f9e115121a2: retq > > Compiled method (c2) 74 7 compiler.c2.TestAddXorIdeal::test7 (8 bytes) > 0x00007f9e11511e80: sub $0x18,%rsp > 0x00007f9e11511e87: mov %rbp,0x10(%rsp) ;*synchronization entry > ; - compiler.c2.TestAddXorIdeal::test7 at -1 (line 63) > 0x00007f9e11511e8c: mov %rsi,%rax > 0x00007f9e11511e8f: neg %rax ;*ladd {reexecute=0 rethrow=0 return_oop=0} > ; - 
compiler.c2.TestAddXorIdeal::test7 at 6 (line 63) > 0x00007f9e11511e92: add $0x10,%rsp > 0x00007f9e11511e96: pop %rbp > 0x00007f9e11511e97: cmp 0x338(%r15),%rsp ; {poll_return} > 0x00007f9e11511e9e: ja 0x00007f9e11511ea5 > 0x00007f9e11511ea4: retq > > > Compiled method (c2) 73 8 compiler.c2.TestAddXorIdeal::test8 (10 bytes) > 0x00007f9e11512780: sub $0x18,%rsp > 0x00007f9e11512787: mov %rbp,0x10(%rsp) ;*synchronization entry > ; - compiler.c2.TestAddXorIdeal::test8 at -1 (line 67) > 0x00007f9e1151278c: mov %rsi,%rax > 0x00007f9e1151278f: neg %rax ;*lxor {reexecute=0 rethrow=0 return_oop=0} > ; - compiler.c2.TestAddXorIdeal::test8 at 8 (line 67) > 0x00007f9e11512792: add $0x10,%rsp > 0x00007f9e11512796: pop %rbp > 0x00007f9e11512797: cmp 0x338(%r15),%rsp ; {poll_return} > 0x00007f9e1151279e: ja 0x00007f9e115127a5 > 0x00007f9e115127a4: retq This pull request has now been integrated. Changeset: a73c06de Author: Yi Yang URL: https://git.openjdk.java.net/jdk/commit/a73c06de2ac47033503189140c0f8ee61fcbceae Stats: 137 lines in 3 files changed: 136 ins; 0 del; 1 mod 8273021: C2: Improve Add and Xor ideal optimizations Co-authored-by: yulei Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/5266 From volker.simonis at gmail.com Mon Sep 13 08:08:40 2021 From: volker.simonis at gmail.com (Volker Simonis) Date: Mon, 13 Sep 2021 10:08:40 +0200 Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow In-Reply-To: References: Message-ID: Hi, may I kindly ask somebody to please take a look at this PR? Thank you and best regards, Volker On Tue, Sep 7, 2021 at 5:42 PM Volker Simonis wrote: > > If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (i.e. AIOOBE, NullPointerExceptions,..) and replace them by a static, pre-allocated exception without any stacktrace. > > However, we can actually do better. 
Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. nmethod) and fill them with at least one stack frame with the method/line-number information of the currently compiled method. If the method in question is being inlined (which often happens), we can add stack frames for all callers up to the inlining depth of the method in question.
>
> For the attached JTreg test, we get the following exception in interpreter mode:
>
> java.lang.NullPointerException: Cannot read the array length because "" is null
> at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
> at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
> at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)
> at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233)
>
> Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows:
>
> java.lang.NullPointerException
>
> After this change, if `StackFrameInFastThrow.throwImplicitException()` is compiled standalone, we will get:
>
> java.lang.NullPointerException
> at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>
> and if `StackFrameInFastThrow.throwImplicitException()` is inlined into `level2()` and `level2()` into `level1()` we will get the following exception (although we're still running with `-XX:+OmitStackTraceInFastThrow`):
>
> java.lang.NullPointerException
> at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
> at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
> at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)
>
> The new functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by
default (I'll create a CSR for the new option once reviewers are comfortable with the change). Notice that the optimization comes at no run-time costs because all the extra work will be done at compile time. > > ## Implementation details > > - Already the current implementation of `-XX:+OmitStackTraceInFastThrow` potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`. With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`). > - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them. > - In order to avoid a memory leak we have to release these global JNI handles once a nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles. > - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()` and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but missed to add it to the corresponding stats and printing routines so I've added that section as well. > - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds. > - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions. > - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed. 
> - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues. > > ------------- > > Commit messages: > - 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow > > Changes: https://git.openjdk.java.net/jdk/pull/5392/files > Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5392&range=00 > Issue: https://bugs.openjdk.java.net/browse/JDK-8273392 > Stats: 538 lines in 12 files changed: 417 ins; 6 del; 115 mod > Patch: https://git.openjdk.java.net/jdk/pull/5392.diff > Fetch: git fetch https://git.openjdk.java.net/jdk pull/5392/head:pull/5392 > > PR: https://git.openjdk.java.net/jdk/pull/5392 From simonis at openjdk.java.net Mon Sep 13 10:19:49 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Mon, 13 Sep 2021 10:19:49 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow Message-ID: Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274): public static boolean isAlpha(int c) { try { return IS_ALPHA[c]; } catch (ArrayIndexOutOfBoundsException ex) { return false; } } ### Solution Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. 
This results in a tenfold performance improvement for the above code:

-XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op

-XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op

### Implementation details

- The new optimization is guarded by the option `OptimizeImplicitExceptions`, which is on by default.
- In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception.
- Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
- We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information, which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
- The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. ------------- Commit messages: - 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow Changes: https://git.openjdk.java.net/jdk/pull/5488/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273563 Stats: 198 lines in 10 files changed: 190 ins; 0 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/5488.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5488/head:pull/5488 PR: https://git.openjdk.java.net/jdk/pull/5488 From shade at openjdk.java.net Mon Sep 13 10:33:49 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 13 Sep 2021 10:33:49 GMT Subject: RFR: 8273629: compiler/uncommontrap/TestDeoptOOM.java fails with release VMs In-Reply-To: References: Message-ID: On Sat, 11 Sep 2021 06:19:16 GMT, Jie Fu wrote: > Hi all, > > The test fails since VM option 'LogCompilation' is diagnostic and must be enabled via -XX:+UnlockDiagnosticVMOptions. > So only `-XX:+UnlockDiagnosticVMOptions` is added to fix it. > > Thanks. > Best regards, > Jie This looks fine to me. ------------- Marked as reviewed by shade (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5479 From roland at openjdk.java.net Mon Sep 13 11:41:06 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 13 Sep 2021 11:41:06 GMT Subject: RFR: 8273659: Replay compilation crashes with SIGSEGV Message-ID: <8vY79CHml4ijEgzlBJtMZqRrl4YKc60-cr4LGX4vX8A=.a2d2ff56-12a6-4afe-9d5f-b5ff08a18a64@github.com> I noticed replay fails when it encounters classes that it can't resolve (when running with -XX:+ReplayIgnoreInitErrors). ------------- Commit messages: - replay fix Changes: https://git.openjdk.java.net/jdk/pull/5490/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5490&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273659 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5490.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5490/head:pull/5490 PR: https://git.openjdk.java.net/jdk/pull/5490 From duke at openjdk.java.net Mon Sep 13 11:52:58 2021 From: duke at openjdk.java.net (duke) Date: Mon, 13 Sep 2021 11:52:58 GMT Subject: Withdrawn: 8267376: C2: Deduce the result bound of ModXNode In-Reply-To: <0qR8Ju3ahlklYs-8x4bBtwHS9uFoNHcLnwILNMML9Ig=.c8c4ade9-59b8-411d-80e9-0d55d2726551@github.com> References: <0qR8Ju3ahlklYs-8x4bBtwHS9uFoNHcLnwILNMML9Ig=.c8c4ade9-59b8-411d-80e9-0d55d2726551@github.com> Message-ID: On Tue, 25 May 2021 03:16:37 GMT, Yi Yang wrote: > if the divisor is a constant and not equal to 0, it's possible to deduce the final bound of ModXNode given that the following rules: > > x % -y ==> [0, y - 1] > x % y ==> [0, y - 1] > -x % y ==> [-y + 1, 0] > -x % -y ==> [-y + 1, 0] > > > FYI: The original purpose of this patch is to eliminate array access range check(e.g. `arr[val%5]`) which discussed in https://github.com/openjdk/jdk/pull/4083#issuecomment-846971247, because ModXNode would be transformed to other nodes during IGVN, RangeCheckNode is still generated when accessing array element. 
Regardless of eliminating array access range check, it's still reasonable to deduce the bound of % operation if the divisor is known constant. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/4179 From jiefu at openjdk.java.net Mon Sep 13 12:15:53 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Mon, 13 Sep 2021 12:15:53 GMT Subject: RFR: 8273629: compiler/uncommontrap/TestDeoptOOM.java fails with release VMs In-Reply-To: References: Message-ID: On Mon, 13 Sep 2021 10:30:35 GMT, Aleksey Shipilev wrote: >> Hi all, >> >> The test fails since VM option 'LogCompilation' is diagnostic and must be enabled via -XX:+UnlockDiagnosticVMOptions. >> So only `-XX:+UnlockDiagnosticVMOptions` is added to fix it. >> >> Thanks. >> Best regards, >> Jie > > This looks fine to me. Thanks @shipilev . ------------- PR: https://git.openjdk.java.net/jdk/pull/5479 From jiefu at openjdk.java.net Mon Sep 13 12:15:54 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Mon, 13 Sep 2021 12:15:54 GMT Subject: Integrated: 8273629: compiler/uncommontrap/TestDeoptOOM.java fails with release VMs In-Reply-To: References: Message-ID: On Sat, 11 Sep 2021 06:19:16 GMT, Jie Fu wrote: > Hi all, > > The test fails since VM option 'LogCompilation' is diagnostic and must be enabled via -XX:+UnlockDiagnosticVMOptions. > So only `-XX:+UnlockDiagnosticVMOptions` is added to fix it. > > Thanks. > Best regards, > Jie This pull request has now been integrated. 
Changeset: 261cb44b Author: Jie Fu URL: https://git.openjdk.java.net/jdk/commit/261cb44b13e5910180a2599ca756eb7ae6f9c443 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8273629: compiler/uncommontrap/TestDeoptOOM.java fails with release VMs Reviewed-by: shade ------------- PR: https://git.openjdk.java.net/jdk/pull/5479 From vlivanov at openjdk.java.net Mon Sep 13 13:01:03 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 13 Sep 2021 13:01:03 GMT Subject: RFR: 8266550: C2: mirror TypeOopPtr/TypeInstPtr/TypeAryPtr with TypeKlassPtr/TypeInstKlassPtr/TypeAryKlassPtr [v8] In-Reply-To: References: Message-ID: On Fri, 10 Sep 2021 15:41:07 GMT, Roland Westrelin wrote: >> This is some refactoring in another attempt to fix JDK-6312651 >> (Compiler should only use verified interface types for >> optimization). Rather than propose the patch from: >> >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-May/033803.html >> >> as a single big patch. I've been working on splitting it. The plan is >> to have this and another refactoring patch that include no change to >> the way interfaces are handled as preparation. Then only, in a third >> patch, interface support along the lines of the patch I proposed 2 >> years ago would be introduces. >> >> This patch changes the class hierarchy of types that C2 uses and >> introduces TypeInstKlassPtr/TypeAryKlassPtr that mirror >> TypeInstPtr/TypeAryPtr. The motivation for this is that a single: >> >> ciKlass* _klass; >> >> is no longer sufficient to describe a type (a set of interfaces must >> also be carried around). That's not possible with TypeKlassPtr. >> >> The meet methods for TypeInstPtr/TypeInstKlassPtr and >> TypeAryPtr/TypeAryKlassPtr are largely similar. 
I moved the most >> complicated logic in helper methods: >> >> meet_instptr() and meet_aryptr() >> >> (Thanks to Vladimir I for testing the refactoring patches) > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: > > - Merge branch 'master' into JDK-8266550 > - merge fix > - Merge branch 'master' into JDK-8266550 > - review > - Merge branch 'master' into JDK-8266550 > - review > - Merge branch 'master' into JDK-8266550 > - IR framework fix > - dump tweaks > - Merge branch 'master' into JDK-8266550 > - ... and 5 more: https://git.openjdk.java.net/jdk/compare/461a467f...e2077d0f Still looks good to me. ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3880 From kvn at openjdk.java.net Mon Sep 13 17:07:45 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 13 Sep 2021 17:07:45 GMT Subject: RFR: 8273659: Replay compilation crashes with SIGSEGV In-Reply-To: <8vY79CHml4ijEgzlBJtMZqRrl4YKc60-cr4LGX4vX8A=.a2d2ff56-12a6-4afe-9d5f-b5ff08a18a64@github.com> References: <8vY79CHml4ijEgzlBJtMZqRrl4YKc60-cr4LGX4vX8A=.a2d2ff56-12a6-4afe-9d5f-b5ff08a18a64@github.com> Message-ID: On Mon, 13 Sep 2021 11:31:24 GMT, Roland Westrelin wrote: > I noticed replay fails when it encounters classes that it can't > resolve (when running with -XX:+ReplayIgnoreInitErrors). Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5490 From zgu at openjdk.java.net Mon Sep 13 19:10:52 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Mon, 13 Sep 2021 19:10:52 GMT Subject: RFR: 8273454: C2: Transform (-a)*(-b) into a*b [v4] In-Reply-To: References: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> Message-ID: On Sat, 11 Sep 2021 00:28:50 GMT, Eric Liu wrote: > LGTM Thanks, @theRealELiu ------------- PR: https://git.openjdk.java.net/jdk/pull/5403 From dlong at openjdk.java.net Mon Sep 13 22:11:10 2021 From: dlong at openjdk.java.net (Dean Long) Date: Mon, 13 Sep 2021 22:11:10 GMT Subject: RFR: 8273659: Replay compilation crashes with SIGSEGV In-Reply-To: <8vY79CHml4ijEgzlBJtMZqRrl4YKc60-cr4LGX4vX8A=.a2d2ff56-12a6-4afe-9d5f-b5ff08a18a64@github.com> References: <8vY79CHml4ijEgzlBJtMZqRrl4YKc60-cr4LGX4vX8A=.a2d2ff56-12a6-4afe-9d5f-b5ff08a18a64@github.com> Message-ID: On Mon, 13 Sep 2021 11:31:24 GMT, Roland Westrelin wrote: > I noticed replay fails when it encounters classes that it can't > resolve (when running with -XX:+ReplayIgnoreInitErrors). Marked as reviewed by dlong (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5490 From dlong at openjdk.java.net Mon Sep 13 22:14:13 2021 From: dlong at openjdk.java.net (Dean Long) Date: Mon, 13 Sep 2021 22:14:13 GMT Subject: RFR: 8273659: Replay compilation crashes with SIGSEGV In-Reply-To: <8vY79CHml4ijEgzlBJtMZqRrl4YKc60-cr4LGX4vX8A=.a2d2ff56-12a6-4afe-9d5f-b5ff08a18a64@github.com> References: <8vY79CHml4ijEgzlBJtMZqRrl4YKc60-cr4LGX4vX8A=.a2d2ff56-12a6-4afe-9d5f-b5ff08a18a64@github.com> Message-ID: On Mon, 13 Sep 2021 11:31:24 GMT, Roland Westrelin wrote: > I noticed replay fails when it encounters classes that it can't > resolve (when running with -XX:+ReplayIgnoreInitErrors). 
src/hotspot/share/ci/ciReplay.cpp line 445: > 443: if (k == NULL) { > 444: return NULL; > 445: } Do we need a call to report_error() for a descriptive message for these failure paths? ------------- PR: https://git.openjdk.java.net/jdk/pull/5490 From thartmann at openjdk.java.net Tue Sep 14 06:04:19 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 14 Sep 2021 06:04:19 GMT Subject: RFR: 8266550: C2: mirror TypeOopPtr/TypeInstPtr/TypeAryPtr with TypeKlassPtr/TypeInstKlassPtr/TypeAryKlassPtr [v8] In-Reply-To: References: Message-ID: On Fri, 10 Sep 2021 15:41:07 GMT, Roland Westrelin wrote: >> This is some refactoring in another attempt to fix JDK-6312651 >> (Compiler should only use verified interface types for >> optimization). Rather than propose the patch from: >> >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-May/033803.html >> >> as a single big patch. I've been working on splitting it. The plan is >> to have this and another refactoring patch that include no change to >> the way interfaces are handled as preparation. Then only, in a third >> patch, interface support along the lines of the patch I proposed 2 >> years ago would be introduces. >> >> This patch changes the class hierarchy of types that C2 uses and >> introduces TypeInstKlassPtr/TypeAryKlassPtr that mirror >> TypeInstPtr/TypeAryPtr. The motivation for this is that a single: >> >> ciKlass* _klass; >> >> is no longer sufficient to describe a type (a set of interfaces must >> also be carried around). That's not possible with TypeKlassPtr. >> >> The meet methods for TypeInstPtr/TypeInstKlassPtr and >> TypeAryPtr/TypeAryKlassPtr are largely similar. I moved the most >> complicated logic in helper methods: >> >> meet_instptr() and meet_aryptr() >> >> (Thanks to Vladimir I for testing the refactoring patches) > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 15 commits: > > - Merge branch 'master' into JDK-8266550 > - merge fix > - Merge branch 'master' into JDK-8266550 > - review > - Merge branch 'master' into JDK-8266550 > - review > - Merge branch 'master' into JDK-8266550 > - IR framework fix > - dump tweaks > - Merge branch 'master' into JDK-8266550 > - ... and 5 more: https://git.openjdk.java.net/jdk/compare/461a467f...e2077d0f Still looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/3880 From thartmann at openjdk.java.net Tue Sep 14 06:51:08 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 14 Sep 2021 06:51:08 GMT Subject: RFR: JDK-8272574: C2: assert(false) failed: Bad graph detected in build_loop_late [v11] In-Reply-To: References: Message-ID: On Thu, 2 Sep 2021 04:16:03 GMT, ?? wrote: >> Current loop predication will promote nodes(with a dependency on a later control node) to the insertion point which dominates the later control node. >> >> In the following example, loopPrediction will promote node 434 to the outer loop(predicted insert point is right after node 424), and it depends on control node 207. But node 424 dominates node 207, which means after the promotion, the cloned nodes have a control dependency on a later control node, which leads to a bad graph. >> >> ![image](https://user-images.githubusercontent.com/25214855/129720970-ff65b8f4-8bef-401d-8590-54aca6de470e.png) >> >> ![image](https://user-images.githubusercontent.com/25214855/129721369-4c61222b-7305-4522-9a37-e3e6e2138aa9.png) > > ?? has updated the pull request incrementally with one additional commit since the last revision: > > Add more debug infomation Looks reasonable to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5142 From thartmann at openjdk.java.net Tue Sep 14 07:09:06 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 14 Sep 2021 07:09:06 GMT Subject: RFR: 8273454: C2: Transform (-a)*(-b) into a*b [v4] In-Reply-To: References: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> Message-ID: On Thu, 9 Sep 2021 15:02:32 GMT, Zhengyu Gu wrote: >> The transformation reduce instructions in generated code. >> >> ### x86_64: >> >> Before: >> ``` >> 0x00007fb92c78b3ac: neg %esi >> 0x00007fb92c78b3ae: neg %edx >> 0x00007fb92c78b3b0: mov %esi,%eax >> 0x00007fb92c78b3b2: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) >> >> After: >> >> ; - TestSub::runSub at -1 (line 9) >> 0x00007fc8c05b74ac: mov %esi,%eax >> 0x00007fc8c05b74ae: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) >> >> >> >> ### AArch64: >> Before: >> >> 0x0000ffff814b4a70: neg w11, w1 >> 0x0000ffff814b4a74: mneg w0, w2, w11 ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) >> >> >> After: >> >> 0x0000ffff794a67f0: mul w0, w1, w2 ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Fix node in place instead of creating new node Changes requested by thartmann (Reviewer). src/hotspot/share/opto/mulnode.cpp line 70: > 68: Node *in2 = in(2); > 69: if (in1->is_Sub() && in2->is_Sub()) { > 70: Node* n11 = in1->in(1); For consistency with below code, I would name the local `in11` or simply use `phase->type(in1->in(1))` because it's the only user. 
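The rewrite under review is only safe if `(-a)*(-b)` equals `a*b` for every input, including values whose negation or product overflows. That holds in Java because `int` and `long` arithmetic wraps modulo 2^32 and 2^64. A minimal sketch that spot-checks the corner cases follows; the class and method names are hypothetical and this is not the PR's `TestNegMultiply`.

```java
public class NegMultiplyCheck {
    // The source pattern the PR description says C2 simplifies to a * b.
    public static int negMulInt(int a, int b) { return (-a) * (-b); }

    public static long negMulLong(long a, long b) { return (-a) * (-b); }

    public static void main(String[] args) {
        int[] ints = {0, 1, -1, 7, -13, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int a : ints) {
            for (int b : ints) {
                // Compare against the plain product, as suggested in the review.
                if (negMulInt(a, b) != a * b) {
                    throw new AssertionError("int mismatch for " + a + ", " + b);
                }
            }
        }
        long[] longs = {0L, 1L, -1L, 7L, -13L, Long.MAX_VALUE, Long.MIN_VALUE};
        for (long a : longs) {
            for (long b : longs) {
                if (negMulLong(a, b) != a * b) {
                    throw new AssertionError("long mismatch for " + a + ", " + b);
                }
            }
        }
        System.out.println("(-a)*(-b) == a*b for all samples");
    }
}
```

Run as written, this exercises only the arithmetic identity; it makes no claim about whether the methods are compiled by C2 or how.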
test/hotspot/jtreg/compiler/integerArithmetic/TestNegMultiply.java line 26:

> 24: /**
> 25: * @test
> 26: * @bug 8273454

The test needs `* @key randomness`

test/hotspot/jtreg/compiler/integerArithmetic/TestNegMultiply.java line 36:

> 34:
> 35: public class TestNegMultiply {
> 36: private static Random random = new Random();

You should use `Utils.getRandomInstance()` from `jdk.test.lib.Utils` to ensure that the seed is printed for reproducibility. You can check other tests for an example.

test/hotspot/jtreg/compiler/integerArithmetic/TestNegMultiply.java line 45:

> 43: private static void testInt(int a, int b) {
> 44: int expected = (-a) * (-b);
> 45: for (int i = 0; i < 20_000; i++) {

Why do you need a second loop in here? It's sufficient to set `TEST_COUNT` high enough to trigger compilation. I would suggest something like this:

    private static int testInt(int a, int b) {
        return (-a) * (-b);
    }

    private static void runIntTests() {
        for (int i = 0; i < TEST_COUNT; i++) {
            int a = random.nextInt();
            int b = random.nextInt();
            int res = testInt(a, b);
            Asserts.assertEQ(a * b, res);
        }
    }

And then run with `-XX:CompileCommand=dontinline,TestNegMultiply::test*`. No need to disable OnStackReplacement.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5403

From thartmann at openjdk.java.net Tue Sep 14 07:09:06 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Tue, 14 Sep 2021 07:09:06 GMT
Subject: RFR: 8273454: C2: Transform (-a)*(-b) into a*b [v4]
In-Reply-To:
References: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> <-EYsHDdFMKiY7SVbmhnVJib76SspeVEaziPrs0OESVY=.e9b3a07e-5606-4fc6-8652-5e457476eeff@github.com>
Message-ID:

On Fri, 10 Sep 2021 12:23:11 GMT, Zhengyu Gu wrote:

>> test/hotspot/jtreg/compiler/integerArithmetic/TestNegMultiply.java line 44:
>>
>>> 42:
>>> 43: private static void testInt(int a, int b) {
>>> 44: int expected = (-a) * (-b);
>>
>> Are you sure about this is the expected value?
As the method has been invoked 2000 times, I think it would be compiled by c2.
>
> The default CompileThreshold is 10K when tiered compilation is disabled, which is the case here, so there is no risk.

But why don't you compute `expected` as `a * b`?

-------------

PR: https://git.openjdk.java.net/jdk/pull/5403

From forax at univ-mlv.fr Tue Sep 14 07:11:37 2021
From: forax at univ-mlv.fr (Remi Forax)
Date: Tue, 14 Sep 2021 09:11:37 +0200 (CEST)
Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow
In-Reply-To:
References:
Message-ID: <1955977649.703493.1631603497495.JavaMail.zimbra@u-pem.fr>

(not a reviewer so this message will not be really helpful ...)

Hi Volker,
for me it's not an enhancement but a bug fix: in production, an exception with no stack trace is useless and results in hours lost trying to figure out the issue (see for example [1] on Stack Overflow).

This is not a new issue; this bug has popped up from time to time since OmitStackTraceInFastThrow was introduced (in 1.4.x, I believe).

Thanks for taking the time to fix that.

regards,
Rémi

[1] https://stackoverflow.com/questions/2411487/nullpointerexception-in-java-with-no-stacktrace

----- Original Message -----
> From: "Volker Simonis"
> To: "Volker Simonis"
> Cc: "hotspot-dev" , "hotspot compiler"
> Sent: Monday 13 September 2021 10:08:40
> Subject: Re: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow

> Hi,
>
> may I kindly ask somebody to please take a look at this PR?
>
> Thank you and best regards,
> Volker
>
> On Tue, Sep 7, 2021 at 5:42 PM Volker Simonis wrote:
>>
>> If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will
>> optimize certain "hot" implicit exceptions (i.e. AIOOBE,
>> NullPointerExceptions,..) and replace them by a static, pre-allocated exception
>> without any stacktrace.
>>
>> However, we can actually do better.
Instead of using a single, pre-allocated
>> exception object for all methods we can let the compiler allocate specific
>> exceptions for each compilation unit (i.e. nmethod) and fill them with at least
>> one stack frame with the method/line-number information of the currently
>> compiled method. If the method in question is being inlined (which often
>> happens), we can add stack frames for all callers up to the inlining depth of
>> the method in question.
>>
>> For the attached JTreg test, we get the following exception in interpreter mode:
>>
>> java.lang.NullPointerException: Cannot read the array length because "" is null
>>     at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>     at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
>>     at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)
>>     at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233)
>>
>> Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same
>> exception will look as follows:
>>
>> java.lang.NullPointerException
>>
>> After this change, if `StackFrameInFastThrow.throwImplicitException()` is
>> compiled standalone, we will get:
>>
>> java.lang.NullPointerException
>>     at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>
>> and if `StackFrameInFastThrow.throwImplicitException()` is inlined into
>> `level2()` and `level2()` into `level1()` we will get the following exception
>> (although we're still running with `-XX:+OmitStackTraceInFastThrow`):
>>
>> java.lang.NullPointerException
>>     at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>     at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
>>     at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)
>>
>> The new
functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched
>> on by default (I'll create a CSR for the new option once reviewers are
>> comfortable with the change). Notice that the optimization comes at no run-time
>> cost because all the extra work will be done at compile time.
>>
>> ## Implementation details
>>
>> - The current implementation of `-XX:+OmitStackTraceInFastThrow` already
>> potentially lazy-allocates the empty singleton exceptions like AIOOBE in
>> `ciEnv::ArrayStoreException_instance()`. With this change, if running with
>> `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and
>> populate them with the stack frames which are statically available at compile
>> time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`).
>> - Because nmethods don't act as strong GC roots, we have to create a global JNI
>> handle for every newly generated exception to prevent GC from collecting them.
>> - In order to avoid a memory leak we have to release these global JNI handles
>> once an nmethod gets unloaded. In order to achieve this, I've added a new
>> section "implicit exceptions" to the nmethod which holds these JNI handles.
>> - While adding the new "implicit exceptions" section to the corresponding stats
>> (`print_nmethod_stats()`) and printing routines (`nmethod::print()`) I realized
>> that a previous change ([JDK-8254231: Implementation of Foreign Linker API
>> (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already
>> introduced a new nmethod section ("native invokers") but failed to add it to
>> the corresponding stats and printing routines so I've added that section as
>> well.
>> - The `#ifdef COMPILER2` guards are only required to not break the
>> `zero`/`minimal` builds.
>> - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit
>> exceptions as "hot".
This makes the test simpler and at the same time provokes >> the allocation of more implicit exceptions. >> - Manually verified that the created Exception objects are freed by GC once the >> corresponding nmethods have been flushed. >> - Manual "stress" test with a very small heap and continuous recompilation of >> methods with explicit exceptions to provoke GCs during compilation didn't >> reveal any issues. >> >> ------------- >> >> Commit messages: >> - 8273392: Improve usability of stack-less exceptions due to >> -XX:+OmitStackTraceInFastThrow >> >> Changes: https://git.openjdk.java.net/jdk/pull/5392/files >> Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5392&range=00 >> Issue: https://bugs.openjdk.java.net/browse/JDK-8273392 >> Stats: 538 lines in 12 files changed: 417 ins; 6 del; 115 mod >> Patch: https://git.openjdk.java.net/jdk/pull/5392.diff >> Fetch: git fetch https://git.openjdk.java.net/jdk pull/5392/head:pull/5392 >> > > PR: https://git.openjdk.java.net/jdk/pull/5392 From github.com+25214855+casparcwang at openjdk.java.net Tue Sep 14 07:19:06 2021 From: github.com+25214855+casparcwang at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Tue, 14 Sep 2021 07:19:06 GMT Subject: RFR: JDK-8272574: C2: assert(false) failed: Bad graph detected in build_loop_late [v11] In-Reply-To: References: Message-ID: On Tue, 14 Sep 2021 06:48:19 GMT, Tobias Hartmann wrote: > Looks reasonable to me. Thank you for the review. ------------- PR: https://git.openjdk.java.net/jdk/pull/5142 From roland at openjdk.java.net Tue Sep 14 08:38:23 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 14 Sep 2021 08:38:23 GMT Subject: RFR: 8266550: C2: mirror TypeOopPtr/TypeInstPtr/TypeAryPtr with TypeKlassPtr/TypeInstKlassPtr/TypeAryKlassPtr [v8] In-Reply-To: References: Message-ID: On Tue, 14 Sep 2021 06:01:13 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 15 commits:
>>
>> - Merge branch 'master' into JDK-8266550
>> - merge fix
>> - Merge branch 'master' into JDK-8266550
>> - review
>> - Merge branch 'master' into JDK-8266550
>> - review
>> - Merge branch 'master' into JDK-8266550
>> - IR framework fix
>> - dump tweaks
>> - Merge branch 'master' into JDK-8266550
>> - ... and 5 more: https://git.openjdk.java.net/jdk/compare/461a467f...e2077d0f
>
> Still looks good to me.

@TobiHartmann @iwanowww thanks for the re-review!

-------------

PR: https://git.openjdk.java.net/jdk/pull/3880

From roland at openjdk.java.net Tue Sep 14 08:38:24 2021
From: roland at openjdk.java.net (Roland Westrelin)
Date: Tue, 14 Sep 2021 08:38:24 GMT
Subject: Integrated: 8266550: C2: mirror TypeOopPtr/TypeInstPtr/TypeAryPtr with TypeKlassPtr/TypeInstKlassPtr/TypeAryKlassPtr
In-Reply-To:
References:
Message-ID:

On Wed, 5 May 2021 11:29:44 GMT, Roland Westrelin wrote:

> This is some refactoring in another attempt to fix JDK-6312651
> (Compiler should only use verified interface types for
> optimization). Rather than propose the patch from:
>
> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-May/033803.html
>
> as a single big patch, I've been working on splitting it. The plan is
> to have this and another refactoring patch that include no change to
> the way interfaces are handled as preparation. Only then, in a third
> patch, would interface support along the lines of the patch I proposed 2
> years ago be introduced.
>
> This patch changes the class hierarchy of types that C2 uses and
> introduces TypeInstKlassPtr/TypeAryKlassPtr that mirror
> TypeInstPtr/TypeAryPtr. The motivation for this is that a single:
>
> ciKlass* _klass;
>
> is no longer sufficient to describe a type (a set of interfaces must
> also be carried around). That's not possible with TypeKlassPtr.
>
> The meet methods for TypeInstPtr/TypeInstKlassPtr and
> TypeAryPtr/TypeAryKlassPtr are largely similar.
I moved the most > complicated logic in helper methods: > > meet_instptr() and meet_aryptr() > > (Thanks to Vladimir I for testing the refactoring patches) This pull request has now been integrated. Changeset: 1d2458db Author: Roland Westrelin URL: https://git.openjdk.java.net/jdk/commit/1d2458db34ed6acdd20bb8c165b7619cdbc32f47 Stats: 1216 lines in 18 files changed: 769 ins; 234 del; 213 mod 8266550: C2: mirror TypeOopPtr/TypeInstPtr/TypeAryPtr with TypeKlassPtr/TypeInstKlassPtr/TypeAryKlassPtr Reviewed-by: vlivanov, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/3880 From github.com+25214855+casparcwang at openjdk.java.net Tue Sep 14 20:59:30 2021 From: github.com+25214855+casparcwang at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Tue, 14 Sep 2021 20:59:30 GMT Subject: Integrated: JDK-8272574: C2: assert(false) failed: Bad graph detected in build_loop_late In-Reply-To: References: Message-ID: On Tue, 17 Aug 2021 12:02:21 GMT, ?? wrote: > Current loop predication will promote nodes(with a dependency on a later control node) to the insertion point which dominates the later control node. > > In the following example, loopPrediction will promote node 434 to the outer loop(predicted insert point is right after node 424), and it depends on control node 207. But node 424 dominates node 207, which means after the promotion, the cloned nodes have a control dependency on a later control node, which leads to a bad graph. > > ![image](https://user-images.githubusercontent.com/25214855/129720970-ff65b8f4-8bef-401d-8590-54aca6de470e.png) > > ![image](https://user-images.githubusercontent.com/25214855/129721369-4c61222b-7305-4522-9a37-e3e6e2138aa9.png) This pull request has now been integrated. 
Changeset: 16c3ad1f Author: casparcwang Committer: Jie Fu URL: https://git.openjdk.java.net/jdk/commit/16c3ad1ff4d9a0e21f15656c73a96a6c143c811a Stats: 96 lines in 4 files changed: 91 ins; 0 del; 5 mod 8272574: C2: assert(false) failed: Bad graph detected in build_loop_late Co-authored-by: Hui Shi Co-authored-by: Christian Hagedorn Reviewed-by: thartmann, chagedorn, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/5142 From roland at openjdk.java.net Wed Sep 15 09:08:45 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Wed, 15 Sep 2021 09:08:45 GMT Subject: RFR: 8273659: Replay compilation crashes with SIGSEGV since 8271911 In-Reply-To: References: <8vY79CHml4ijEgzlBJtMZqRrl4YKc60-cr4LGX4vX8A=.a2d2ff56-12a6-4afe-9d5f-b5ff08a18a64@github.com> Message-ID: On Mon, 13 Sep 2021 22:11:16 GMT, Dean Long wrote: >> I noticed replay fails when it encounters classes that it can't >> resolve (when running with -XX:+ReplayIgnoreInitErrors). > > src/hotspot/share/ci/ciReplay.cpp line 445: > >> 443: if (k == NULL) { >> 444: return NULL; >> 445: } > > Do we need a call to report_error() for a descriptive message for these failure paths? parse_method() has calls to report_error() when it returns NULL. I think it's true of parse_klass() except if SymbolTable::new_symbol() returns NULL (but can it return NULL?). Do you think extra messages make sense then? ------------- PR: https://git.openjdk.java.net/jdk/pull/5490 From shade at openjdk.java.net Wed Sep 15 11:08:25 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 15 Sep 2021 11:08:25 GMT Subject: RFR: 8273806: compiler/cpuflags/TestSSE4Disabled.java should test for CPU feature explicitly Message-ID: <3XhZAKna3x7cjYwB64G8tUj6_Yzj7HoBvfrAfgEzZWk=.b0f85e73-031a-44f1-9ffe-1c442f600aef@github.com> JDK-8158214 added a test that verifies that machines with SSE4 support do not crash when lower SSE level is required. But it tests for CPU capabilities weirdly. 
This _tangentially_ manifests when running the test with Zero: $ CONF=linux-x86_64-zero-fastdebug make exploded-test TEST=compiler/cpuflags/TestSSE4Disabled.java ... STDERR: Unrecognized VM option 'UseSSE=3' Error: Could not create the Java Virtual Machine. Error: A fatal exception has occurred. Program will exit. I think we can test that target CPU supports SSE4, and only run the test then. It would implicitly fix Zero test failure too, as Zero impersonates a "generic" featureless CPU. Plus, it would stop running the -Xcomp test on arches that do not actually need this test to run. Additional testing: - [x] Linux x86_64 Server, TR 3970X, affected test still runs - [x] Linux x86_64 Zero, TR 3970X, affected test is now skipped ------------- Commit messages: - Fix Changes: https://git.openjdk.java.net/jdk/pull/5530/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5530&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273806 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5530.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5530/head:pull/5530 PR: https://git.openjdk.java.net/jdk/pull/5530 From shade at openjdk.java.net Wed Sep 15 11:17:02 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 15 Sep 2021 11:17:02 GMT Subject: RFR: 8273807: Zero: Drop incorrect test block from compiler/startup/NumCompilerThreadsCheck.java Message-ID: There is a Zero-specific test block in compiler/startup/NumCompilerThreadsCheck.java, which currently fails: $ CONF=linux-x86_64-zero-fastdebug make exploded-test TEST=compiler/startup/NumCompilerThreadsCheck.java ... STDERR: stdout: []; stderr: [intx CICompilerCount=-1 is outside the allowed range [ 0 ... 2147483647 ] Improperly specified VM option 'CICompilerCount=-1' Error: Could not create the Java Virtual Machine. Error: A fatal exception has occurred. Program will exit. 
] exitValue = 1 java.lang.RuntimeException: 'must be at least 0' missing from stdout/stderr at jdk.test.lib.process.OutputAnalyzer.shouldContain(OutputAnalyzer.java:221) at compiler.startup.NumCompilerThreadsCheck.main(NumCompilerThreadsCheck.java:53) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:568) at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:312) at java.base/java.lang.Thread.run(Thread.java:833) I believe it was rendered obsolete by JDK-8122937. Additional testing: - [x] Linux x86_64 Zero, affected test now passes ------------- Commit messages: - Fix Changes: https://git.openjdk.java.net/jdk/pull/5531/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5531&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273807 Stats: 5 lines in 1 file changed: 0 ins; 5 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5531.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5531/head:pull/5531 PR: https://git.openjdk.java.net/jdk/pull/5531 From zgu at openjdk.java.net Wed Sep 15 15:08:29 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 15 Sep 2021 15:08:29 GMT Subject: RFR: 8273454: C2: Transform (-a)*(-b) into a*b [v5] In-Reply-To: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> References: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> Message-ID: > The transformation reduce instructions in generated code. 
> > ### x86_64: > > Before: > ``` > 0x00007fb92c78b3ac: neg %esi > 0x00007fb92c78b3ae: neg %edx > 0x00007fb92c78b3b0: mov %esi,%eax > 0x00007fb92c78b3b2: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0} > ; - TestSub::runSub at 4 (line 9) > > After: > > ; - TestSub::runSub at -1 (line 9) > 0x00007fc8c05b74ac: mov %esi,%eax > 0x00007fc8c05b74ae: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0} > ; - TestSub::runSub at 4 (line 9) > > > > ### AArch64: > Before: > > 0x0000ffff814b4a70: neg w11, w1 > 0x0000ffff814b4a74: mneg w0, w2, w11 ;*imul {reexecute=0 rethrow=0 return_oop=0} > ; - TestSub::runSub at 4 (line 9) > > > After: > > 0x0000ffff794a67f0: mul w0, w1, w2 ;*imul {reexecute=0 rethrow=0 return_oop=0} > ; - TestSub::runSub at 4 (line 9) Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: @TobiHartmann's comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5403/files - new: https://git.openjdk.java.net/jdk/pull/5403/files/71aa6ac4..f9d7d612 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5403&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5403&range=03-04 Stats: 79 lines in 2 files changed: 27 ins; 37 del; 15 mod Patch: https://git.openjdk.java.net/jdk/pull/5403.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5403/head:pull/5403 PR: https://git.openjdk.java.net/jdk/pull/5403 From zgu at openjdk.java.net Wed Sep 15 15:27:49 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 15 Sep 2021 15:27:49 GMT Subject: RFR: 8273454: C2: Transform (-a)*(-b) into a*b [v6] In-Reply-To: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> References: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> Message-ID: > The transformation reduce instructions in generated code. 
> > ### x86_64: > > Before: > ``` > 0x00007fb92c78b3ac: neg %esi > 0x00007fb92c78b3ae: neg %edx > 0x00007fb92c78b3b0: mov %esi,%eax > 0x00007fb92c78b3b2: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0} > ; - TestSub::runSub at 4 (line 9) > > After: > > ; - TestSub::runSub at -1 (line 9) > 0x00007fc8c05b74ac: mov %esi,%eax > 0x00007fc8c05b74ae: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0} > ; - TestSub::runSub at 4 (line 9) > > > > ### AArch64: > Before: > > 0x0000ffff814b4a70: neg w11, w1 > 0x0000ffff814b4a74: mneg w0, w2, w11 ;*imul {reexecute=0 rethrow=0 return_oop=0} > ; - TestSub::runSub at 4 (line 9) > > > After: > > 0x0000ffff794a67f0: mul w0, w1, w2 ;*imul {reexecute=0 rethrow=0 return_oop=0} > ; - TestSub::runSub at 4 (line 9) Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: Trailing space ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5403/files - new: https://git.openjdk.java.net/jdk/pull/5403/files/f9d7d612..3f3eeb01 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5403&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5403&range=04-05 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5403.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5403/head:pull/5403 PR: https://git.openjdk.java.net/jdk/pull/5403 From zgu at openjdk.java.net Wed Sep 15 15:48:53 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 15 Sep 2021 15:48:53 GMT Subject: RFR: 8273454: C2: Transform (-a)*(-b) into a*b [v4] In-Reply-To: References: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> Message-ID: <_HJOTLr6faWxYHe_379NNk4L8DTHmzA2bj1k4QYHVVo=.83dae894-6eba-4bb9-8fcc-042c71739f5d@github.com> On Tue, 14 Sep 2021 06:49:50 GMT, Tobias Hartmann wrote: >> Zhengyu Gu has updated the pull request incrementally with one additional commit since the 
last revision: >> >> Fix node in place instead of creating new node > > src/hotspot/share/opto/mulnode.cpp line 70: > >> 68: Node *in2 = in(2); >> 69: if (in1->is_Sub() && in2->is_Sub()) { >> 70: Node* n11 = in1->in(1); > > For consistency with below code, I would name the local `in11` or simply use `phase->type(in1->in(1))` because it's the only user. Fixed > test/hotspot/jtreg/compiler/integerArithmetic/TestNegMultiply.java line 26: > >> 24: /** >> 25: * @test >> 26: * @bug 8273454 > > The test needs `* @key randomness` Done > test/hotspot/jtreg/compiler/integerArithmetic/TestNegMultiply.java line 36: > >> 34: >> 35: public class TestNegMultiply { >> 36: private static Random random = new Random(); > > You should use `Utils.getRandomInstance()` from `jdk.test.lib.Utils` to ensure that the seed is printed for reproducibility. You can check other tests for an example. Done ------------- PR: https://git.openjdk.java.net/jdk/pull/5403 From zgu at openjdk.java.net Wed Sep 15 15:48:53 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 15 Sep 2021 15:48:53 GMT Subject: RFR: 8273454: C2: Transform (-a)*(-b) into a*b [v4] In-Reply-To: References: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> <-EYsHDdFMKiY7SVbmhnVJib76SspeVEaziPrs0OESVY=.e9b3a07e-5606-4fc6-8652-5e457476eeff@github.com> Message-ID: On Tue, 14 Sep 2021 06:58:57 GMT, Tobias Hartmann wrote: >> The default CompileThreshold is 10K when tiered compilation is disabled, which is the case here, so there is no risk. > > But why don't you compute `expected` as `a * b`? I would prefer to keep as it is to match testxxx() functions. I think it articulates that JIT-ed result matches interpreter's. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5403 From zgu at openjdk.java.net Wed Sep 15 15:55:52 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 15 Sep 2021 15:55:52 GMT Subject: RFR: 8273454: C2: Transform (-a)*(-b) into a*b [v4] In-Reply-To: References: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> Message-ID: On Tue, 14 Sep 2021 07:05:40 GMT, Tobias Hartmann wrote: >> Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix node in place instead of creating new node > > test/hotspot/jtreg/compiler/integerArithmetic/TestNegMultiply.java line 45: > >> 43: private static void testInt(int a, int b) { >> 44: int expected = (-a) * (-b); >> 45: for (int i = 0; i < 20_000; i++) { > > Why do you need a second loop in here? It's sufficient to set `TEST_COUNT` high enough to trigger compilation. I would suggest something like this: > > > private static int testInt(int a, int b) { > return (-a) * (-b); > } > > private static void runIntTests() { > for (int i = 0; i < TEST_COUNT; i++) { > int a = random.nextInt(); > int b = random.nextInt(); > int res = testInt(a, b); > Asserts.assertEQ(a * b, res); > } > } > > > And then run with `-XX:CompileCommand=dontinline,TestNegMultiply::test*`. No need to disable OnStackReplacement. The inner loop ensures that all tests hit JIT-ed version. If the transformation is broken, I would prefer the test fails for the very first iteration, instead of somewhere in the middle. I refactored the code to remove inner loop. Also, fixed command option. 
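
The transformation under review relies on `(-a) * (-b)` and `a * b` being equal for every pair of `int` or `long` inputs, overflow included, because Java integer arithmetic wraps around. Below is a minimal, self-contained sketch of that check with the expected value computed as `a * b`, as suggested earlier in the thread. It is not the jtreg test from the PR: the class name `NegMultiplyIdentity`, the helper names and the fixed seed are assumptions made for illustration, and plain `java.util.Random` stands in for `Utils.getRandomInstance()`.

```java
import java.util.Random;

// Hypothetical stand-alone check; not part of the JDK test suite.
class NegMultiplyIdentity {
    // The shape C2 is expected to simplify to a * b.
    static int negMulInt(int a, int b) {
        return (-a) * (-b);
    }

    static long negMulLong(long a, long b) {
        return (-a) * (-b);
    }

    // Counts random input pairs for which (-a) * (-b) differs from a * b.
    static int countMismatches(long seed, int iterations) {
        Random random = new Random(seed);
        int mismatches = 0;
        for (int i = 0; i < iterations; i++) {
            int a = random.nextInt();
            int b = random.nextInt();
            if (negMulInt(a, b) != a * b) {
                mismatches++;
            }
            long la = random.nextLong();
            long lb = random.nextLong();
            if (negMulLong(la, lb) != la * lb) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(42L, 100_000));
        // Negating MIN_VALUE overflows back to MIN_VALUE, yet the identity still holds.
        System.out.println(negMulInt(Integer.MIN_VALUE, 3) == Integer.MIN_VALUE * 3);
    }
}
```

Because the expected value does not go through the negated form, the comparison stays meaningful even when the checking loop itself is compiled.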
-------------

PR: https://git.openjdk.java.net/jdk/pull/5403

From dlong at openjdk.java.net Wed Sep 15 21:27:56 2021
From: dlong at openjdk.java.net (Dean Long)
Date: Wed, 15 Sep 2021 21:27:56 GMT
Subject: RFR: 8273659: Replay compilation crashes with SIGSEGV since 8271911
In-Reply-To:
References: <8vY79CHml4ijEgzlBJtMZqRrl4YKc60-cr4LGX4vX8A=.a2d2ff56-12a6-4afe-9d5f-b5ff08a18a64@github.com>
Message-ID:

On Wed, 15 Sep 2021 09:05:26 GMT, Roland Westrelin wrote:

>> src/hotspot/share/ci/ciReplay.cpp line 445:
>>
>>> 443: if (k == NULL) {
>>> 444: return NULL;
>>> 445: }
>>
>> Do we need a call to report_error() for a descriptive message for these failure paths?
>
> parse_method() has calls to report_error() when it returns NULL. I think it's true of parse_klass() except if SymbolTable::new_symbol() returns NULL (but can it return NULL?). Do you think extra messages make sense then?

No, I don't think new_symbol can return NULL. I think you're good to integrate.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5490

From yyang at openjdk.java.net Thu Sep 16 02:54:49 2021
From: yyang at openjdk.java.net (Yi Yang)
Date: Thu, 16 Sep 2021 02:54:49 GMT
Subject: RFR: 8271202: C1: assert(false) failed: live_in set of first block must be empty
In-Reply-To:
References:
Message-ID:

On Wed, 28 Jul 2021 06:57:26 GMT, Yi Yang wrote:

> Hi, I'm trying to fix [JDK-8271202](https://bugs.openjdk.java.net/browse/JDK-8271202). A local variable (smallinvoc) is defined in B3 and only used in B14, so it ought to have a short lifetime. But its lifetime is unconditionally extended under -XX:+DeoptimizeALot (**Just removing this may also be a simpler and safer fix? Not sure if it's acceptable**), making it propagate to almost the whole remaining IR.
>
> https://github.com/openjdk/jdk/blob/ecd445562f8355704a041f9eca0e87dc85a7f44c/src/hotspot/share/ci/ciMethod.cpp#L373-L379
>
> ![image](https://user-images.githubusercontent.com/5010047/127277954-2a64d87e-2981-4d74-8001-c7efeb000a10.png)
>
> A virtual register (v603) that represents this variable is in the B13 live_in set, which propagates to the B1 live_out set.
>
> When B1 merges state with B16 and B19, it finds that this variable is empty in new_state (B16), so B1 invalidates the corresponding local slot.
>
> https://github.com/openjdk/jdk/blob/ecd445562f8355704a041f9eca0e87dc85a7f44c/src/hotspot/share/c1/c1_Instruction.cpp#L826-L838
>
> I think we should invalidate this slot only when the types mismatch. Otherwise, no Phi is generated and the B19 live_gen set does not contain this variable, so the variable stays live in B1 live_in. B1 live_in is eventually propagated backward to the B20 live_in set without being killed by B19 live_gen, which causes the crash.
>
>     Block 1
>     live_in:   603 616 617 618 619 620 621 622
>     live_out:  603 616 617 618 619 620 621 622
>     live_gen:  620
>     live_kill: 648 649 650
>
>     Block 16
>     live_in:   603 616 617 618 619 620 621 622
>     live_out:  603 616 617 618 619 620 621 622
>     live_gen:  616 617 618 619 620 621 622
>     live_kill: 620 654 655 656 657
>
>     Block 19
>     live_in:   603
>     live_out:  603 616 617 618 619 620 621 622
>     live_gen:
>     live_kill: 0 1 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647
>
>     Block 20
>     live_in:   603
>     live_out:  603
>     live_gen:
>     live_kill: 577 578

Can I have a review for this fix? Thanks.
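
The sets in the dump above are consistent with the standard backward liveness equation, live_in = live_gen ∪ (live_out − live_kill). The following is a minimal sketch of that equation applied to the numbers printed for Block 19. It is not C1's linear-scan code; the class name `LivenessSketch` and the method `liveIn` are hypothetical and exist only for illustration.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of the per-block liveness equation; not HotSpot code.
class LivenessSketch {
    // live_in = live_gen ∪ (live_out \ live_kill)
    static Set<Integer> liveIn(Set<Integer> liveOut, Set<Integer> liveGen, Set<Integer> liveKill) {
        Set<Integer> result = new TreeSet<>(liveOut);
        result.removeAll(liveKill); // registers defined in the block stop being live above it
        result.addAll(liveGen);     // registers used before any definition are live on entry
        return result;
    }

    public static void main(String[] args) {
        // Values copied from the "Block 19" entry of the dump.
        Set<Integer> liveOut = new TreeSet<>(Set.of(603, 616, 617, 618, 619, 620, 621, 622));
        Set<Integer> liveGen = Set.of();
        Set<Integer> liveKill = new TreeSet<>();
        liveKill.add(0);
        liveKill.add(1);
        for (int reg = 616; reg <= 647; reg++) {
            liveKill.add(reg);
        }
        System.out.println(liveIn(liveOut, liveGen, liveKill));
    }
}
```

Register 603 is the only entry of live_out that Block 19 does not kill, so it is the only one left in live_in, which matches the `live_in: 603` shown for Blocks 19 and 20.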
------------- PR: https://git.openjdk.java.net/jdk/pull/4916 From thartmann at openjdk.java.net Thu Sep 16 07:26:51 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 16 Sep 2021 07:26:51 GMT Subject: RFR: 8273454: C2: Transform (-a)*(-b) into a*b [v4] In-Reply-To: References: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> Message-ID: On Wed, 15 Sep 2021 15:51:55 GMT, Zhengyu Gu wrote: >> test/hotspot/jtreg/compiler/integerArithmetic/TestNegMultiply.java line 45: >> >>> 43: private static void testInt(int a, int b) { >>> 44: int expected = (-a) * (-b); >>> 45: for (int i = 0; i < 20_000; i++) { >> >> Why do you need a second loop in here? It's sufficient to set `TEST_COUNT` high enough to trigger compilation. I would suggest something like this: >> >> >> private static int testInt(int a, int b) { >> return (-a) * (-b); >> } >> >> private static void runIntTests() { >> for (int i = 0; i < TEST_COUNT; i++) { >> int a = random.nextInt(); >> int b = random.nextInt(); >> int res = testInt(a, b); >> Asserts.assertEQ(a * b, res); >> } >> } >> >> >> And then run with `-XX:CompileCommand=dontinline,TestNegMultiply::test*`. No need to disable OnStackReplacement. > > The inner loop ensures that all tests hit JIT-ed version. If the transformation is broken, I would prefer the test fails for the very first iteration, instead of somewhere in the middle. > > I refactored the code to remove inner loop. > > Also, fixed command option. You can't control the iteration in which the test would fail if there's a bug in C2 (it could only fail for **some** random values). Therefore, you could as well use random values for the warmup and simply increase `TEST_COUNT` to ensure that C2 compilation is triggered and we run a reasonable amount of iterations with C2 compiled code. 
Your newest version of the test now has the problem that OSR compilation might C2 compile the computation of the expected value and then you are comparing the output of a C2 compiled method to a C2 compiled method instead of the interpreter. You have the following options: - Compute the expected value as `a * b`. In that case it's fine if the computation is C2 compiled as well. - Prevent compilation of the `run*` methods (either by disabling OSR compilation or by completely disabling compilation of these methods) ------------- PR: https://git.openjdk.java.net/jdk/pull/5403 From thartmann at openjdk.java.net Thu Sep 16 07:26:51 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 16 Sep 2021 07:26:51 GMT Subject: RFR: 8273454: C2: Transform (-a)*(-b) into a*b [v4] In-Reply-To: References: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> Message-ID: On Thu, 16 Sep 2021 07:23:30 GMT, Tobias Hartmann wrote: >> The inner loop ensures that all tests hit JIT-ed version. If the transformation is broken, I would prefer the test fails for the very first iteration, instead of somewhere in the middle. >> >> I refactored the code to remove inner loop. >> >> Also, fixed command option. > > You can't control the iteration in which the test would fail if there's a bug in C2 (it could only fail for **some** random values). Therefore, you could as well use random values for the warmup and simply increase `TEST_COUNT` to ensure that C2 compilation is triggered and we run a reasonable amount of iterations with C2 compiled code. > > Your newest version of the test now has the problem that OSR compilation might C2 compile the computation of the expected value and then you are comparing the output of a C2 compiled method to a C2 compiled method instead of the interpreter. You have the following options: > - Compute the expected value as `a * b`. In that case it's fine if the computation is C2 compiled as well. 
> - Prevent compilation of the `run*` methods (either by disabling OSR compilation or by completely disabling compilation of these methods) And sorry for being picky here but I would like to keep tests as simple as possible :) ------------- PR: https://git.openjdk.java.net/jdk/pull/5403 From thartmann at openjdk.java.net Thu Sep 16 07:35:44 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 16 Sep 2021 07:35:44 GMT Subject: RFR: 8273807: Zero: Drop incorrect test block from compiler/startup/NumCompilerThreadsCheck.java In-Reply-To: References: Message-ID: On Wed, 15 Sep 2021 11:08:52 GMT, Aleksey Shipilev wrote: > There is a Zero-specific test block in compiler/startup/NumCompilerThreadsCheck.java, which currently fails: > > > $ CONF=linux-x86_64-zero-fastdebug make exploded-test TEST=compiler/startup/NumCompilerThreadsCheck.java > ... > > STDERR: > stdout: []; > stderr: [intx CICompilerCount=-1 is outside the allowed range [ 0 ... 2147483647 ] > Improperly specified VM option 'CICompilerCount=-1' > Error: Could not create the Java Virtual Machine. > Error: A fatal exception has occurred. Program will exit. > ] > exitValue = 1 > > java.lang.RuntimeException: 'must be at least 0' missing from stdout/stderr > > at jdk.test.lib.process.OutputAnalyzer.shouldContain(OutputAnalyzer.java:221) > at compiler.startup.NumCompilerThreadsCheck.main(NumCompilerThreadsCheck.java:53) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:568) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:312) > at java.base/java.lang.Thread.run(Thread.java:833) > > > I believe it was rendered obsolete by JDK-8122937. 
> > Additional testing: > - [x] Linux x86_64 Zero, affected test now passes Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5531 From thartmann at openjdk.java.net Thu Sep 16 07:44:49 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 16 Sep 2021 07:44:49 GMT Subject: RFR: 8273806: compiler/cpuflags/TestSSE4Disabled.java should test for CPU feature explicitly In-Reply-To: <3XhZAKna3x7cjYwB64G8tUj6_Yzj7HoBvfrAfgEzZWk=.b0f85e73-031a-44f1-9ffe-1c442f600aef@github.com> References: <3XhZAKna3x7cjYwB64G8tUj6_Yzj7HoBvfrAfgEzZWk=.b0f85e73-031a-44f1-9ffe-1c442f600aef@github.com> Message-ID: On Wed, 15 Sep 2021 10:59:12 GMT, Aleksey Shipilev wrote: > JDK-8158214 added a test that verifies that machines with SSE4 support do not crash when lower SSE level is required. But it tests for CPU capabilities weirdly. > > This _tangentially_ manifests when running the test with Zero: > > > $ CONF=linux-x86_64-zero-fastdebug make exploded-test TEST=compiler/cpuflags/TestSSE4Disabled.java > ... > STDERR: > Unrecognized VM option 'UseSSE=3' > Error: Could not create the Java Virtual Machine. > Error: A fatal exception has occurred. Program will exit. > > > I think we can test that target CPU supports SSE4, and only run the test then. It would implicitly fix Zero test failure too, as Zero impersonates a "generic" featureless CPU. Plus, it would stop running the -Xcomp test on arches that do not actually need this test to run. > > Additional testing: > - [x] Linux x86_64 Server, TR 3970X, affected test still runs > - [x] Linux x86_64 Zero, TR 3970X, affected test is now skipped I just wanted to say "Looks good to me but the author of the test should also have a look" but then I noticed that I'm the author. So: Looks good and trivial to me :) ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5530 From shade at openjdk.java.net Thu Sep 16 08:16:49 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 16 Sep 2021 08:16:49 GMT Subject: RFR: 8273807: Zero: Drop incorrect test block from compiler/startup/NumCompilerThreadsCheck.java In-Reply-To: References: Message-ID: On Wed, 15 Sep 2021 11:08:52 GMT, Aleksey Shipilev wrote: > There is a Zero-specific test block in compiler/startup/NumCompilerThreadsCheck.java, which currently fails: > > > $ CONF=linux-x86_64-zero-fastdebug make exploded-test TEST=compiler/startup/NumCompilerThreadsCheck.java > ... > > STDERR: > stdout: []; > stderr: [intx CICompilerCount=-1 is outside the allowed range [ 0 ... 2147483647 ] > Improperly specified VM option 'CICompilerCount=-1' > Error: Could not create the Java Virtual Machine. > Error: A fatal exception has occurred. Program will exit. > ] > exitValue = 1 > > java.lang.RuntimeException: 'must be at least 0' missing from stdout/stderr > > at jdk.test.lib.process.OutputAnalyzer.shouldContain(OutputAnalyzer.java:221) > at compiler.startup.NumCompilerThreadsCheck.main(NumCompilerThreadsCheck.java:53) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:568) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:312) > at java.base/java.lang.Thread.run(Thread.java:833) > > > I believe it was rendered obsolete by JDK-8122937. > > Additional testing: > - [x] Linux x86_64 Zero, affected test now passes Thanks! 
------------- PR: https://git.openjdk.java.net/jdk/pull/5531 From shade at openjdk.java.net Thu Sep 16 08:16:49 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 16 Sep 2021 08:16:49 GMT Subject: Integrated: 8273807: Zero: Drop incorrect test block from compiler/startup/NumCompilerThreadsCheck.java In-Reply-To: References: Message-ID: On Wed, 15 Sep 2021 11:08:52 GMT, Aleksey Shipilev wrote: > There is a Zero-specific test block in compiler/startup/NumCompilerThreadsCheck.java, which currently fails: > > > $ CONF=linux-x86_64-zero-fastdebug make exploded-test TEST=compiler/startup/NumCompilerThreadsCheck.java > ... > > STDERR: > stdout: []; > stderr: [intx CICompilerCount=-1 is outside the allowed range [ 0 ... 2147483647 ] > Improperly specified VM option 'CICompilerCount=-1' > Error: Could not create the Java Virtual Machine. > Error: A fatal exception has occurred. Program will exit. > ] > exitValue = 1 > > java.lang.RuntimeException: 'must be at least 0' missing from stdout/stderr > > at jdk.test.lib.process.OutputAnalyzer.shouldContain(OutputAnalyzer.java:221) > at compiler.startup.NumCompilerThreadsCheck.main(NumCompilerThreadsCheck.java:53) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:568) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:312) > at java.base/java.lang.Thread.run(Thread.java:833) > > > I believe it was rendered obsolete by JDK-8122937. > > Additional testing: > - [x] Linux x86_64 Zero, affected test now passes This pull request has now been integrated. 
Changeset: 1c5de8b8
Author:    Aleksey Shipilev
URL:       https://git.openjdk.java.net/jdk/commit/1c5de8b86b038f5d5c313c504a8868e36fc80bde
Stats:     5 lines in 1 file changed: 0 ins; 5 del; 0 mod

8273807: Zero: Drop incorrect test block from compiler/startup/NumCompilerThreadsCheck.java

Reviewed-by: thartmann

-------------

PR: https://git.openjdk.java.net/jdk/pull/5531

From mdoerr at openjdk.java.net Thu Sep 16 08:17:51 2021
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Thu, 16 Sep 2021 08:17:51 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow
In-Reply-To:
References:
Message-ID:

On Mon, 13 Sep 2021 10:05:16 GMT, Volker Simonis wrote:

> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening.
>
> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>
>     public static boolean isAlpha(int c) {
>         try {
>             return IS_ALPHA[c];
>         } catch (ArrayIndexOutOfBoundsException ex) {
>             return false;
>         }
>     }
>
>
> ### Solution
>
> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
>
>     -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>     ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>     ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>     ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>     ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>
>     -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>     ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>     ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>     ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>     ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>
>
> ### Implementation details
>
> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception.
> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
> - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e.
no exceptions are happening at all) the new implementation is about 10 times faster. This looks like a great idea. I have a few minor remarks / suggestions. src/hotspot/share/ci/ciEnv.cpp line 375: > 373: // ------------------------------------------------------------------ > 374: // helper for -XX:+OptimizeImplicitExceptions > 375: ciInstanceKlass* ciEnv::exception_instanceKlass_for_reason(Deoptimization::DeoptReason reason, bool aastore) { Better `is_aastore` or pass Bytecode? src/hotspot/share/opto/graphKit.cpp line 631: > 629: Node* ex_node = new_instance(makecon(ex_type), NULL, NULL, true); > 630: set_argument(0, ex_node); > 631: ciMethod* init = ex_ciInstKlass->find_method(ciSymbol::make(""), ciSymbol::make("()V")); Extra whitespace. src/hotspot/share/runtime/globals.hpp line 645: > 643: "Omit backtraces for some 'hot' exceptions in optimized code") \ > 644: \ > 645: product(bool, OptimizeImplicitExceptions, true, \ Should it be a diagnostic flag? Regular product flags require a CSR. src/hotspot/share/runtime/sharedRuntime.cpp line 1096: > 1094: bc = bytecode.invoke_code(); > 1095: } > 1096: else { Coding style: newline before `else` ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From shade at openjdk.java.net Thu Sep 16 08:27:54 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 16 Sep 2021 08:27:54 GMT Subject: RFR: 8273806: compiler/cpuflags/TestSSE4Disabled.java should test for CPU feature explicitly In-Reply-To: References: <3XhZAKna3x7cjYwB64G8tUj6_Yzj7HoBvfrAfgEzZWk=.b0f85e73-031a-44f1-9ffe-1c442f600aef@github.com> Message-ID: On Thu, 16 Sep 2021 07:41:32 GMT, Tobias Hartmann wrote: > I just wanted to say "Looks good to me but the author of the test should also have a look" but then I noticed that I'm the author. So: Looks good and trivial to me :) Yeah ;) Thanks! 
------------- PR: https://git.openjdk.java.net/jdk/pull/5530 From shade at openjdk.java.net Thu Sep 16 08:27:55 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 16 Sep 2021 08:27:55 GMT Subject: Integrated: 8273806: compiler/cpuflags/TestSSE4Disabled.java should test for CPU feature explicitly In-Reply-To: <3XhZAKna3x7cjYwB64G8tUj6_Yzj7HoBvfrAfgEzZWk=.b0f85e73-031a-44f1-9ffe-1c442f600aef@github.com> References: <3XhZAKna3x7cjYwB64G8tUj6_Yzj7HoBvfrAfgEzZWk=.b0f85e73-031a-44f1-9ffe-1c442f600aef@github.com> Message-ID: On Wed, 15 Sep 2021 10:59:12 GMT, Aleksey Shipilev wrote: > JDK-8158214 added a test that verifies that machines with SSE4 support do not crash when lower SSE level is required. But it tests for CPU capabilities weirdly. > > This _tangentially_ manifests when running the test with Zero: > > > $ CONF=linux-x86_64-zero-fastdebug make exploded-test TEST=compiler/cpuflags/TestSSE4Disabled.java > ... > STDERR: > Unrecognized VM option 'UseSSE=3' > Error: Could not create the Java Virtual Machine. > Error: A fatal exception has occurred. Program will exit. > > > I think we can test that target CPU supports SSE4, and only run the test then. It would implicitly fix Zero test failure too, as Zero impersonates a "generic" featureless CPU. Plus, it would stop running the -Xcomp test on arches that do not actually need this test to run. > > Additional testing: > - [x] Linux x86_64 Server, TR 3970X, affected test still runs > - [x] Linux x86_64 Zero, TR 3970X, affected test is now skipped This pull request has now been integrated. 
Changeset: 09ecb119 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/09ecb11927f0042ddc0c5c23d747b275ab70b36b Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8273806: compiler/cpuflags/TestSSE4Disabled.java should test for CPU feature explicitly Reviewed-by: thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/5530 From chagedorn at openjdk.java.net Thu Sep 16 12:00:55 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 16 Sep 2021 12:00:55 GMT Subject: Integrated: 8271954: C2: assert(false) failed: Bad graph detected in build_loop_late In-Reply-To: References: Message-ID: <5LKRTV0GxezSuH4N3KBHVLVjA3oPccxSs20Mwkc0NIk=.8f78bb73-98f3-451c-8308-6d8afbd7bafb@github.com> On Thu, 19 Aug 2021 14:05:29 GMT, Christian Hagedorn wrote: > Since [JDK-8252372](https://bugs.openjdk.java.net/browse/JDK-8252372), data nodes are sunk out of a loop with the help of pinned `CastXX` nodes to ensure that the data nodes do not float back into the loop. Such a chain of sunk data nodes is now pinned to the uncommon projection of a loop predicate in the testcase. Loop unswitching is then applied and this loop predicate is cloned down to the fast and the slow loop in `PhaseIdealLoop::clone_predicates_to_unswitched_loop()`. Internally, it uses `PhaseIdealLoop::create_new_if_for_predicate()` to create the new `If` node and the uncommon projection to the uncommon trap region. The UCT region has a phi node from an earlier split-if optimization. The code in `create_new_if_for_predicate()` updates this phi by just setting the data node of the old uncommon projection path as input for the new uncommon projection path: > https://github.com/openjdk/jdk/blob/e8f1219d6f471c89fe15b19c56e3062dd668466f/src/hotspot/share/opto/loopPredicate.cpp#L196 > > It does this for both the slow and the fast loop. 
The graph looks like this afterwards (for method `testNoDiamond()`): > > ![wrong_update](https://user-images.githubusercontent.com/17833009/129887401-05ea8311-c420-444d-b7bb-e2480b7570b6.png) > > The data node `1226 AddI` is used for the new uncommon projection paths of the slow loop and the fast loop while still also being used for the old uncommon projection path of `986 IfFalse`. When later determining the earliest and latest control, we find that the early control is `986 IfFalse` for `1226 Addl` while the LCA is `376 IfTrue` (dominates `1347 IfFalse`, `1352 IfFalse` and `986 IfFalse`) which leads to the bad graph assertion failure. > > An easy way to fix this is to just set the control to the LCA when cloning the predicates down in loop unswitching but this would revert the work of JDK-8252372 as the nodes could float back into the loop. Therefore, I suggest to improve `create_new_if_for_predicate()` to clone the sunk data node chain to the slow loop and the fast loop. Since the old predicate will die anyways once IGVN is applied after the useless predicates are eliminated in `PhaseIdealLoop::eliminate_useless_predicates()`, I propose to only clone the sunk data node chain for the fast loop once and reuse the old data nodes for the slow loop. I replace the old uncommon projection data input into the phi by TOP which should be fine because the control will die. The graph looks like this after applying these changes: > > ![fixed_update](https://user-images.githubusercontent.com/17833009/129887382-5724745e-4b7a-49c2-ae06-5be559ae294f.png) > > The algorithm also handles diamonds in the sunk data node chains. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: c86e24d4 Author: Christian Hagedorn URL: https://git.openjdk.java.net/jdk/commit/c86e24d4be1e1a26a2a8323ef7ddbab6326bbf3a Stats: 442 lines in 3 files changed: 428 ins; 0 del; 14 mod 8271954: C2: assert(false) failed: Bad graph detected in build_loop_late Reviewed-by: roland, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/5185 From roland at openjdk.java.net Thu Sep 16 12:06:00 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 16 Sep 2021 12:06:00 GMT Subject: RFR: 8273659: Replay compilation crashes with SIGSEGV since 8271911 In-Reply-To: References: <8vY79CHml4ijEgzlBJtMZqRrl4YKc60-cr4LGX4vX8A=.a2d2ff56-12a6-4afe-9d5f-b5ff08a18a64@github.com> Message-ID: On Mon, 13 Sep 2021 17:04:35 GMT, Vladimir Kozlov wrote: >> I noticed replay fails when it encounters classes that it can't >> resolve (when running with -XX:+ReplayIgnoreInitErrors). > > Good. @vnkozlov @dean-long thanks for the reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/5490 From roland at openjdk.java.net Thu Sep 16 12:06:01 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 16 Sep 2021 12:06:01 GMT Subject: Integrated: 8273659: Replay compilation crashes with SIGSEGV since 8271911 In-Reply-To: <8vY79CHml4ijEgzlBJtMZqRrl4YKc60-cr4LGX4vX8A=.a2d2ff56-12a6-4afe-9d5f-b5ff08a18a64@github.com> References: <8vY79CHml4ijEgzlBJtMZqRrl4YKc60-cr4LGX4vX8A=.a2d2ff56-12a6-4afe-9d5f-b5ff08a18a64@github.com> Message-ID: On Mon, 13 Sep 2021 11:31:24 GMT, Roland Westrelin wrote: > I noticed replay fails when it encounters classes that it can't > resolve (when running with -XX:+ReplayIgnoreInitErrors). This pull request has now been integrated. 
Changeset: 59b2478a Author: Roland Westrelin URL: https://git.openjdk.java.net/jdk/commit/59b2478abd7f233531262b0fa190e027a785da79 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod 8273659: Replay compilation crashes with SIGSEGV since 8271911 Reviewed-by: kvn, dlong ------------- PR: https://git.openjdk.java.net/jdk/pull/5490 From github.com+6704669+asgibbons at openjdk.java.net Thu Sep 16 14:03:05 2021 From: github.com+6704669+asgibbons at openjdk.java.net (Scott Gibbons) Date: Thu, 16 Sep 2021 14:03:05 GMT Subject: RFR: 8273459: Update code segment alignment to 64 bytes Message-ID: <9_VCnv0v0ZrVLk3xKXDpsV-406yHK9iSEiagQVHmRhk=.f6375eb7-332c-4f82-b100-1f9db6b3d608@github.com> Change the default code entry alignment to 64 bytes from 32 bytes. This allows for maintaining proper 64-byte alignment of data within a code segment, which is required by several AVX-512 instructions. I ran into this while implementing Base64 encoding and decoding. Code segments which were allocated with the address mod 32 == 0 but with the address mod 64 != 0 would cause the align() macro to misalign. This is because the align macro aligns to the size of the code segment and not the offset of the PC. So align(64) would align the PC to a multiple of 64 bytes from the start of the segment, and not to a pure 64-byte boundary as requested. Changing the alignment of the segment to 64 bytes fixes the issue. I have not seen any measurable difference in either performance or memory usage with the tests I have run. See [this ](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054180.html) article for the discussion thread. 
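The misalignment described above can be illustrated with a little address arithmetic. The sketch below assumes, as the description says, that alignment is computed on the offset from the start of the segment rather than on the absolute PC; the class `AlignSketch`, its method names and the example addresses are made up for illustration and are not HotSpot code.

```java
// Hypothetical model of the two alignment strategies; not HotSpot's assembler code.
final class AlignSketch {
    // Round value up to the next multiple of modulus (modulus must be a power of two).
    static long alignUp(long value, long modulus) {
        return (value + modulus - 1) & -modulus;
    }

    // Align the offset from the segment start, then translate back to an address.
    static long alignRelativeToSegment(long segmentStart, long pc, long modulus) {
        long offset = pc - segmentStart;
        return segmentStart + alignUp(offset, modulus);
    }

    // Align the absolute address itself.
    static long alignAbsolute(long pc, long modulus) {
        return alignUp(pc, modulus);
    }

    public static void main(String[] args) {
        long segment32 = 0x1020L; // assumed start: multiple of 32 but not of 64
        long segment64 = 0x1040L; // assumed start: multiple of 64
        long misaligned = alignRelativeToSegment(segment32, segment32 + 10, 64);
        long aligned = alignRelativeToSegment(segment64, segment64 + 10, 64);
        System.out.printf("32-byte segment: 0x%x, address mod 64 = %d%n", misaligned, misaligned % 64);
        System.out.printf("64-byte segment: 0x%x, address mod 64 = %d%n", aligned, aligned % 64);
        System.out.printf("absolute align:  0x%x%n", alignAbsolute(segment32 + 10, 64));
    }
}
```

With a segment start that is only 32-byte aligned, the offset-based result keeps the segment's 32-byte skew, which is why raising the segment alignment to 64 bytes makes the problem go away in this model.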
------------- Commit messages: - Change default code entry alignment to 64 bytes - Fix align() directive - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Fixing .gitignore Changes: https://git.openjdk.java.net/jdk/pull/5547/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5547&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273459 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/5547.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5547/head:pull/5547 PR: https://git.openjdk.java.net/jdk/pull/5547 From chagedorn at openjdk.java.net Thu Sep 16 14:53:10 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 16 Sep 2021 14:53:10 GMT Subject: RFR: 8273895: compiler/ciReplay/TestVMNoCompLevel.java fails due to wrong data size with TieredStopAtLevel=2, 3 Message-ID: `TestVMNoCompLevel.java` is first letting the VM crash with `-XX:CICrashAt=1` (method `TestMain::test()`), then removes the compilation level information from the compile entry in the replay file and then replay compiles with and without tiered compilation. When running the replay file with tiered compilation off, it results in the assertion failure. When letting the VM crash with `-XX:TieredStopAtLevel=2,3` (C1 only), we get a slightly different MDO size for `TestMain::test()` compared to letting the VM crash with `-XX:-TieredCompilation` (C2 only). The C1 reported MDO for `TestMain::test()` is slightly smaller than the C2 MDO. The reason for that can be traced back to JDK-8251462 which changed [this code](https://github.com/openjdk/jdk/commit/15196325#diff-74ba139c0d6ec44945f1fc6d18d63e0d0fe0da5d38a5e347c6d4d38e0f7b112bL788-L790) in `is_speculative_trap_bytecode()`. This now only returns true if C2 is enabled. 
`is_speculative_trap_bytecode()` is called when initializing an MDO here: https://github.com/openjdk/jdk/blob/2ef6871118109b294e3148c8f15d4335039dd121/src/hotspot/share/oops/methodData.cpp#L1235 If C2 is enabled, then we reserve some extra data space for trap data. But when running with C1 only, this is no longer done since JDK-8251462, so we allocate no extra data space in the MDO for the crashing method `TestMain::test()`. When now crashing first with `-XX:TieredStopAtLevel=2,3`, we collect an MDO without the extra trap data for the replay file. But when replay compiling afterwards with `-XX:-TieredCompilation`, we try to compile it with C2 (we removed the compilation level from the compile entry). We initialize the MDO of `TestMain::test()` with the extra trap data which is a mismatch to the reported C1 only MDO without extra trap data and we fail with the assertion. I suggest to just not run this test with `-XX:TieredStopAtLevel=2,3` to not try to compile a C1 method with profiling data with C2 in order to avoid this MDO mismatch assertion failure. I'm not sure either of the value of this test as this old format is not created anymore. But we might still want to keep this test around. 
Thanks,
Christian

-------------

Commit messages:
 - 8273895: compiler/ciReplay/TestVMNoCompLevel.java fails due to wrong data size with TieredStopAtLevel=2,3

Changes: https://git.openjdk.java.net/jdk/pull/5548/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5548&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8273895
  Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5548.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5548/head:pull/5548

PR: https://git.openjdk.java.net/jdk/pull/5548

From chagedorn at openjdk.java.net Thu Sep 16 15:01:04 2021
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Thu, 16 Sep 2021 15:01:04 GMT
Subject: RFR: 8268744: Improve sinking algorithm in partial peeling to avoid redundant clones
In-Reply-To:
References:
Message-ID: <9Gtf209gFvQINDZFIy9tTrfKYT0Q3LMfRxod4bW3DU0=.0e65f8ed-615e-4edb-a7f4-0e2beea42460@github.com>

On Wed, 28 Jul 2021 15:46:13 GMT, Christian Hagedorn wrote:

> The algorithm in step 2 in partial peeling to move data nodes from the peel section to the non-peel section uses a straightforward cloning algorithm which creates redundant clones when the IR contains one or more diamonds of data nodes to be cloned. The number of clones grows exponentially which could lead to a bailout (added by [JDK-8256934](https://bugs.openjdk.java.net/browse/JDK-8256934) for JDK 17 with a testcase). This RFE improves this algorithm to handle node diamonds more efficiently to avoid unnecessary cloning. The testcase for JDK-8256934 does not bail out anymore and uses only a few clones.
>
> The main idea is to first find all outside of the loop uses `u` of the nodes in the initial peel region to be moved into the non-peel region. We then only need to clone any data node during the algorithm at most `u` times, once for each initial outside of the loop use.
If we process a node diamond (following inputs), we can use an already cloned node for the top node of the diamond (node A in the example below). An example with 1 initial outside of the loop use and 4 nodes to be cloned, forming a diamond, is shown as comment in the code: > https://github.com/openjdk/jdk/blob/8ae0e1a06558a1678521dcb4ed32708a1821b47d/src/hotspot/share/opto/loopopts.cpp#L3605-L3635 > > The algorithm is explained in more details in the comments in the code (starting in method `move_nodes_to_not_peel()`). > > I also cleaned up the code for step 2 of partial peeling. I left the bailout code added by JDK-8256934 in place which I think is still required if we enter partial peeling with a huge number of live nodes (quite rare though). > > I additionally ran some standard benchmarks which did not show any improvements but also no regressions. I think it is rather an edge case where the old algorithm creates a huge number of redundant clones. Nevertheless, I think this improved algorithm is still worth to have to handle the more uncommon case of node diamonds. What do you think? > > Thanks, > Christian Ping - anyone for this? Thanks, Christian ------------- PR: https://git.openjdk.java.net/jdk/pull/4923 From jbhateja at openjdk.java.net Thu Sep 16 15:58:48 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Thu, 16 Sep 2021 15:58:48 GMT Subject: RFR: 8273459: Update code segment alignment to 64 bytes In-Reply-To: <9_VCnv0v0ZrVLk3xKXDpsV-406yHK9iSEiagQVHmRhk=.f6375eb7-332c-4f82-b100-1f9db6b3d608@github.com> References: <9_VCnv0v0ZrVLk3xKXDpsV-406yHK9iSEiagQVHmRhk=.f6375eb7-332c-4f82-b100-1f9db6b3d608@github.com> Message-ID: On Thu, 16 Sep 2021 13:52:24 GMT, Scott Gibbons wrote: > Change the default code entry alignment to 64 bytes from 32 bytes. This allows for maintaining proper 64-byte alignment of data within a code segment, which is required by several AVX-512 instructions. > > I ran into this while implementing Base64 encoding and decoding. 
Code segments which were allocated with the address mod 32 == 0 but with the address mod 64 != 0 would cause the align() macro to misalign. This is because the align macro aligns to the size of the code segment and not the offset of the PC. So align(64) would align the PC to a multiple of 64 bytes from the start of the segment, and not to a pure 64-byte boundary as requested. Changing the alignment of the segment to 64 bytes fixes the issue. > > I have not seen any measurable difference in either performance or memory usage with the tests I have run. > > See [this ](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054180.html) article for the discussion thread. src/hotspot/cpu/x86/globals_x86.hpp line 47: > 45: // only assured that the entry instruction meets the 5 byte size requirement. > 46: #if COMPILER2_OR_JVMCI > 47: define_pd_global(intx, CodeEntryAlignment, 64); How about setting dynamic alignment based on MaxVectorSize ? i.e. match the alignment to MaxVectorSize. This way we can save internal fragmentations over AVX2. It could be done during VM Initialization similar to following code. https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L1439 ------------- PR: https://git.openjdk.java.net/jdk/pull/5547 From erikj at openjdk.java.net Thu Sep 16 16:11:47 2021 From: erikj at openjdk.java.net (Erik Joelsson) Date: Thu, 16 Sep 2021 16:11:47 GMT Subject: RFR: 8273459: Update code segment alignment to 64 bytes In-Reply-To: <9_VCnv0v0ZrVLk3xKXDpsV-406yHK9iSEiagQVHmRhk=.f6375eb7-332c-4f82-b100-1f9db6b3d608@github.com> References: <9_VCnv0v0ZrVLk3xKXDpsV-406yHK9iSEiagQVHmRhk=.f6375eb7-332c-4f82-b100-1f9db6b3d608@github.com> Message-ID: On Thu, 16 Sep 2021 13:52:24 GMT, Scott Gibbons wrote: > Change the default code entry alignment to 64 bytes from 32 bytes. This allows for maintaining proper 64-byte alignment of data within a code segment, which is required by several AVX-512 instructions. 
> > I ran into this while implementing Base64 encoding and decoding. Code segments which were allocated with the address mod 32 == 0 but with the address mod 64 != 0 would cause the align() macro to misalign. This is because the align macro aligns to the size of the code segment and not the offset of the PC. So align(64) would align the PC to a multiple of 64 bytes from the start of the segment, and not to a pure 64-byte boundary as requested. Changing the alignment of the segment to 64 bytes fixes the issue. > > I have not seen any measurable difference in either performance or memory usage with the tests I have run. > > See [this ](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054180.html) article for the discussion thread. .gitignore line 19: > 17: **/JTwork/** > 18: /src/utils/LogCompilation/target/ > 19: **.project** Changes to .gitignore should probably be made separately. ------------- PR: https://git.openjdk.java.net/jdk/pull/5547 From simonis at openjdk.java.net Thu Sep 16 16:49:46 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 16 Sep 2021 16:49:46 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow In-Reply-To: References: Message-ID: On Thu, 16 Sep 2021 07:59:08 GMT, Martin Doerr wrote: >> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. >> >> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. 
A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>>
>>     public static boolean isAlpha(int c) {
>>         try {
>>             return IS_ALPHA[c];
>>         } catch (ArrayIndexOutOfBoundsException ex) {
>>             return false;
>>         }
>>     }
>>
>> ### Solution
>>
>> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>>
>> ### Implementation details
>>
>> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
>> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
>> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.
>
> src/hotspot/share/ci/ciEnv.cpp line 375:
>
>> 373: // ------------------------------------------------------------------
>> 374: // helper for -XX:+OptimizeImplicitExceptions
>> 375: ciInstanceKlass* ciEnv::exception_instanceKlass_for_reason(Deoptimization::DeoptReason reason, bool aastore) {
>
> Better `is_aastore` or pass Bytecode?

I didn't want to unnecessarily include `interpreter/bytecodes.hpp` into `ciEnv.hpp`. But `is_aastore` is a good suggestion. Changed as suggested.

> src/hotspot/share/opto/graphKit.cpp line 631:
>
>> 629: Node* ex_node = new_instance(makecon(ex_type), NULL, NULL, true);
>> 630: set_argument(0, ex_node);
>> 631: ciMethod* init = ex_ciInstKlass->find_method(ciSymbol::make("<init>"), ciSymbol::make("()V"));
>
> Extra whitespace.

Fixed.
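To make the pattern from the PR description concrete, here is a minimal, self-contained sketch of a lookup that relies on an implicit `ArrayIndexOutOfBoundsException` for out-of-range input, next to an explicit range check that returns the same results. The class name `IsAlphaSketch`, the 128-entry table and its contents are assumptions for illustration; this is neither Tomcat's code nor the micro-benchmark from the PR.

```java
final class IsAlphaSketch {
    // Hypothetical lookup table covering the ASCII range only.
    private static final boolean[] IS_ALPHA = new boolean[128];

    static {
        for (int c = 'a'; c <= 'z'; c++) {
            IS_ALPHA[c] = true;
        }
        for (int c = 'A'; c <= 'Z'; c++) {
            IS_ALPHA[c] = true;
        }
    }

    // Same shape as the method quoted in the PR description: an index outside
    // the table raises an implicit AIOOBE, which is used as normal control flow.
    static boolean isAlphaImplicit(int c) {
        try {
            return IS_ALPHA[c];
        } catch (ArrayIndexOutOfBoundsException ex) {
            return false;
        }
    }

    // Equivalent result with an explicit range check; this variant never throws.
    static boolean isAlphaExplicit(int c) {
        return c >= 0 && c < IS_ALPHA.length && IS_ALPHA[c];
    }

    public static void main(String[] args) {
        System.out.println(isAlphaImplicit('a'));   // in range, a letter
        System.out.println(isAlphaImplicit(1000));  // out of range, exception path
    }
}
```

Going by the PR description, it is the out-of-range calls to a method like `isAlphaImplicit` that become expensive in C2-compiled code under `-XX:-OmitStackTraceInFastThrow`; the explicit variant sidesteps the exception entirely but requires changing the application code.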
> src/hotspot/share/runtime/globals.hpp line 645: > >> 643: "Omit backtraces for some 'hot' exceptions in optimized code") \ >> 644: \ >> 645: product(bool, OptimizeImplicitExceptions, true, \ > > Should it be a diagnostic flag? Regular product flags require a CSR. Good point. Changed as suggested. ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From simonis at openjdk.java.net Thu Sep 16 16:53:47 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 16 Sep 2021 16:53:47 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow In-Reply-To: References: Message-ID: <60TYPMSYj8Dpx7vT30IbkRxsiImj8eXKCYG0IEIgH-4=.5b3b97e2-0d60-456f-b4b1-a90f1efd91d3@github.com> On Thu, 16 Sep 2021 08:06:28 GMT, Martin Doerr wrote: >> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. >> >> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274): >> >> public static boolean isAlpha(int c) { >> try { >> return IS_ALPHA[c]; >> } catch (ArrayIndexOutOfBoundsException ex) { >> return false; >> } >> } >> >> >> ### Solution >> >> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. 
This results in a ten-times performance improvement for the above code:
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>>
>> ### Implementation details
>>
>> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
>> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
>> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. > > src/hotspot/share/runtime/sharedRuntime.cpp line 1096: > >> 1094: bc = bytecode.invoke_code(); >> 1095: } >> 1096: else { > > Coding style: newline before `else` Fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From simonis at openjdk.java.net Thu Sep 16 17:00:22 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 16 Sep 2021 17:00:22 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow In-Reply-To: References: Message-ID: <_dei2-IYvM72RVRyk6GfMR8fdwviFvWnQJMzhP3IRtI=.be093912-5b40-4ac2-801e-416529033f50@github.com> On Mon, 13 Sep 2021 10:05:16 GMT, Volker Simonis wrote: > Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. > > If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. 
A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>
>     public static boolean isAlpha(int c) {
>         try {
>             return IS_ALPHA[c];
>         } catch (ArrayIndexOutOfBoundsException ex) {
>             return false;
>         }
>     }
>
> ### Solution
>
> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
>
>     -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>     ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>     ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>     ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>     ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>
>     -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>     ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>     ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>     ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>     ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>
> ### Implementation details
>
> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception.
> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. > - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. Hi Martin, thanks a lot for looking at my change. I've applied all your suggestions to my PR. With best regards, Volker ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From simonis at openjdk.java.net Thu Sep 16 17:00:20 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 16 Sep 2021 17:00:20 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v2] In-Reply-To: References: Message-ID: > Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. > > If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. 
A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>
>     public static boolean isAlpha(int c) {
>         try {
>             return IS_ALPHA[c];
>         } catch (ArrayIndexOutOfBoundsException ex) {
>             return false;
>         }
>     }
>
> ### Solution
>
> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
>
>     -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>     ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>     ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>     ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>     ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>
>     -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>     ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>     ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>     ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>     ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>
> ### Implementation details
>
> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception.
> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. > - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: Minor updates as requested by @TheRealMDoerr ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5488/files - new: https://git.openjdk.java.net/jdk/pull/5488/files/0558c3e1..f14338a7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=00-01 Stats: 7 lines in 5 files changed: 0 ins; 1 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/5488.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5488/head:pull/5488 PR: https://git.openjdk.java.net/jdk/pull/5488 From kvn at openjdk.java.net Thu Sep 16 17:02:45 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 16 Sep 2021 17:02:45 GMT Subject: RFR: 8273895: compiler/ciReplay/TestVMNoCompLevel.java fails due to wrong data size with TieredStopAtLevel=2, 3 In-Reply-To: References: Message-ID: On Thu, 16 Sep 2021 14:43:43 GMT, Christian Hagedorn 
wrote: > `TestVMNoCompLevel.java` is first letting the VM crash with `-XX:CICrashAt=1` (method `TestMain::test()`), then removes the compilation level information from the compile entry in the replay file and then replay compiles with and without tiered compilation. When running the replay file with tiered compilation off, it results in the assertion failure. > > When letting the VM crash with `-XX:TieredStopAtLevel=2,3` (C1 only), we get a slightly different MDO size for `TestMain::test()` compared to letting the VM crash with `-XX:-TieredCompilation` (C2 only). The C1 reported MDO for `TestMain::test()` is slightly smaller than the C2 MDO. The reason for that can be traced back to JDK-8251462 which changed [this code](https://github.com/openjdk/jdk/commit/15196325#diff-74ba139c0d6ec44945f1fc6d18d63e0d0fe0da5d38a5e347c6d4d38e0f7b112bL788-L790) in `is_speculative_trap_bytecode()`. This now only returns true if C2 is enabled. `is_speculative_trap_bytecode()` is called when initializing an MDO here: > https://github.com/openjdk/jdk/blob/2ef6871118109b294e3148c8f15d4335039dd121/src/hotspot/share/oops/methodData.cpp#L1235 > > If C2 is enabled, then we reserve some extra data space for trap data. But when running with C1 only, this is no longer done since JDK-8251462, so we allocate no extra data space in the MDO for the crashing method `TestMain::test()`. > > When now crashing first with `-XX:TieredStopAtLevel=2,3`, we collect an MDO without the extra trap data for the replay file. But when replay compiling afterwards with `-XX:-TieredCompilation`, we try to compile it with C2 (we removed the compilation level from the compile entry). We initialize the MDO of `TestMain::test()` with the extra trap data which is a mismatch to the reported C1 only MDO without extra trap data and we fail with the assertion. 
> > I suggest to just not run this test with `-XX:TieredStopAtLevel=2,3` to not try to compile a C1 method with profiling data with C2 in order to avoid this MDO mismatch assertion failure. I'm not sure either of the value of this test as this old format is not created anymore. But we might still want to keep this test around. > > Thanks, > Christian Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5548 From kvn at openjdk.java.net Thu Sep 16 17:27:48 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 16 Sep 2021 17:27:48 GMT Subject: RFR: 8273459: Update code segment alignment to 64 bytes In-Reply-To: <9_VCnv0v0ZrVLk3xKXDpsV-406yHK9iSEiagQVHmRhk=.f6375eb7-332c-4f82-b100-1f9db6b3d608@github.com> References: <9_VCnv0v0ZrVLk3xKXDpsV-406yHK9iSEiagQVHmRhk=.f6375eb7-332c-4f82-b100-1f9db6b3d608@github.com> Message-ID: On Thu, 16 Sep 2021 13:52:24 GMT, Scott Gibbons wrote: > Change the default code entry alignment to 64 bytes from 32 bytes. This allows for maintaining proper 64-byte alignment of data within a code segment, which is required by several AVX-512 instructions. > > I ran into this while implementing Base64 encoding and decoding. Code segments which were allocated with the address mod 32 == 0 but with the address mod 64 != 0 would cause the align() macro to misalign. This is because the align macro aligns to the size of the code segment and not the offset of the PC. So align(64) would align the PC to a multiple of 64 bytes from the start of the segment, and not to a pure 64-byte boundary as requested. Changing the alignment of the segment to 64 bytes fixes the issue. > > I have not seen any measurable difference in either performance or memory usage with the tests I have run. > > See [this ](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054180.html) article for the discussion thread. I share Dean's concern from the discussion before. 
`CodeEntryAlignment` is used in a lot of places and we have to be careful about changes to it. I found only 7 cases with `align(64)`:

    src/hotspot/cpu/x86//stubGenerator_x86_64.cpp: __ align(64);
    src/hotspot/cpu/x86//stubGenerator_x86_64.cpp: __ align(64);
    src/hotspot/cpu/x86//stubGenerator_x86_64.cpp: __ align(64);
    src/hotspot/cpu/x86//stubGenerator_x86_64.cpp: __ align(64);
    src/hotspot/cpu/x86//stubGenerator_x86_32.cpp: __ align(64);
    src/hotspot/cpu/x86//stubGenerator_x86_32.cpp: __ align(64);
    src/hotspot/cpu/x86//stubGenerator_x86_32.cpp: __ align(64);

It does not justify such general changes. My suggestion would be to add a new `align64()` method that calls pc(), as you suggested in the original proposal. The new function may also use `MaxVectorSize` as Jatin proposed if it helps.

-------------

Changes requested by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/5547

From zgu at openjdk.java.net Thu Sep 16 18:28:20 2021
From: zgu at openjdk.java.net (Zhengyu Gu)
Date: Thu, 16 Sep 2021 18:28:20 GMT
Subject: RFR: 8273454: C2: Transform (-a)*(-b) into a*b [v7]
In-Reply-To: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com>
References: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com>
Message-ID:

> The transformation reduces the number of instructions in generated code.
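Why dropping both negations is safe even when a negation overflows can be checked with a small Java sketch. The class and method names below (`NegMulIdentity`, `mulNegated`, `mulPlain`, `agreeOnSamples`) are made up for illustration and are not taken from the PR or its test. The sketch relies only on Java `int` arithmetic wrapping modulo 2^32, under which `(-a) * (-b)` and `a * b` agree for all inputs, including `Integer.MIN_VALUE`.

```java
final class NegMulIdentity {
    // Shape before the transformation: both operands are negated, then multiplied.
    static int mulNegated(int a, int b) {
        return (-a) * (-b);
    }

    // Shape after the transformation: the two negations cancel out.
    static int mulPlain(int a, int b) {
        return a * b;
    }

    // Returns true if both shapes agree for every pair of sample values,
    // including values whose negation or product overflows.
    static boolean agreeOnSamples() {
        int[] samples = {0, 1, -1, 7, -13, 46341, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int a : samples) {
            for (int b : samples) {
                if (mulNegated(a, b) != mulPlain(a, b)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeOnSamples()); // prints "true"
    }
}
```

A handful of samples is of course no proof; the general argument is that negation and multiplication are both well defined modulo 2^32, so the two sign flips cancel regardless of overflow.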
>
> ### x86_64:
>
> Before:
>
>     0x00007fb92c78b3ac: neg %esi
>     0x00007fb92c78b3ae: neg %edx
>     0x00007fb92c78b3b0: mov %esi,%eax
>     0x00007fb92c78b3b2: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0}
>                         ; - TestSub::runSub@4 (line 9)
>
> After:
>
>                         ; - TestSub::runSub@-1 (line 9)
>     0x00007fc8c05b74ac: mov %esi,%eax
>     0x00007fc8c05b74ae: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0}
>                         ; - TestSub::runSub@4 (line 9)
>
> ### AArch64:
>
> Before:
>
>     0x0000ffff814b4a70: neg w11, w1
>     0x0000ffff814b4a74: mneg w0, w2, w11 ;*imul {reexecute=0 rethrow=0 return_oop=0}
>                         ; - TestSub::runSub@4 (line 9)
>
> After:
>
>     0x0000ffff794a67f0: mul w0, w1, w2 ;*imul {reexecute=0 rethrow=0 return_oop=0}
>                         ; - TestSub::runSub@4 (line 9)

Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision:

  @TobiHartmann's comments

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5403/files
  - new: https://git.openjdk.java.net/jdk/pull/5403/files/3f3eeb01..57d1ecf2

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5403&range=06
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5403&range=05-06

  Stats: 20 lines in 1 file changed: 1 ins; 17 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5403.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5403/head:pull/5403

PR: https://git.openjdk.java.net/jdk/pull/5403

From zgu at openjdk.java.net Thu Sep 16 18:28:20 2021
From: zgu at openjdk.java.net (Zhengyu Gu)
Date: Thu, 16 Sep 2021 18:28:20 GMT
Subject: RFR: 8273454: C2: Transform (-a)*(-b) into a*b [v4]
In-Reply-To:
References: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com>
Message-ID:

On Thu, 16 Sep 2021 07:24:49 GMT, Tobias Hartmann wrote:

>> You can't control the iteration in which the test would fail if there's a bug in C2 (it could only fail for **some** random values).
Therefore, you could as well use random values for the warmup and simply increase `TEST_COUNT` to ensure that C2 compilation is triggered and we run a reasonable amount of iterations with C2 compiled code.
>>
>> Your newest version of the test now has the problem that OSR compilation might C2 compile the computation of the expected value and then you are comparing the output of a C2 compiled method to a C2 compiled method instead of the interpreter. You have the following options:
>> - Compute the expected value as `a * b`. In that case it's fine if the computation is C2 compiled as well.
>> - Prevent compilation of the `run*` methods (either by disabling OSR compilation or by completely disabling compilation of these methods)
>
> And sorry for being picky here but I would like to keep tests as simple as possible :)

Fixed according to your comments.

I really appreciate your suggestions, thanks!

-------------

PR: https://git.openjdk.java.net/jdk/pull/5403

From github.com+10835776+stsypanov at openjdk.java.net Thu Sep 16 19:10:43 2021
From: github.com+10835776+stsypanov at openjdk.java.net (Сергей Цыпанов)
Date: Thu, 16 Sep 2021 19:10:43 GMT
Subject: RFR: 8268764: Use Long.hashCode() instead of int-cast where applicable [v4]
In-Reply-To:
References:
Message-ID:

On Thu, 1 Jul 2021 12:19:53 GMT, Сергей Цыпанов wrote:

>> In some JDK classes there's still the following hashCode() implementation:
>>
>>     long objNum;
>>
>>     public int hashCode() {
>>         return (int) objNum;
>>     }
>>
>> This outdated expression should be replaced with Long.hashCode(long) as it
>>
>> - uses all bits of the original value, does not discard any information upfront. For example, depending on how you are generating the IDs, the upper bits could change more frequently (or the opposite).
>>
>> - does not introduce any bias towards values with more ones (zeros), as it would be the case if the two halves were combined with an OR (AND) operation.
>>
>> See https://stackoverflow.com/a/4045083
>>
>> This is related to https://github.com/openjdk/jdk/pull/4309
>
> Сергей Цыпанов has updated the pull request incrementally with one additional commit since the last revision:
>
>   8268764: Update copy-right year

Let's wait, bridgekeeper.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4491

From dlong at openjdk.java.net Fri Sep 17 01:54:47 2021
From: dlong at openjdk.java.net (Dean Long)
Date: Fri, 17 Sep 2021 01:54:47 GMT
Subject: RFR: 8273895: compiler/ciReplay/TestVMNoCompLevel.java fails due to wrong data size with TieredStopAtLevel=2, 3
In-Reply-To:
References:
Message-ID:

On Thu, 16 Sep 2021 14:43:43 GMT, Christian Hagedorn wrote:

> `TestVMNoCompLevel.java` is first letting the VM crash with `-XX:CICrashAt=1` (method `TestMain::test()`), then removes the compilation level information from the compile entry in the replay file and then replay compiles with and without tiered compilation. When running the replay file with tiered compilation off, it results in the assertion failure.
>
> When letting the VM crash with `-XX:TieredStopAtLevel=2,3` (C1 only), we get a slightly different MDO size for `TestMain::test()` compared to letting the VM crash with `-XX:-TieredCompilation` (C2 only). The C1 reported MDO for `TestMain::test()` is slightly smaller than the C2 MDO. The reason for that can be traced back to JDK-8251462 which changed [this code](https://github.com/openjdk/jdk/commit/15196325#diff-74ba139c0d6ec44945f1fc6d18d63e0d0fe0da5d38a5e347c6d4d38e0f7b112bL788-L790) in `is_speculative_trap_bytecode()`. This now only returns true if C2 is enabled.
`is_speculative_trap_bytecode()` is called when initializing an MDO here: > https://github.com/openjdk/jdk/blob/2ef6871118109b294e3148c8f15d4335039dd121/src/hotspot/share/oops/methodData.cpp#L1235 > > If C2 is enabled, then we reserve some extra data space for trap data. But when running with C1 only, this is no longer done since JDK-8251462, so we allocate no extra data space in the MDO for the crashing method `TestMain::test()`. > > When now crashing first with `-XX:TieredStopAtLevel=2,3`, we collect an MDO without the extra trap data for the replay file. But when replay compiling afterwards with `-XX:-TieredCompilation`, we try to compile it with C2 (we removed the compilation level from the compile entry). We initialize the MDO of `TestMain::test()` with the extra trap data which is a mismatch to the reported C1 only MDO without extra trap data and we fail with the assertion. > > I suggest to just not run this test with `-XX:TieredStopAtLevel=2,3` to not try to compile a C1 method with profiling data with C2 in order to avoid this MDO mismatch assertion failure. I'm not sure either of the value of this test as this old format is not created anymore. But we might still want to keep this test around. > > Thanks, > Christian Marked as reviewed by dlong (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5548 From chagedorn at openjdk.java.net Fri Sep 17 05:46:42 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 17 Sep 2021 05:46:42 GMT Subject: RFR: 8273895: compiler/ciReplay/TestVMNoCompLevel.java fails due to wrong data size with TieredStopAtLevel=2, 3 In-Reply-To: References: Message-ID: On Thu, 16 Sep 2021 14:43:43 GMT, Christian Hagedorn wrote: > `TestVMNoCompLevel.java` is first letting the VM crash with `-XX:CICrashAt=1` (method `TestMain::test()`), then removes the compilation level information from the compile entry in the replay file and then replay compiles with and without tiered compilation. 
When running the replay file with tiered compilation off, it results in the assertion failure. > > When letting the VM crash with `-XX:TieredStopAtLevel=2,3` (C1 only), we get a slightly different MDO size for `TestMain::test()` compared to letting the VM crash with `-XX:-TieredCompilation` (C2 only). The C1 reported MDO for `TestMain::test()` is slightly smaller than the C2 MDO. The reason for that can be traced back to JDK-8251462 which changed [this code](https://github.com/openjdk/jdk/commit/15196325#diff-74ba139c0d6ec44945f1fc6d18d63e0d0fe0da5d38a5e347c6d4d38e0f7b112bL788-L790) in `is_speculative_trap_bytecode()`. This now only returns true if C2 is enabled. `is_speculative_trap_bytecode()` is called when initializing an MDO here: > https://github.com/openjdk/jdk/blob/2ef6871118109b294e3148c8f15d4335039dd121/src/hotspot/share/oops/methodData.cpp#L1235 > > If C2 is enabled, then we reserve some extra data space for trap data. But when running with C1 only, this is no longer done since JDK-8251462, so we allocate no extra data space in the MDO for the crashing method `TestMain::test()`. > > When now crashing first with `-XX:TieredStopAtLevel=2,3`, we collect an MDO without the extra trap data for the replay file. But when replay compiling afterwards with `-XX:-TieredCompilation`, we try to compile it with C2 (we removed the compilation level from the compile entry). We initialize the MDO of `TestMain::test()` with the extra trap data which is a mismatch to the reported C1 only MDO without extra trap data and we fail with the assertion. > > I suggest to just not run this test with `-XX:TieredStopAtLevel=2,3` to not try to compile a C1 method with profiling data with C2 in order to avoid this MDO mismatch assertion failure. I'm not sure either of the value of this test as this old format is not created anymore. But we might still want to keep this test around. > > Thanks, > Christian Thanks Vladimir and Dean for your reviews! 
-------------

PR: https://git.openjdk.java.net/jdk/pull/5548

From chagedorn at openjdk.java.net Fri Sep 17 06:32:52 2021
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Fri, 17 Sep 2021 06:32:52 GMT
Subject: RFR: 8273825: TestIRMatching.java fails after JDK-8266550
Message-ID:

[JDK-8266550](https://bugs.openjdk.java.net/browse/JDK-8266550) changed the class hierarchy of types, so some default IR regexes had to be adapted. In the process, an extra `"klass" + ";"` in `CHECKCAST_ARRAY_OF` was left in place by mistake in the part that matches platforms such as aarch64 or PPC.

Thanks,
Christian

-------------

Commit messages:
 - 8273825: TestIRMatching.java fails after JDK-8266550 on aarch64

Changes: https://git.openjdk.java.net/jdk/pull/5555/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5555&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8273825
  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5555.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5555/head:pull/5555

PR: https://git.openjdk.java.net/jdk/pull/5555

From thartmann at openjdk.java.net Fri Sep 17 06:38:45 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Fri, 17 Sep 2021 06:38:45 GMT
Subject: RFR: 8273825: TestIRMatching.java fails after JDK-8266550
In-Reply-To:
References:
Message-ID: <6uCJiqV9vge22TU7kLB72TMwHBKpOE_HCLtk2-gQDYE=.7df80879-7af6-4a9e-980b-6106f8045426@github.com>

On Fri, 17 Sep 2021 06:24:54 GMT, Christian Hagedorn wrote:

> [JDK-8266550](https://bugs.openjdk.java.net/browse/JDK-8266550) changed the class hierarchy of types, so some default IR regexes had to be adapted. In the process, an extra `"klass" + ";"` in `CHECKCAST_ARRAY_OF` was left in place by mistake in the part that matches platforms such as aarch64 or PPC.
>
> Thanks,
> Christian

Looks good.

-------------

Marked as reviewed by thartmann (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/5555

From roland at openjdk.java.net Fri Sep 17 06:44:54 2021
From: roland at openjdk.java.net (Roland Westrelin)
Date: Fri, 17 Sep 2021 06:44:54 GMT
Subject: RFR: 8273825: TestIRMatching.java fails after JDK-8266550
In-Reply-To:
References:
Message-ID:

On Fri, 17 Sep 2021 06:24:54 GMT, Christian Hagedorn wrote:

> [JDK-8266550](https://bugs.openjdk.java.net/browse/JDK-8266550) changed the class hierarchy of types, so some default IR regexes had to be adapted. In the process, an extra `"klass" + ";"` in `CHECKCAST_ARRAY_OF` was left in place by mistake in the part that matches platforms such as aarch64 or PPC.
>
> Thanks,
> Christian

Looks good. Thanks for fixing.

-------------

Marked as reviewed by roland (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/5555

From chagedorn at openjdk.java.net Fri Sep 17 06:44:54 2021
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Fri, 17 Sep 2021 06:44:54 GMT
Subject: RFR: 8273825: TestIRMatching.java fails after JDK-8266550
In-Reply-To:
References:
Message-ID:

On Fri, 17 Sep 2021 06:24:54 GMT, Christian Hagedorn wrote:

> [JDK-8266550](https://bugs.openjdk.java.net/browse/JDK-8266550) changed the class hierarchy of types, so some default IR regexes had to be adapted. In the process, an extra `"klass" + ";"` in `CHECKCAST_ARRAY_OF` was left in place by mistake in the part that matches platforms such as aarch64 or PPC.
>
> Thanks,
> Christian

Thanks Tobias and Roland for your reviews!
------------- PR: https://git.openjdk.java.net/jdk/pull/5555 From njian at openjdk.java.net Fri Sep 17 06:53:06 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Fri, 17 Sep 2021 06:53:06 GMT Subject: RFR: 8267356: AArch64: Vector API SVE codegen support [v7] In-Reply-To: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> References: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> Message-ID: > This is the integration of current SVE work done in panama-vector/vectorIntrinscs, which includes: > > 1. Code generation for Vector API c2 IR nodes with SVE. > 2. Non-max vector size support with SVE, e.g. using *128Vector (and *64Vector) APIs on 256-bit SVE environment could also generate optimized SVE instructions with predicate feature. > 3. Some more SVE assemblers (and tests) used by the codegen part. > > Note: VectorMask is still represented in vector register, a further improvement to map mask to predicate register is under development at https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask > > > Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware with MaxVectorSize=16/32/64. Ningsheng Jian has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - Merge with master - Merge with master - More comments from Andrew. - Add missing part - Address Andrew's comments - 8267356: AArch64: Vector API SVE codegen support This is the integration of current SVE work done in panama-vector/vectorIntrinscs, which includes: 1. Code generation for Vector API c2 IR nodes with SVE. 2. Non-max vector size support with SVE, e.g. using *128Vector APIs on 256-bit SVE environment could also generate optimized SVE instructions with predicate feature. 3. Some more SVE assemblers (and tests) used by the codegen part. 
Note: VectorMask is still represented in vector register, a further improvement to map mask to predicate register is under development at https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware with MaxVectorSize=16/32/64. ------------- Changes: https://git.openjdk.java.net/jdk/pull/4122/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4122&range=06 Stats: 5761 lines in 13 files changed: 4576 ins; 195 del; 990 mod Patch: https://git.openjdk.java.net/jdk/pull/4122.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4122/head:pull/4122 PR: https://git.openjdk.java.net/jdk/pull/4122 From wuyan at openjdk.java.net Fri Sep 17 07:11:51 2021 From: wuyan at openjdk.java.net (Wu Yan) Date: Fri, 17 Sep 2021 07:11:51 GMT Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v5] In-Reply-To: <6SkOgskSfXuMp1XarC2BO9zBUw_Zj1pcUMKNHffiCQs=.66c156b1-ef01-48a4-8f28-8351089a5646@github.com> References: <6SkOgskSfXuMp1XarC2BO9zBUw_Zj1pcUMKNHffiCQs=.66c156b1-ef01-48a4-8f28-8351089a5646@github.com> Message-ID: On Tue, 7 Sep 2021 10:11:08 GMT, Wang Huang wrote: >> * In this issue, we plan to complete all missing implementation for aarch64 neon backend. For example, cast from Byte to Long, cast from Long to Byte, and so on. >> * It may be a solver of JDK-8269866, or part of it. > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > fix bugs Hi, we have fixed these use cases, are there any other questions? 
------------- PR: https://git.openjdk.java.net/jdk/pull/4839 From wuyan at openjdk.java.net Fri Sep 17 07:16:42 2021 From: wuyan at openjdk.java.net (Wu Yan) Date: Fri, 17 Sep 2021 07:16:42 GMT Subject: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v6] In-Reply-To: References: <_XL6WybKwHeJ54kSQnN_q0_NgvR7ib9BFjNJ4HrkO_g=.f82e6cda-b31f-4eee-9185-3e52ebd6b54d@github.com> Message-ID: On Sun, 5 Sep 2021 13:23:21 GMT, Andrew Haley wrote: >> Thanks, I'll fix it. > > It's fine. I don't think it'll affect any real programs, so it's rather pointless. I don't know if that's any reason not to approve it. Andrew, can you help us to approve this? ------------- PR: https://git.openjdk.java.net/jdk/pull/4722 From njian at openjdk.java.net Fri Sep 17 07:28:52 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Fri, 17 Sep 2021 07:28:52 GMT Subject: RFR: 8267356: AArch64: Vector API SVE codegen support [v7] In-Reply-To: References: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> Message-ID: On Fri, 17 Sep 2021 06:53:06 GMT, Ningsheng Jian wrote: >> This is the integration of current SVE work done in panama-vector/vectorIntrinscs, which includes: >> >> 1. Code generation for Vector API c2 IR nodes with SVE. >> 2. Non-max vector size support with SVE, e.g. using *128Vector (and *64Vector) APIs on 256-bit SVE environment could also generate optimized SVE instructions with predicate feature. >> 3. Some more SVE assemblers (and tests) used by the codegen part. >> >> Note: VectorMask is still represented in vector register, a further improvement to map mask to predicate register is under development at https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask >> >> >> Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware with MaxVectorSize=16/32/64. > > Ningsheng Jian has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains six commits: > > - Merge with master > - Merge with master > - More comments from Andrew. > - Add missing part > - Address Andrew's comments > - 8267356: AArch64: Vector API SVE codegen support > > This is the integration of current SVE work done in > panama-vector/vectorIntrinscs, which includes: > > 1. Code generation for Vector API c2 IR nodes with SVE. > 2. Non-max vector size support with SVE, e.g. using *128Vector APIs on > 256-bit SVE environment could also generate optimized SVE > instructions with predicate feature. > 3. Some more SVE assemblers (and tests) used by the codegen part. > > Note: VectorMask is still represented in vector register, a further > improvement to map mask to predicate register is under development at > https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask > > Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware > with MaxVectorSize=16/32/64. Merged with master and tested. Thanks to Andrew for the review! Can I get one more view? This is part of https://bugs.openjdk.java.net/browse/JDK-8271515, but can be integrated separately once the JEP has been targeted. ------------- PR: https://git.openjdk.java.net/jdk/pull/4122 From thartmann at openjdk.java.net Fri Sep 17 07:45:44 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 17 Sep 2021 07:45:44 GMT Subject: RFR: 8273454: C2: Transform (-a)*(-b) into a*b [v7] In-Reply-To: References: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> Message-ID: On Thu, 16 Sep 2021 18:28:20 GMT, Zhengyu Gu wrote: >> The transformation reduce instructions in generated code. 
>> >> ### x86_64: >> >> Before: >> ``` >> 0x00007fb92c78b3ac: neg %esi >> 0x00007fb92c78b3ae: neg %edx >> 0x00007fb92c78b3b0: mov %esi,%eax >> 0x00007fb92c78b3b2: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) >> >> After: >> >> ; - TestSub::runSub at -1 (line 9) >> 0x00007fc8c05b74ac: mov %esi,%eax >> 0x00007fc8c05b74ae: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) >> >> >> >> ### AArch64: >> Before: >> >> 0x0000ffff814b4a70: neg w11, w1 >> 0x0000ffff814b4a74: mneg w0, w2, w11 ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) >> >> >> After: >> >> 0x0000ffff794a67f0: mul w0, w1, w2 ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > @TobiHartmann's comments Thanks for making these changes, looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5403 From chagedorn at openjdk.java.net Fri Sep 17 08:11:48 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 17 Sep 2021 08:11:48 GMT Subject: RFR: 8273454: C2: Transform (-a)*(-b) into a*b [v7] In-Reply-To: References: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> Message-ID: On Thu, 16 Sep 2021 18:28:20 GMT, Zhengyu Gu wrote: >> The transformation reduce instructions in generated code. 
>> >> ### x86_64: >> >> Before: >> ``` >> 0x00007fb92c78b3ac: neg %esi >> 0x00007fb92c78b3ae: neg %edx >> 0x00007fb92c78b3b0: mov %esi,%eax >> 0x00007fb92c78b3b2: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) >> >> After: >> >> ; - TestSub::runSub at -1 (line 9) >> 0x00007fc8c05b74ac: mov %esi,%eax >> 0x00007fc8c05b74ae: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) >> >> >> >> ### AArch64: >> Before: >> >> 0x0000ffff814b4a70: neg w11, w1 >> 0x0000ffff814b4a74: mneg w0, w2, w11 ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) >> >> >> After: >> >> 0x0000ffff794a67f0: mul w0, w1, w2 ;*imul {reexecute=0 rethrow=0 return_oop=0} >> ; - TestSub::runSub at 4 (line 9) > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > @TobiHartmann's comments Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5403 From shade at openjdk.java.net Fri Sep 17 08:46:00 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 17 Sep 2021 08:46:00 GMT Subject: RFR: 8273927: Enable hsdis for riscv64 Message-ID: Currently compiled `hsdis-riscv64.so` binary complains: hsdis: bad native mach=architecture not set in Makefile!; please port hsdis to this platform It seems to be as simple as point to the right BFD arch. 
Additional testing: - [x] Linux RISC-V port, `-XX:+PrintAssembly` works ------------- Commit messages: - Fix Changes: https://git.openjdk.java.net/jdk/pull/5558/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5558&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273927 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5558.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5558/head:pull/5558 PR: https://git.openjdk.java.net/jdk/pull/5558 From neliasso at openjdk.java.net Fri Sep 17 09:34:10 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 17 Sep 2021 09:34:10 GMT Subject: RFR: 8273934: Remove unused perfcounters Message-ID: This patch removes two unused PerfCounters. Please review, Regards, Nils Eliasson ------------- Commit messages: - Update compileBroker.hpp - Remove unused PerfCounters Changes: https://git.openjdk.java.net/jdk/pull/5093/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5093&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273934 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5093.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5093/head:pull/5093 PR: https://git.openjdk.java.net/jdk/pull/5093 From neliasso at openjdk.java.net Fri Sep 17 09:47:53 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 17 Sep 2021 09:47:53 GMT Subject: RFR: 8273933: [TESTBUG] Test must run without preallocated exceptions Message-ID: Executing vmTestbase/jit/t/t105/t105.java with the fix for (JDK-8273277) makes the test fail when run with the following arguments: -XX:+TieredCompilation -XX:Tier0BackedgeNotifyFreqLog=0 -XX:Tier2BackedgeNotifyFreqLog=0 -XX:Tier3BackedgeNotifyFreqLog=0 -XX:Tier2BackEdgeThreshold=1 -XX:Tier3BackEdgeThreshold=1 -XX:Tier4BackEdgeThreshold=1 -Xbatch The problem is that the tests expects a detailed message from ArrayIndexOutOfBoundsException, but 
this test will trigger the optimization that reuses preallocated exceptions that have an empty detailed exceptions. It is wrong for the test to assume exceptions messages. Solution disable preallocated exceptions with the flag -XX:-ProfileTraps. ------------- Commit messages: - Disable ProfileTraps Changes: https://git.openjdk.java.net/jdk/pull/5560/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5560&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273933 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5560.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5560/head:pull/5560 PR: https://git.openjdk.java.net/jdk/pull/5560 From chagedorn at openjdk.java.net Fri Sep 17 09:58:42 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 17 Sep 2021 09:58:42 GMT Subject: RFR: 8273934: Remove unused perfcounters In-Reply-To: References: Message-ID: On Wed, 11 Aug 2021 20:48:46 GMT, Nils Eliasson wrote: > This patch removes two unused PerfCounters. > > Please review, > Regards, > Nils Eliasson Looks good and trivial. ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5093 From njian at openjdk.java.net Fri Sep 17 10:07:49 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Fri, 17 Sep 2021 10:07:49 GMT Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v5] In-Reply-To: <6SkOgskSfXuMp1XarC2BO9zBUw_Zj1pcUMKNHffiCQs=.66c156b1-ef01-48a4-8f28-8351089a5646@github.com> References: <6SkOgskSfXuMp1XarC2BO9zBUw_Zj1pcUMKNHffiCQs=.66c156b1-ef01-48a4-8f28-8351089a5646@github.com> Message-ID: On Tue, 7 Sep 2021 10:11:08 GMT, Wang Huang wrote: >> * In this issue, we plan to complete all missing implementation for aarch64 neon backend. For example, cast from Byte to Long, cast from Long to Byte, and so on. >> * It may be a solver of JDK-8269866, or part of it. 
> > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > fix bugs src/hotspot/cpu/aarch64/aarch64.ad line 2445: > 2443: case Op_VectorCastB2X: > 2444: case Op_VectorCastS2X: > 2445: if (vlen < 4) { The vector_size_supported() check should already cover this and no need to check it here? src/hotspot/cpu/aarch64/aarch64_neon_ad.m4 line 280: > 278: match(Set dst (VectorCast$2`'2X src)); > 279: format %{ "fcvtzs $dst, T$6, $src\n\t" > 280: "xtn $dst, T$7, $dst, T$6\n\t# convert $1$2 to $1$3 vector" "\n\t" --> "\t" at the last line of the block. src/hotspot/cpu/aarch64/aarch64_neon_ad.m4 line 298: > 296: format %{ "fcvtzs $dst, T4S, $src\n\t" > 297: "xtn $dst, T4H, $dst, T4S\n\t" > 298: "xtn $dst, T8B, $dst, T8H\n\t# convert 4F to 4B vector" xtn $dst, T8B, $dst, T8H\n\t# convert 4F to 4B vector" => xtn $dst, T8B, $dst, T8H\t# convert 4F to 4B vector" src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2831: > 2829: f(0b000001, 15, 10), rf(Vn, 5), rf(Vd, 0); > 2830: } > 2831: What's the difference between dups and the dup right above? 
------------- PR: https://git.openjdk.java.net/jdk/pull/4839 From chagedorn at openjdk.java.net Fri Sep 17 10:21:42 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 17 Sep 2021 10:21:42 GMT Subject: RFR: 8273933: [TESTBUG] Test must run without preallocated exceptions In-Reply-To: References: Message-ID: On Fri, 17 Sep 2021 09:21:40 GMT, Nils Eliasson wrote: > Executing vmTestbase/jit/t/t105/t105.java with the fix for (JDK-8273277) makes the test fail when run with the following arguments: > > -XX:+TieredCompilation > -XX:Tier0BackedgeNotifyFreqLog=0 > -XX:Tier2BackedgeNotifyFreqLog=0 > -XX:Tier3BackedgeNotifyFreqLog=0 > -XX:Tier2BackEdgeThreshold=1 > -XX:Tier3BackEdgeThreshold=1 > -XX:Tier4BackEdgeThreshold=1 > -Xbatch > > The problem is that the tests expects a detailed message from ArrayIndexOutOfBoundsException, but this test will trigger the optimization that reuses preallocated exceptions that have an empty detailed exceptions. > > It is wrong for the test to assume exceptions messages. > > Solution disable preallocated exceptions with the flag -XX:-ProfileTraps. That sounds reasonable. You should also update the copyright year. ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5560 From chagedorn at openjdk.java.net Fri Sep 17 12:25:53 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 17 Sep 2021 12:25:53 GMT Subject: RFR: 8267265: Use new IR Test Framework to create tests for C2 IGV transformations [v4] In-Reply-To: <8Ce6bZtHwGEw8_wXZz4ak3obprd1YmZDi4cItcXB4bA=.a7162709-7aad-4709-a585-d2391392f49b@github.com> References: <8Ce6bZtHwGEw8_wXZz4ak3obprd1YmZDi4cItcXB4bA=.a7162709-7aad-4709-a585-d2391392f49b@github.com> Message-ID: On Wed, 1 Sep 2021 00:23:11 GMT, John Tortugo wrote: >> Hi, can I please get some reviews for this Pull Request? 
Here is a summary of the changes: >> >> - Add tests, using the new IR-based test framework, for several of the Ideal transformations on Add, Sub, Mul, Div, Loop nodes and some simple Scalar Replacement transformations. >> - Add more default IR regex's to IR-based test framework. >> - Changes to Sub, Div and Add Ideal nodes to that transformations on Int and Long types are the whenever possible same. >> - Changes to Sub*Node, Div*Node and Add*Node Ideal methods to fix some bugs and include new transformations. >> - New JTREG "ir_transformations" test group under test/hotspot/jtreg. > > John Tortugo has updated the pull request incrementally with 146 additional commits since the last revision: > > - Fix merge mistake. > - Merge branch 'jdk-8267265' of https://github.com/JohnTortugo/jdk into jdk-8267265 > - Addressing PR feedback: move tests to other directory, add custom tests, add tests for other optimizations, rename some tests. > - 8273197: ProblemList 2 jtools tests due to JDK-8273187 > 8273198: ProblemList java/lang/instrument/BootClassPath/BootClassPathTest.sh due to JDK-8273188 > > Reviewed-by: naoto > - 8262186: Call X509KeyManager.chooseClientAlias once for all key types > > Reviewed-by: xuelei > - 8273186: Remove leftover comment about sparse remembered set in G1 HeapRegionRemSet > > Reviewed-by: ayang > - 8273169: java/util/regex/NegativeArraySize.java failed after JDK-8271302 > > Reviewed-by: jiefu, serb > - 8273092: Sort classlist in JDK image > > Reviewed-by: redestad, ihse, dfuchs > - 8273144: Remove unused top level "Sample Collection Set Candidates" logging > > Reviewed-by: iwalulya, ayang > - 8262095: NPE in Flow$FlowAnalyzer.visitApply: Cannot invoke getThrownTypes because tree.meth.type is null > > Co-authored-by: Jan Lahoda > Co-authored-by: Vicente Romero > Reviewed-by: jlahoda > - ... and 136 more: https://git.openjdk.java.net/jdk/compare/ac430bf7...463102e2 Thanks for your effort to write tests for all these different kinds of transformations! 
Generally, they look good and are worth having! You should add `@bug 8267265` to all files.

test/hotspot/jtreg/compiler/c2/irTests/AddINodeIdealizationTests.java line 34:

> 32:  * @run driver compiler.c2.irTests.AddINodeIdealizationTests
> 33:  */
> 34: public class AddINodeIdealizationTests {

General comments that also apply to the other test files: It might be good to sanity check the results of all these transformations (even though they are simple). Since the tests only use simple randomized ints, you could use a single `@Run` method instead of one for each test. This could look something like [this](https://gist.github.com/chhagedorn/b16aba260a8fcf27c082beccf2cec0a3).

test/hotspot/jtreg/compiler/c2/irTests/AddINodeIdealizationTests.java line 40:

> 38:
> 39:     @Test
> 40:     @IR(failOn = {IRNode.LOAD, IRNode.STORE, IRNode.MUL, IRNode.DIV, IRNode.SUB})

In this test and all the following ones (including the other files), I think you can remove unrelated `failOn` regexes for operations that are not part of the test. For example, in this test you can safely remove `IRNode.MUL`, `IRNode.DIV`, and `IRNode.SUB`.

test/hotspot/jtreg/compiler/c2/irTests/AddINodeIdealizationTests.java line 91:

> 89:     @IR(failOn = {IRNode.LOAD, IRNode.STORE, IRNode.MUL, IRNode.DIV, IRNode.SUB})
> 90:     @IR(counts = {IRNode.ADD, "2"})
> 91:     // Checks (x + c1) + y => (x + y) + c1

Unfortunately, it is a limitation of the framework that it cannot check the correct inputs of IR nodes.

test/hotspot/jtreg/compiler/c2/irTests/AddLNodeIdealizationTests.java line 151:

> 149:         return (a - b) + (c - a);
> 150:     }
> 151:

Compared to the `AddI` tests, you've missed the case `(a - b) + (b - c) => (a - c)` here.

test/hotspot/jtreg/compiler/c2/irTests/DivINodeIdealizationTests.java line 44:

> 42:     // Checks x / x => 1
> 43:     public int constant(int x) {
> 44:         return x / x;

This fails when `x` is zero with an `ArithmeticException`.
I suggest converting this into a custom run test and catching this case, perhaps also testing zero as a separate case to check that the exception is thrown by compiled code.

test/hotspot/jtreg/compiler/c2/irTests/DivINodeIdealizationTests.java line 68:

> 66:     // Checks x / (y / y) => x
> 67:     public int identityThird(int x, int y) {
> 68:         return x / (y / y);

Same problem as above with `y = 0`.

test/hotspot/jtreg/compiler/c2/irTests/DivINodeIdealizationTests.java line 79:

> 77:     // Hotspot should keep the division because it may cause a division by zero trap
> 78:     public int retainDenominator(int x, int y) {
> 79:         return (x * y) / y;

Same problem as above with `y = 0`.

test/hotspot/jtreg/compiler/c2/irTests/DivLNodeIdealizationTests.java line 34:

> 32:  * @run driver compiler.c2.irTests.DivLNodeIdealizationTests
> 33:  */
> 34: public class DivLNodeIdealizationTests {

Same division-by-zero problems as with `DivI`. They should be adjusted analogously.

test/hotspot/jtreg/compiler/c2/irTests/MulINodeIdealizationTests.java line 45:

> 43:     //Checks Max(a,b) * min(a,b) => a*b
> 44:     public int excludeMaxMin(int x, int y){
> 45:         return Math.max(x, y) * Math.min(x, y);

`Math.min/max()` is intrinsified and HotSpot generates `CMove` nodes (see `LibraryCallKit::generate_min_max()`) for them. But it looks like `MulNode::Ideal` misses this check for `CMove` nodes. That could be done in a separate RFE (and then this test could be improved to check whether the `CMove` node was removed). Anyway, min/max nodes are mainly used for loop limit computations, so it is harder to test this transformation in an easy way.

test/hotspot/jtreg/compiler/c2/irTests/MulLNodeIdealizationTests.java line 43:

> 41:     @IR(failOn = {IRNode.LOAD, IRNode.STORE, IRNode.DIV, IRNode.CALL})
> 42:     @IR(counts = {IRNode.MUL, "1"})
> 43:     //Checks Max(a,b) * min(a,b) => a*b

See comments for `MulI`.
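The division-by-zero remarks on the `DivI`/`DivL` tests above can be illustrated with a minimal plain-Java sketch. It deliberately leaves out the IR framework annotations, and the class name `DivSanityCheck` and the helper `throwsOnZero` are hypothetical names introduced only for this illustration; `constant` and `identityThird` mirror the shapes of the test methods quoted in the review. It checks the expected result for non-zero inputs and treats zero as a separate case that must still throw:

```java
import java.util.function.IntUnaryOperator;

// Hypothetical stand-alone check; not part of the PR or of the IR framework.
final class DivSanityCheck {
    // Same shape as the reviewed test: x / x may be folded to 1,
    // but the zero check must still be preserved.
    public static int constant(int x) {
        return x / x;
    }

    // Same shape as identityThird: x / (y / y) => x, which traps when y == 0.
    public static int identityThird(int x, int y) {
        return x / (y / y);
    }

    // Returns true if applying f to 0 throws ArithmeticException.
    public static boolean throwsOnZero(IntUnaryOperator f) {
        try {
            f.applyAsInt(0);
            return false;
        } catch (ArithmeticException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] samples = {1, -1, 7, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int v : samples) {
            if (constant(v) != 1) {
                throw new AssertionError("x / x != 1 for x = " + v);
            }
            if (identityThird(42, v) != 42) {
                throw new AssertionError("x / (y / y) != x for y = " + v);
            }
        }
        // Zero is tested separately: the division must still trap.
        if (!throwsOnZero(DivSanityCheck::constant)
                || !throwsOnZero(y -> identityThird(42, y))) {
            throw new AssertionError("expected ArithmeticException for zero");
        }
        System.out.println("all division sanity checks passed");
    }
}
```

In the actual tests, checks of this kind would presumably live in a custom `@Run` method as suggested in the review, so that they execute against the compiled code.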
test/hotspot/jtreg/compiler/c2/irTests/SubINodeIdealizationTests.java line 164: > 162: @Arguments(Argument.RANDOM_EACH) > 163: @IR(failOn = {IRNode.LOAD, IRNode.STORE, IRNode.MUL, IRNode.DIV, IRNode.SUB, IRNode.ADD}) > 164: // Checks 0 - (a >> 31) => a >> 31 Comment should be adjusted to differentiate between signed and unsigned shifts. And a rule should be added to check that the `RShiftI` node was converted into an `URShiftI` node. test/hotspot/jtreg/compiler/c2/irTests/SubLNodeIdealizationTests.java line 155: > 153: @Arguments(Argument.RANDOM_EACH) > 154: @IR(failOn = {IRNode.LOAD, IRNode.STORE, IRNode.MUL, IRNode.DIV, IRNode.SUB, IRNode.ADD}) > 155: // Checks 0 - (a >> 63) => a >>> 63 Same as for `SubI` above, a rule should be added for the shift nodes. test/hotspot/jtreg/compiler/c2/irTests/loopOpts/LoopIdealizationTests.java line 43: > 41: @Test > 42: @IR(failOn = {IRNode.LOAD, IRNode.STORE, IRNode.MUL, IRNode.DIV, IRNode.ADD, IRNode.SUB, IRNode.LOOP, IRNode.COUNTEDLOOP, IRNode.COUNTEDLOOP_MAIN, IRNode.CALL}) > 43: //Checks that a for loop with 0 iterations is removed Missing space after `//` and also for other comments below. test/hotspot/jtreg/compiler/c2/irTests/loopOpts/LoopIdealizationTests.java line 44: > 42: @IR(failOn = {IRNode.LOAD, IRNode.STORE, IRNode.MUL, IRNode.DIV, IRNode.ADD, IRNode.SUB, IRNode.LOOP, IRNode.COUNTEDLOOP, IRNode.COUNTEDLOOP_MAIN, IRNode.CALL}) > 43: //Checks that a for loop with 0 iterations is removed > 44: public void zeroIterForLoop(){ Missing space between `)` and `{` and also on other lines below. test/hotspot/jtreg/compiler/c2/irTests/loopOpts/LoopIdealizationTests.java line 52: > 50: @Test > 51: @IR(failOn = {IRNode.LOAD, IRNode.STORE, IRNode.MUL, IRNode.DIV, IRNode.ADD, IRNode.SUB, IRNode.LOOP, IRNode.COUNTEDLOOP, IRNode.COUNTEDLOOP_MAIN, IRNode.CALL}) > 52: //Checks that a for loop with 0 iterations is removed Actually there is 1 iteration but we break it immediately (i.e. the loop is entered). 
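Returning to the `SubI`/`SubL` shift remarks above: the reason the comment should distinguish signed from unsigned shifts is that the transformation turns an arithmetic shift into a logical one. A small sketch of the underlying identity, in plain Java and with hypothetical names (`ShiftNegationCheck` and its methods are assumptions made for illustration, not code from the PR):

```java
// Hypothetical stand-alone check of 0 - (a >> 31) == a >>> 31 (and the long variant).
final class ShiftNegationCheck {
    // a >> 31 is 0 for non-negative a and -1 for negative a,
    // so negating it yields 0 or 1.
    public static int negSignedShiftInt(int a) {
        return 0 - (a >> 31);
    }

    // a >>> 31 extracts the sign bit directly as 0 or 1.
    public static int unsignedShiftInt(int a) {
        return a >>> 31;
    }

    public static long negSignedShiftLong(long a) {
        return 0L - (a >> 63);
    }

    public static long unsignedShiftLong(long a) {
        return a >>> 63;
    }

    public static void main(String[] args) {
        int[] ints = {0, 1, -1, 12345, -12345, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int a : ints) {
            if (negSignedShiftInt(a) != unsignedShiftInt(a)) {
                throw new AssertionError("int identity failed for " + a);
            }
        }
        long[] longs = {0L, 1L, -1L, Long.MIN_VALUE, Long.MAX_VALUE};
        for (long a : longs) {
            if (negSignedShiftLong(a) != unsignedShiftLong(a)) {
                throw new AssertionError("long identity failed for " + a);
            }
        }
        System.out.println("shift identities hold for all samples");
    }
}
```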
test/hotspot/jtreg/compiler/c2/irTests/loopOpts/LoopIdealizationTests.java line 89: > 87: if (i == 0){ > 88: break; > 89: }else{ Spaces around `else`. test/hotspot/jtreg/compiler/c2/irTests/loopOpts/LoopIdealizationTests.java line 141: > 139: //Checks that a while loop with 1 iteration is simplified to straight code > 140: public void oneIterDoWhileLoop(){ > 141: do{ Spacing test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/ScalarReplacementTests.java line 29: > 27: /* > 28: * @test > 29: * @summary Tests that Escape Analisys and Scalar Replacement is able to handle some simple cases. Typo: Analisys -> Analysis test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/ScalarReplacementTests.java line 33: > 31: * @run driver compiler.c2.irTests.scalarReplacement.ScalarReplacementTests > 32: */ > 33: public class ScalarReplacementTests { You should also add some rules to check if there is an allocation or not. test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 162: > 160: public static final String SCOPE_OBJECT = "(.*# ScObj.*" + END; > 161: public static final String MEMBAR = START + "MemBar" + MID + END; > 162: I suggest to move all newly added regex together here. ------------- Changes requested by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5135 From github.com+6704669+asgibbons at openjdk.java.net Fri Sep 17 14:03:46 2021 From: github.com+6704669+asgibbons at openjdk.java.net (Scott Gibbons) Date: Fri, 17 Sep 2021 14:03:46 GMT Subject: RFR: 8273459: Update code segment alignment to 64 bytes In-Reply-To: <9_VCnv0v0ZrVLk3xKXDpsV-406yHK9iSEiagQVHmRhk=.f6375eb7-332c-4f82-b100-1f9db6b3d608@github.com> References: <9_VCnv0v0ZrVLk3xKXDpsV-406yHK9iSEiagQVHmRhk=.f6375eb7-332c-4f82-b100-1f9db6b3d608@github.com> Message-ID: On Thu, 16 Sep 2021 13:52:24 GMT, Scott Gibbons wrote: > Change the default code entry alignment to 64 bytes from 32 bytes. 
This allows for maintaining proper 64-byte alignment of data within a code segment, which is required by several AVX-512 instructions. > > I ran into this while implementing Base64 encoding and decoding. Code segments which were allocated with the address mod 32 == 0 but with the address mod 64 != 0 would cause the align() macro to misalign. This is because the align macro aligns to the size of the code segment and not the offset of the PC. So align(64) would align the PC to a multiple of 64 bytes from the start of the segment, and not to a pure 64-byte boundary as requested. Changing the alignment of the segment to 64 bytes fixes the issue. > > I have not seen any measurable difference in either performance or memory usage with the tests I have run. > > See [this ](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054180.html) article for the discussion thread. I think I have not made the point clearly enough. The `align` function is used to manipulate the address bits for the byte following the `align()`. This means that wherever the code is copied, the address of that byte should have the appropriate address bit configuration in the copy (as well as the original, of course). Since the current implementation is using the base address of the allocated segment to determine alignment, the only way to ensure the proper bit configuration of the address is to ensure the base address of the newly-allocated segment is aligned identically to the original. I believe this is entirely independent of `MaxVectorSize`, so I don't believe it's appropriate to use this value for address alignment. Using `pc()` fixes the case in the source segment, but will break 50% of the time when the segment is copied with a `CodeEntryAlignment` of 32. I think the bottom line is that `align()` is broken for any value greater than `CodeEntryAlignment`. 
I can foresee a case where it may be beneficial (from an algorithm perspective) to have large alignment values, like align(256) to simplify pointer arithmetic (for example). All of these proposed changes will not ensure this alignment when a segment is copied. Perhaps the appropriate thing to do is to put an `assert()` in `align()` to fail if the requested alignment cannot be ensured? IMHO, the "right" thing to do is to mark the bytes requiring address alignment and handle the cases on copy. This would add significant complexity, however. ------------- PR: https://git.openjdk.java.net/jdk/pull/5547 From neliasso at openjdk.java.net Fri Sep 17 14:20:07 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 17 Sep 2021 14:20:07 GMT Subject: RFR: 8273933: [TESTBUG] Test must run without preallocated exceptions [v2] In-Reply-To: References: Message-ID: > Executing vmTestbase/jit/t/t105/t105.java with the fix for (JDK-8273277) makes the test fail when run with the following arguments: > > -XX:+TieredCompilation > -XX:Tier0BackedgeNotifyFreqLog=0 > -XX:Tier2BackedgeNotifyFreqLog=0 > -XX:Tier3BackedgeNotifyFreqLog=0 > -XX:Tier2BackEdgeThreshold=1 > -XX:Tier3BackEdgeThreshold=1 > -XX:Tier4BackEdgeThreshold=1 > -Xbatch > > The problem is that the tests expects a detailed message from ArrayIndexOutOfBoundsException, but this test will trigger the optimization that reuses preallocated exceptions that have an empty detailed exceptions. > > It is wrong for the test to assume exceptions messages. > > Solution disable preallocated exceptions with the flag -XX:-ProfileTraps. 
Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: Update copyright year ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5560/files - new: https://git.openjdk.java.net/jdk/pull/5560/files/bbb58d5c..80187cfa Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5560&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5560&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5560.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5560/head:pull/5560 PR: https://git.openjdk.java.net/jdk/pull/5560 From jbhateja at openjdk.java.net Fri Sep 17 17:27:42 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Fri, 17 Sep 2021 17:27:42 GMT Subject: RFR: 8273459: Update code segment alignment to 64 bytes In-Reply-To: References: <9_VCnv0v0ZrVLk3xKXDpsV-406yHK9iSEiagQVHmRhk=.f6375eb7-332c-4f82-b100-1f9db6b3d608@github.com> Message-ID: On Fri, 17 Sep 2021 14:00:44 GMT, Scott Gibbons wrote: > I think I have not made the point clearly enough. The `align` function is used to manipulate the address bits for the byte following the `align()`. This means that wherever the code is copied, the address of that byte should have the appropriate address bit configuration in the copy (as well as the original, of course). Since the current implementation is using the base address of the allocated segment to determine alignment, the only way to ensure the proper bit configuration of the address is to ensure the base address of the newly-allocated segment is aligned identically to the original. > > I believe this is entirely independent of `MaxVectorSize`, so I don't believe it's appropriate to use this value for address alignment. Using `pc()` fixes the case in the source segment, but will break 50% of the time when the segment is copied with a `CodeEntryAlignment` of 32. 
> > I think the bottom line is that `align()` is broken for any value greater than `CodeEntryAlignment`. I can foresee a case where it may be beneficial (from an algorithm perspective) to have large alignment values, like align(256) to simplify pointer arithmetic (for example). All of these proposed changes will not ensure this alignment when a segment is copied. > > Perhaps the appropriate thing to do is to put an `assert()` in `align()` to fail if the requested alignment cannot be ensured? > > IMHO, the "right" thing to do is to mark the bytes requiring address alignment and handle the cases on copy. This would add significant complexity, however. The following code suggests that instruction start addresses always honor CodeEntryAlignment: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/asm/codeBuffer.cpp#L114 The current value of CodeEntryAlignment is 32; a 32-byte-aligned address is inherently 8- and 16-byte aligned, but not vice versa. Setting this value to 64 will cover the 8-, 16-, 32- and 64-byte alignment constraints in stubGenerator_64.cpp. However, there are several locations in stubGenerator.cpp that use `__ align(CodeEntryAlignment)`; these would suddenly also enforce 64-byte alignment and create internal fragmentation, even though, as Vladimir pointed out, only a handful of locations need 64-byte alignment. My suggestion was that, if you are attempting this, its scope should be limited to AVX512, hence the suggestion to use MaxVectorSize: alignment problems will mostly surface for vector instructions, and MaxVectorSize can be set to 32 even on AVX512 targets, so it is a robust indicator of vector size and the associated alignment constraints.
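The arithmetic behind this disagreement can be sketched in a few lines. This is a minimal model, not HotSpot code: the class `AlignSketch` and all of its methods are hypothetical names, and the addresses are made up. It only shows why aligning relative to the segment start, or relative to the current pc, yields a 64-byte boundary after a copy only when both segment bases are themselves 64-byte aligned.

```java
// Hypothetical model of the alignment arithmetic discussed in this thread.
// Nothing here is HotSpot API; addresses are plain longs.
final class AlignSketch {
    // Bytes of padding needed to round value up to a multiple of modulus
    // (modulus must be a power of two).
    static long padding(long value, long modulus) {
        return (modulus - (value & (modulus - 1))) & (modulus - 1);
    }

    // align() computed from the offset into the segment.
    static long alignByOffset(long base, long offset, long modulus) {
        return base + offset + padding(offset, modulus);
    }

    // align() computed from the absolute address (the pc).
    static long alignByPc(long base, long offset, long modulus) {
        long pc = base + offset;
        return pc + padding(pc, modulus);
    }

    // Address of the same byte once the segment is copied to newBase.
    static long relocate(long address, long oldBase, long newBase) {
        return address - oldBase + newBase;
    }

    public static void main(String[] args) {
        long base = 0x1020L;     // 32-byte aligned, not 64-byte aligned
        long copyBase = 0x2000L; // 64-byte aligned destination
        long byOffset = alignByOffset(base, 10, 64);
        long byPc = alignByPc(base, 10, 64);
        System.out.println("offset-based, source: " + (byOffset % 64));
        System.out.println("pc-based, source:     " + (byPc % 64));
        System.out.println("pc-based, after copy: " + (relocate(byPc, base, copyBase) % 64));
    }
}
```

With a source base of `0x1020`, the offset-based result is 32 bytes off a 64-byte boundary, the pc-based result is on one, and the pc-based result is off again once the segment is copied to a base with a different remainder modulo 64.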
------------- PR: https://git.openjdk.java.net/jdk/pull/5547 From dlong at openjdk.java.net Fri Sep 17 21:49:50 2021 From: dlong at openjdk.java.net (Dean Long) Date: Fri, 17 Sep 2021 21:49:50 GMT Subject: RFR: 8273459: Update code segment alignment to 64 bytes In-Reply-To: <9_VCnv0v0ZrVLk3xKXDpsV-406yHK9iSEiagQVHmRhk=.f6375eb7-332c-4f82-b100-1f9db6b3d608@github.com> References: <9_VCnv0v0ZrVLk3xKXDpsV-406yHK9iSEiagQVHmRhk=.f6375eb7-332c-4f82-b100-1f9db6b3d608@github.com> Message-ID: On Thu, 16 Sep 2021 13:52:24 GMT, Scott Gibbons wrote: > Change the default code entry alignment to 64 bytes from 32 bytes. This allows for maintaining proper 64-byte alignment of data within a code segment, which is required by several AVX-512 instructions. > > I ran into this while implementing Base64 encoding and decoding. Code segments which were allocated with the address mod 32 == 0 but with the address mod 64 != 0 would cause the align() macro to misalign. This is because the align macro aligns to the size of the code segment and not the offset of the PC. So align(64) would align the PC to a multiple of 64 bytes from the start of the segment, and not to a pure 64-byte boundary as requested. Changing the alignment of the segment to 64 bytes fixes the issue. > > I have not seen any measurable difference in either performance or memory usage with the tests I have run. > > See [this](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054180.html) article for the discussion thread. It sounds like this new alignment requirement will only be needed for stubs (initially?), but as proposed it will affect all other types of CodeBlobs. Just looking at the effect during startup, I saw padding for BufferBlobs go from 24 to 56, RuntimeBlobs go from 0 to 32 and 16 to 48, and nmethods go from 24 to 56. I would like to suggest again that we use the actual alignment requirements of the CodeBuffer to determine the alignment of the CodeBlob.
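One possible reading of this suggestion, as a minimal sketch: the buffer remembers the largest alignment ever requested through `align()`, and the blob is later placed at the larger of that value and the default entry alignment. The class `BufferAlignmentSketch`, its methods, and the constant are hypothetical and only model the idea; they are not the HotSpot `CodeBuffer` API.

```java
// Hypothetical model: a code buffer that records its own alignment needs.
final class BufferAlignmentSketch {
    // Default entry alignment; 32 is the value quoted in this thread.
    static final int CODE_ENTRY_ALIGNMENT = 32;

    private int requiredAlignment = 1; // largest modulus passed to align()
    private int offset = 0;            // current offset into the buffer

    // Pretend to emit some bytes of code or data.
    void emit(int bytes) {
        offset += bytes;
    }

    // Pad to the next multiple of modulus (a power of two) and remember it.
    void align(int modulus) {
        requiredAlignment = Math.max(requiredAlignment, modulus);
        offset = (offset + modulus - 1) & -modulus;
    }

    int offset() {
        return offset;
    }

    // Alignment to use for the blob that will hold this buffer's code.
    int blobAlignment() {
        return Math.max(requiredAlignment, CODE_ENTRY_ALIGNMENT);
    }
}
```

Under this model a buffer that never asks for more than 32-byte alignment keeps the existing padding, and only buffers that call `align(64)` pay for a 64-byte-aligned blob.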
> Perhaps the appropriate thing to do is to put an assert() in align() to fail if the requested alignment cannot be ensured? I agree. > IMHO, the "right" thing to do is to mark the bytes requiring address alignment and handle the cases on copy. This would add significant complexity, however. I disagree. Let's not mark individual bytes. The call to align() is enough to allow us to record the maximum alignment required by the CodeBuffer, and the added complexity is not at the individual instruction copy, but just in choosing the correct alignment value when creating the CodeBlob. For example, use MAX2(codebuffer->required_alignment(), CodeEntryAlignment) in place of CodeEntryAlignment. And for my own curiosity, I would like to hear from Intel what the expected effect on icache performance is from increasing the alignment of code. ------------- PR: https://git.openjdk.java.net/jdk/pull/5547 From eliu at openjdk.java.net Sat Sep 18 03:50:50 2021 From: eliu at openjdk.java.net (Eric Liu) Date: Sat, 18 Sep 2021 03:50:50 GMT Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v5] In-Reply-To: References: <6SkOgskSfXuMp1XarC2BO9zBUw_Zj1pcUMKNHffiCQs=.66c156b1-ef01-48a4-8f28-8351089a5646@github.com> Message-ID: On Fri, 17 Sep 2021 09:46:34 GMT, Ningsheng Jian wrote: >> Wang Huang has updated the pull request incrementally with one additional commit since the last revision: >> >> fix bugs > > src/hotspot/cpu/aarch64/aarch64.ad line 2445: > >> 2443: case Op_VectorCastB2X: >> 2444: case Op_VectorCastS2X: >> 2445: if (vlen < 4) { > > The vector_size_supported() check should already cover this and no need to check it here? I suggest testing this PR with the latest code, as some vector sizes which should not be supported for `VectorReinterpret` and `VectorCast*2X` have been fixed after https://github.com/openjdk/jdk/pull/5160. E.g. 128short => 64int. For the case above, as the species are different, it
1. reinterprets 8 short to 2 short
2. casts 2 short to 2 int
Since a short vector with fewer than 4 elements is not supported, this situation should be detected as unsupported when trying to generate a `VectorReinterpret` node with 2 short, which in the current branch is mistaken for 4 short. I think the workaround code for jdk17 (https://github.com/openjdk/jdk17/compare/master...theRealELiu:JDK-8268966) could be removed entirely after this work. ------------- PR: https://git.openjdk.java.net/jdk/pull/4839 From qingfeng.yy at alibaba-inc.com Sat Sep 18 07:34:26 2021 From: qingfeng.yy at alibaba-inc.com (Yi Yang) Date: Sat, 18 Sep 2021 15:34:26 +0800 Subject: Question about JIT Peephole optimizations Message-ID: Hello Community, I see that both C1 and C2 introduced peephole optimization, which is a classic compiler optimization phase. However, it seems that they have barely been implemented, used or changed since they were first open-sourced. C1's peephole (LIR_Assembler::peephole) does nothing, and its implementation on most platforms is empty. As for C2's peephole, I noticed that arm/aarch64 currently have no peephole rules, while x86/s390/ppc have 2-3 peephole rules. PhasePeephole is disabled by default on almost all platforms; the only exception is x86, where it is enabled by default. I want to know why we do not add more rules to allow merging more instructions by using peephole (like llvm/lib/CodeGen/PeepholeOptimizer.cpp). I also noticed that many rules have been commented out. Is there any reason for that? Is it because XXNode::Ideal does most of the work? Or has profiling shown that peepholes do not pay off when compilation time is weighed against their outcome? Or is it difficult to do peepholes with a rule-based approach? Is it worth continuing to work on it? I know nothing about the prehistoric era of HotSpot JITs, so any input is appreciated! Thanks. 
From aph-open at littlepinkcloud.com Sat Sep 18 08:53:19 2021 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Sat, 18 Sep 2021 09:53:19 +0100 Subject: Question about JIT Peephole optimizations In-Reply-To: References: Message-ID: <508f19ad-aa90-3749-d896-fdef59b5d795@littlepinkcloud.com> On 9/18/21 8:34 AM, Yi Yang wrote: > I want to know why we do not add more rules to allow merging more instructions by using peephole(Like llvm/lib/CodeGen/PeepholeOptimizer.cpp). And I noticed that many rules have been commented out. Is there any reason for that? Is it because XXNode::Ideal does most of the work? Or has profiling proved that peepholes are not profitable/balanced between compilation time and their outcome? Or it's difficult to do peepholes by rule-based approach? Are we worthy of continuing to work on it? It's not very likely that you'll be able to find a peephole rule that isn't handled better in other ways. I don't think I've ever found one. The closest we ever got was the LDP optimization, and that's much happier in the assembler. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From zgu at openjdk.java.net Sat Sep 18 23:14:03 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Sat, 18 Sep 2021 23:14:03 GMT Subject: RFR: 8273454: C2: Transform (-a)*(-b) into a*b [v7] In-Reply-To: References: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> Message-ID: On Fri, 17 Sep 2021 07:42:59 GMT, Tobias Hartmann wrote: >> Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: >> >> @TobiHartmann's comments > > Thanks for making these changes, looks good to me. @TobiHartmann @chhagedorn Thanks! 
------------- PR: https://git.openjdk.java.net/jdk/pull/5403 From zgu at openjdk.java.net Sat Sep 18 23:14:06 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Sat, 18 Sep 2021 23:14:06 GMT Subject: Integrated: 8273454: C2: Transform (-a)*(-b) into a*b In-Reply-To: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> References: <5dO4eT3uMoBQR6GvjBoWt4KhDZLeQTbUgTOHjsVxrpM=.cb2d4bd8-08cb-48d2-a64c-9be33964391d@github.com> Message-ID: <_BUKLmrnSIU13-VPm6rSDExPyj8uKds9ABpaCf1_3eg=.fe6c2826-d823-448d-80bc-0cd6e4816918@github.com> On Tue, 7 Sep 2021 22:40:50 GMT, Zhengyu Gu wrote:
> The transformation reduces the number of instructions in the generated code.
>
> ### x86_64:
>
> Before:
> ```
> 0x00007fb92c78b3ac: neg %esi
> 0x00007fb92c78b3ae: neg %edx
> 0x00007fb92c78b3b0: mov %esi,%eax
> 0x00007fb92c78b3b2: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0}
>                                    ; - TestSub::runSub at 4 (line 9)
> ```
>
> After:
> ```
>                                    ; - TestSub::runSub at -1 (line 9)
> 0x00007fc8c05b74ac: mov %esi,%eax
> 0x00007fc8c05b74ae: imul %edx,%eax ;*imul {reexecute=0 rethrow=0 return_oop=0}
>                                    ; - TestSub::runSub at 4 (line 9)
> ```
>
> ### AArch64:
>
> Before:
> ```
> 0x0000ffff814b4a70: neg w11, w1
> 0x0000ffff814b4a74: mneg w0, w2, w11 ;*imul {reexecute=0 rethrow=0 return_oop=0}
>                                      ; - TestSub::runSub at 4 (line 9)
> ```
>
> After:
> ```
> 0x0000ffff794a67f0: mul w0, w1, w2 ;*imul {reexecute=0 rethrow=0 return_oop=0}
>                                    ; - TestSub::runSub at 4 (line 9)
> ```

This pull request has now been integrated.
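The transformation relies on `(-a) * (-b)` and `a * b` being equal for every pair of `int` values, including `Integer.MIN_VALUE`, because Java integer arithmetic wraps around modulo 2^32. The following is a minimal sketch of that identity; the class `NegMulSketch` and its method names are hypothetical and unrelated to the test shipped with the PR.

```java
// Hypothetical illustration of the identity behind the transformation.
final class NegMulSketch {
    // Shape of the expression before the transformation: two negations, one multiply.
    static int negated(int a, int b) {
        return (-a) * (-b);
    }

    // Shape of the expression after the transformation: a single multiply.
    static int plain(int a, int b) {
        return a * b;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 3, -4, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int a : samples) {
            for (int b : samples) {
                if (negated(a, b) != plain(a, b)) {
                    throw new AssertionError(a + " * " + b);
                }
            }
        }
        System.out.println("identity holds for all sampled pairs");
    }
}
```

The edge case worth noting is that `-Integer.MIN_VALUE` is again `Integer.MIN_VALUE`, so the identity holds there only because both sides overflow in the same way.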
Changeset: 7c9868c0 Author: Zhengyu Gu URL: https://git.openjdk.java.net/jdk/commit/7c9868c0b3c9bd3d305e71f91596190813cdccce Stats: 120 lines in 2 files changed: 119 ins; 0 del; 1 mod 8273454: C2: Transform (-a)*(-b) into a*b Reviewed-by: thartmann, eliu, chagedorn ------------- PR: https://git.openjdk.java.net/jdk/pull/5403 From thartmann at openjdk.java.net Mon Sep 20 06:11:53 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 20 Sep 2021 06:11:53 GMT Subject: RFR: 8273933: [TESTBUG] Test must run without preallocated exceptions [v2] In-Reply-To: References: Message-ID: On Fri, 17 Sep 2021 14:20:07 GMT, Nils Eliasson wrote: >> Executing vmTestbase/jit/t/t105/t105.java with the fix for (JDK-8273277) makes the test fail when run with the following arguments: >> >> -XX:+TieredCompilation >> -XX:Tier0BackedgeNotifyFreqLog=0 >> -XX:Tier2BackedgeNotifyFreqLog=0 >> -XX:Tier3BackedgeNotifyFreqLog=0 >> -XX:Tier2BackEdgeThreshold=1 >> -XX:Tier3BackEdgeThreshold=1 >> -XX:Tier4BackEdgeThreshold=1 >> -Xbatch >> >> The problem is that the test expects a detail message from ArrayIndexOutOfBoundsException, but this test will trigger the optimization that reuses preallocated exceptions, which have an empty detail message. >> >> It is wrong for the test to assume exception messages. >> >> Solution: disable preallocated exceptions with the flag -XX:-ProfileTraps. > > Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: > > Update copyright year Just wondering, why does `-XX:-OmitStackTraceInFastThrow` not help? 
------------- PR: https://git.openjdk.java.net/jdk/pull/5560 From chagedorn at openjdk.java.net Mon Sep 20 06:44:54 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 20 Sep 2021 06:44:54 GMT Subject: RFR: 8273965: some testlibrary_tests/ir_framework tests fail when c1 disabled In-Reply-To: References: Message-ID: <55dp9r7-ES76GC2UP3kWy2M6Ffxc7xK6uRP-0AtIuaU=.3ace3772-2d1b-4aab-bca0-9838f8ce4655@github.com> On Sat, 18 Sep 2021 15:52:11 GMT, Ao Qi wrote: > These tests failed with c2-only build: > > test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestCompLevels.java > test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestControls.java > test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestRunTests.java Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5576 From rrich at openjdk.java.net Mon Sep 20 08:17:57 2021 From: rrich at openjdk.java.net (Richard Reingruber) Date: Mon, 20 Sep 2021 08:17:57 GMT Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow In-Reply-To: References: Message-ID: On Tue, 7 Sep 2021 15:25:46 GMT, Volker Simonis wrote: > If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (i.e. AIOOBE, NullPointerExceptions,..) and replace them by a static, pre-allocated exception without any stacktrace. > > However, we can actually do better. Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. nmethod) and fill them with at least one stack frame with the method /line-number information of the currently compiled method. If the method in question is being inlined (which often happens), we can add stackframes for all callers up to the inlining depth of the method in question. 
> > For the attached JTreg test, we get the following exception in interpreter mode: > > java.lang.NullPointerException: Cannot read the array length because "" is null > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) > at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) > at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233) > > Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows: > > java.lang.NullPointerException > > After this change, if `StackFrameInFastThrow.throwImplicitException()` will be compiled stand alone, we will get: > > java.lang.NullPointerException > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > > and if `StackFrameInFastThrow.throwImplicitException()` will be inlined into `level2()` and `level2()` into `level1()` we will get the following exception (altough we're still running with `-XX:+OmitStackTraceInFastThrow`): > > java.lang.NullPointerException > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) > at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) > > The new functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). Notice that the optimization comes at no run-time costs because all the extra work will be done at compile time. > > ## Implementation details > > - Already the current implementation of `-XX:+OmitStackTraceInFastThrow` potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`. 
With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`). > - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them. > - In order to avoid a memory leak we have to release these global JNI handles once a nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles. > - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()` and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but missed to add it to the corresponding stats and printing routines so I've added that section as well. > - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds. > - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions. > - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed. > - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues. Hi Volker, R?mis, I haven't yet looked into the details of the change but @TheRealMDoerr kindly explained it to me. As I understood, you are using global JNI references to hold the preallocated exceptions with partial backtrace. 
Backtraces seem to hold references to the java mirrors of the holders of the methods in the backtrace [1]. This will keep their classloaders alive and prevent class unloading. Also the owning nmethod cannot be unloaded for the same reason. It shouldn't be too difficult to write a test that leaks classes because of this. > > _Mailing list message from [Remi Forax](mailto:forax at univ-mlv.fr) on [hotspot-dev](mailto:hotspot-dev at mail.openjdk.java.net):_ > > (not a reviewer so this message will not be really helpful ...) > > Hi Volker, > for me it's not an enhancement, but a bug fix, in production an exception with no stacktrace is useless and result in hours lost trying to figure out the issue I'd agree. Even in development exceptions should have a stacktrace and therefore, IMHO, OmitStackTraceInFastThrow should be off by default. In my eyes exceptions are a means to handle unforeseen application states in a best-effort approach. Often they will be caused by bugs, and the attached stacktrace is valuable information for finding them. Your enhancement limits the stacktrace to potentially just the top frame, which in many cases will not be enough and will also be confusing to developers. Also I don't think that we should optimize applications that have run into a bug. In the related [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563) you gave the example of [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):

    public static boolean isAlpha(int c) {
        try {
            return IS_ALPHA[c];
        } catch (ArrayIndexOutOfBoundsException ex) {
            return false;
        }
    }

There the backtrace is completely useless and can be omitted, but IMHO (as stated above) this is a misuse of exceptions in the first place and should be fixed.
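For illustration, one way such an idiom could be written without relying on an implicit exception is an explicit range check before the table lookup. This is only a sketch: the class `AlphaSketch`, the table size of 128, and the way the table is filled are assumptions made for the example and are not taken from Tomcat.

```java
// Hypothetical rewrite of the lookup idiom using an explicit bounds check.
final class AlphaSketch {
    // Assumed lookup table covering ASCII; the real table is not shown in this thread.
    private static final boolean[] IS_ALPHA = new boolean[128];

    static {
        for (int c = 'a'; c <= 'z'; c++) {
            IS_ALPHA[c] = true;
        }
        for (int c = 'A'; c <= 'Z'; c++) {
            IS_ALPHA[c] = true;
        }
    }

    // Out-of-range input is rejected by the check instead of by a thrown exception.
    static boolean isAlpha(int c) {
        return c >= 0 && c < IS_ALPHA.length && IS_ALPHA[c];
    }
}
```

With the explicit check no ArrayIndexOutOfBoundsException is ever thrown on this path, so the question of whether its stack trace may be omitted does not arise.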
Such idioms may occur in libraries that are out of maintenance (which should not be used for security reasons) or the maintainer is not willing to accept the fix. Therefore we could limit OmitStackTraceInFastThrow to these idioms only where the thrown exception is caught in a local or inlined handler that ignores it. Maybe this is not even too difficult. If C2 compiles `isAlpha` with OmitStackTraceInFastThrow enabled then practically everything related to exception handling gets eliminated. It might be possible to recognize that the IR-Node representing the preallocated exception has no uses and only if it actually does have uses it could be replaced with an uncommon trap (which is likely the harder part). Cheers, Richard. [1] Exception backtrace references java mirrors: https://github.com/openjdk/jdk/blob/7c9868c0b3c9bd3d305e71f91596190813cdccce/src/hotspot/share/classfile/javaClasses.cpp#L2178-L2182 ------------- PR: https://git.openjdk.java.net/jdk/pull/5392 From rrich at openjdk.java.net Mon Sep 20 08:22:53 2021 From: rrich at openjdk.java.net (Richard Reingruber) Date: Mon, 20 Sep 2021 08:22:53 GMT Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow In-Reply-To: References: Message-ID: On Tue, 7 Sep 2021 15:25:46 GMT, Volker Simonis wrote: > If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (i.e. AIOOBE, NullPointerExceptions,..) and replace them by a static, pre-allocated exception without any stacktrace. > > However, we can actually do better. Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. nmethod) and fill them with at least one stack frame with the method /line-number information of the currently compiled method. 
If the method in question is being inlined (which often happens), we can add stackframes for all callers up to the inlining depth of the method in question. > > For the attached JTreg test, we get the following exception in interpreter mode: > > java.lang.NullPointerException: Cannot read the array length because "" is null > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) > at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) > at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233) > > Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows: > > java.lang.NullPointerException > > After this change, if `StackFrameInFastThrow.throwImplicitException()` will be compiled stand alone, we will get: > > java.lang.NullPointerException > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > > and if `StackFrameInFastThrow.throwImplicitException()` will be inlined into `level2()` and `level2()` into `level1()` we will get the following exception (altough we're still running with `-XX:+OmitStackTraceInFastThrow`): > > java.lang.NullPointerException > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) > at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) > > The new functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). Notice that the optimization comes at no run-time costs because all the extra work will be done at compile time. 
> > ## Implementation details > > - Already the current implementation of `-XX:+OmitStackTraceInFastThrow` potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`. With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`). > - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them. > - In order to avoid a memory leak we have to release these global JNI handles once a nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles. > - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()` and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but missed to add it to the corresponding stats and printing routines so I've added that section as well. > - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds. > - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions. > - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed. > - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues. 
src/hotspot/share/ci/ciEnv.cpp line 410: > 408: // nmethods are no strong roots so we have to create a global JNI handle > 409: // for the created exception in order to keep it alive accross GCs. > 410: objh = JNIHandles::make_global(handle); The backtrace references the java mirrors corresponding to the methods in the backtrace[1]. Thereby the global JNI handle will keep their classloaders alive and prevent classunloading and also unloading of the nmethod being compiled. [1] https://github.com/openjdk/jdk/blob/7c9868c0b3c9bd3d305e71f91596190813cdccce/src/hotspot/share/classfile/javaClasses.cpp#L2178-L2182 ------------- PR: https://git.openjdk.java.net/jdk/pull/5392 From volker.simonis at gmail.com Mon Sep 20 09:03:12 2021 From: volker.simonis at gmail.com (Volker Simonis) Date: Mon, 20 Sep 2021 11:03:12 +0200 Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow In-Reply-To: References: Message-ID: Hi Richard, thanks a lot for looking into this change. Nmethod unloading does still work with this change, just take a look at the associated JTreg test which compiles and then unloads a method with a generated implicit exception. Once the nmethod has been unloaded, the global JNI handle will be released and the class can be unloaded as well. But I agree that it might be too late and class unloading shouldn't depend on unloading of all nmethods which reference that class. I'll have a look if I can fix that somehow. Best regards, Volker On Mon, Sep 20, 2021 at 10:53 AM Richard Reingruber wrote: > > On Tue, 7 Sep 2021 15:25:46 GMT, Volker Simonis wrote: > > > If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (i.e. AIOOBE, NullPointerExceptions,..) and replace them by a static, pre-allocated exception without any stacktrace. > > > > However, we can actually do better. 
Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. nmethod) and fill them with at least one stack frame with the method /line-number information of the currently compiled method. If the method in question is being inlined (which often happens), we can add stackframes for all callers up to the inlining depth of the method in question. > > > > For the attached JTreg test, we get the following exception in interpreter mode: > > > > java.lang.NullPointerException: Cannot read the array length because "" is null > > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > > at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) > > at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) > > at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233) > > > > Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows: > > > > java.lang.NullPointerException > > > > After this change, if `StackFrameInFastThrow.throwImplicitException()` will be compiled stand alone, we will get: > > > > java.lang.NullPointerException > > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > > > > and if `StackFrameInFastThrow.throwImplicitException()` will be inlined into `level2()` and `level2()` into `level1()` we will get the following exception (altough we're still running with `-XX:+OmitStackTraceInFastThrow`): > > > > java.lang.NullPointerException > > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > > at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) > > at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) > > > > The new functionality is guarded by 
`-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). Notice that the optimization comes at no run-time costs because all the extra work will be done at compile time. > > > > ## Implementation details > > > > - Already the current implementation of `-XX:+OmitStackTraceInFastThrow` potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`. With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`). > > - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them. > > - In order to avoid a memory leak we have to release these global JNI handles once a nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles. > > - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()` and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but missed to add it to the corresponding stats and printing routines so I've added that section as well. > > - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds. > > - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions. 
> > - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed. > > - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues. > > src/hotspot/share/ci/ciEnv.cpp line 410: > > > 408: // nmethods are no strong roots so we have to create a global JNI handle > > 409: // for the created exception in order to keep it alive accross GCs. > > 410: objh = JNIHandles::make_global(handle); > > The backtrace references the java mirrors corresponding to the methods in the backtrace[1]. Thereby the global JNI handle will keep their classloaders alive and prevent classunloading and also unloading of the nmethod being compiled. > > [1] https://github.com/openjdk/jdk/blob/7c9868c0b3c9bd3d305e71f91596190813cdccce/src/hotspot/share/classfile/javaClasses.cpp#L2178-L2182 > > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/5392 From rrich at openjdk.java.net Mon Sep 20 09:51:58 2021 From: rrich at openjdk.java.net (Richard Reingruber) Date: Mon, 20 Sep 2021 09:51:58 GMT Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow In-Reply-To: References: Message-ID: On Tue, 7 Sep 2021 15:25:46 GMT, Volker Simonis wrote: > If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (i.e. AIOOBE, NullPointerExceptions,..) and replace them by a static, pre-allocated exception without any stacktrace. > > However, we can actually do better. Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. nmethod) and fill them with at least one stack frame with the method /line-number information of the currently compiled method. 
If the method in question is being inlined (which often happens), we can add stackframes for all callers up to the inlining depth of the method in question. > > For the attached JTreg test, we get the following exception in interpreter mode: > > java.lang.NullPointerException: Cannot read the array length because "" is null > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) > at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) > at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233) > > Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows: > > java.lang.NullPointerException > > After this change, if `StackFrameInFastThrow.throwImplicitException()` will be compiled stand alone, we will get: > > java.lang.NullPointerException > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > > and if `StackFrameInFastThrow.throwImplicitException()` will be inlined into `level2()` and `level2()` into `level1()` we will get the following exception (altough we're still running with `-XX:+OmitStackTraceInFastThrow`): > > java.lang.NullPointerException > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) > at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) > > The new functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). Notice that the optimization comes at no run-time costs because all the extra work will be done at compile time. 
> > ## Implementation details > > - Already the current implementation of `-XX:+OmitStackTraceInFastThrow` potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`. With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`). > - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them. > - In order to avoid a memory leak we have to release these global JNI handles once a nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles. > - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()` and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but missed to add it to the corresponding stats and printing routines so I've added that section as well. > - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds. > - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions. > - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed. > - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues. 
Hi Volker, > > _Mailing list message from [Volker Simonis](mailto:volker.simonis at gmail.com) on [hotspot-dev](mailto:hotspot-dev at mail.openjdk.java.net):_ > > Hi Richard, > > thanks a lot for looking into this change. > > Nmethod unloading does still work with this change, just take a look > at the associated JTreg test which compiles and then unloads a method > with a generated implicit exception. Yes it works in your test because you explicitly make the compiled method not entrant. Think of another test where a nmethod would be unloaded because the corresponding classloader isn't reachable anymore. The change prevents this because the loader will be kept alive by the preallocated exception if one exists. A test with a class leak would repeatedly create a loader, c2 compile a method with preallocated exception that was loaded by the loader and then drop the reference to the classloader. All the loaders would be kept alive by the preallocated exceptions. > Once the nmethod has been > unloaded, the global JNI handle will be released and the class can be > unloaded as well. But I agree that it might be too late and class > unloading shouldn't depend on unloading of all nmethods which > reference that class. I'll have a look if I can fix that somehow. > > Best regards, > Volker > > On Mon, Sep 20, 2021 at 10:53 AM Richard Reingruber > wrote: Cheers, Richard. ------------- PR: https://git.openjdk.java.net/jdk/pull/5392 From adinn at openjdk.java.net Mon Sep 20 09:55:55 2021 From: adinn at openjdk.java.net (Andrew Dinn) Date: Mon, 20 Sep 2021 09:55:55 GMT Subject: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v6] In-Reply-To: References: <_XL6WybKwHeJ54kSQnN_q0_NgvR7ib9BFjNJ4HrkO_g=.f82e6cda-b31f-4eee-9185-3e52ebd6b54d@github.com> Message-ID: <2lOynSpsbUNNtPnEX_vLx2O2n9M1tO3RdBGoKEiOiO0=.fa1caf71-5e7a-4325-b6bd-6745eb0ba668@github.com> On Fri, 17 Sep 2021 07:13:24 GMT, Wu Yan wrote: >> It's fine. 
I don't think it'll affect any real programs, so it's rather pointless. I don't know if that's any reason not to approve it. > > Andrew, can you help us to approve this? I agree with Andrew Haley that this patch is not going to make an improvement for anything but a very small number of applications. Processing of strings over a few 10s of bytes is rare. On the other hand, the patch doesn't seem to cause any performance drop for the much more common case of processing short strings, so it does no harm. Also, the new and old code are much the same in terms of complexity so that is no reason to prefer one over the other. The only real concern I have is that any change involves the risk of error and the ratio of cases that might benefit to cases that might suffer from an error is very low. I don't think that's a reason to avoid pushing this patch upstream but it does suggest that we should not backport it. ------------- PR: https://git.openjdk.java.net/jdk/pull/4722 From aph at openjdk.java.net Mon Sep 20 10:13:10 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 20 Sep 2021 10:13:10 GMT Subject: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v7] In-Reply-To: References: Message-ID: On Mon, 30 Aug 2021 06:26:01 GMT, Wang Huang wrote: >> Dear all, >> Could you do me a favor and review this patch? This patch uses `ldp` to implement String.compareTo. >> >> * We add a JMH test case >> * Here is the result of this test case >>
>> | Benchmark                      | (size) | Mode | Cnt | Score  | Error   | Units |
>> |--------------------------------|--------|------|-----|--------|---------|-------|
>> | StringCompare.compareLL        | 64     | avgt | 5   | 7.992  | ± 0.005 | us/op |
>> | StringCompare.compareLL        | 72     | avgt | 5   | 15.029 | ± 0.006 | us/op |
>> | StringCompare.compareLL        | 80     | avgt | 5   | 14.655 | ± 0.011 | us/op |
>> | StringCompare.compareLL        | 91     | avgt | 5   | 16.363 | ± 0.12  | us/op |
>> | StringCompare.compareLL        | 101    | avgt | 5   | 16.966 | ± 0.007 | us/op |
>> | StringCompare.compareLL        | 121    | avgt | 5   | 19.276 | ± 0.006 | us/op |
>> | StringCompare.compareLL        | 181    | avgt | 5   | 19.002 | ± 0.417 | us/op |
>> | StringCompare.compareLL        | 256    | avgt | 5   | 24.707 | ± 0.041 | us/op |
>> | StringCompare.compareLLWithLdp | 64     | avgt | 5   | 8.001  | ± 0.121 | us/op |
>> | StringCompare.compareLLWithLdp | 72     | avgt | 5   | 11.573 | ± 0.003 | us/op |
>> | StringCompare.compareLLWithLdp | 80     | avgt | 5   | 6.861  | ± 0.004 | us/op |
>> | StringCompare.compareLLWithLdp | 91     | avgt | 5   | 12.774 | ± 0.201 | us/op |
>> | StringCompare.compareLLWithLdp | 101    | avgt | 5   | 8.691  | ± 0.004 | us/op |
>> | StringCompare.compareLLWithLdp | 121    | avgt | 5   | 11.091 | ± 1.342 | us/op |
>> | StringCompare.compareLLWithLdp | 181    | avgt | 5   | 14.64  | ± 0.581 | us/op |
>> | StringCompare.compareLLWithLdp | 256    | avgt | 5   | 25.879 | ± 1.775 | us/op |
>> | StringCompare.compareUU        | 64     | avgt | 5   | 13.476 | ± 0.01  | us/op |
>> | StringCompare.compareUU        | 72     | avgt | 5   | 15.078 | ± 0.006 | us/op |
>> | StringCompare.compareUU        | 80     | avgt | 5   | 23.512 | ± 0.011 | us/op |
>> | StringCompare.compareUU        | 91     | avgt | 5   | 24.284 | ± 0.008 | us/op |
>> | StringCompare.compareUU        | 101    | avgt | 5   | 20.707 | ± 0.017 | us/op |
>> | StringCompare.compareUU        | 121    | avgt | 5   | 29.302 | ± 0.011 | us/op |
>> | StringCompare.compareUU        | 181    | avgt | 5   | 39.31  | ± 0.016 | us/op |
>> | StringCompare.compareUU        | 256    | avgt | 5   | 54.592 | ± 0.392 | us/op |
>> | StringCompare.compareUUWithLdp | 64     | avgt | 5   | 16.389 | ± 0.008 | us/op |
>> | StringCompare.compareUUWithLdp | 72     | avgt | 5   | 10.71  | ± 0.158 | us/op |
>> | StringCompare.compareUUWithLdp | 80     | avgt | 5   | 11.488 | ± 0.024 | us/op |
>> | StringCompare.compareUUWithLdp | 91     | avgt | 5   | 13.412 | ± 0.006 | us/op |
>> | StringCompare.compareUUWithLdp | 101    | avgt | 5   | 16.245 | ± 0.434 | us/op |
>> | StringCompare.compareUUWithLdp | 121    | avgt | 5   | 16.597 | ± 0.016 | us/op |
>> | StringCompare.compareUUWithLdp | 181    | avgt | 5   | 27.373 | ± 0.017 | us/op |
>> | StringCompare.compareUUWithLdp | 256    | avgt | 5   | 41.74  | ± 3.5   | us/op |
>> >> From this table, we can see that in most cases, our patch is better than the old one. >> >> Thank you for your review. Any suggestions are welcome.
> > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > fix windows build failed Marked as reviewed by aph (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/4722 From aph at openjdk.java.net Mon Sep 20 10:13:10 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 20 Sep 2021 10:13:10 GMT Subject: RFR: 8268231: Aarch64: Use ldp in intrinsics for String.compareTo [v6] In-Reply-To: <2lOynSpsbUNNtPnEX_vLx2O2n9M1tO3RdBGoKEiOiO0=.fa1caf71-5e7a-4325-b6bd-6745eb0ba668@github.com> References: <_XL6WybKwHeJ54kSQnN_q0_NgvR7ib9BFjNJ4HrkO_g=.f82e6cda-b31f-4eee-9185-3e52ebd6b54d@github.com> <2lOynSpsbUNNtPnEX_vLx2O2n9M1tO3RdBGoKEiOiO0=.fa1caf71-5e7a-4325-b6bd-6745eb0ba668@github.com> Message-ID: On Mon, 20 Sep 2021 09:52:45 GMT, Andrew Dinn wrote: >> Andrew, can you help us to approve this? > > I agree with Andrew Haley that this patch is not going to make an improvement for anything but a very small number of applications. Processing of strings over a few 10s of bytes is rare. On the other hand the doesn't seem to cause any performance drop for the much more common case of processing short strings. so it does no harm. Also, the new and old code are much the same in terms of complexity so that is no reason to prefer one over the other. The only real concern I have is that any change involves the risk of error and the ratio of cases that might benefit to cases that might suffer from an error is very low. I don't think that's a reason to avoid pushing this patch upstream but it does suggest that we should not backport it. OK, thanks. That seems like a sensible compromise. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4722 From ihse at openjdk.java.net Mon Sep 20 11:43:47 2021 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Mon, 20 Sep 2021 11:43:47 GMT Subject: RFR: 8273927: Enable hsdis for riscv64 In-Reply-To: References: Message-ID: On Fri, 17 Sep 2021 08:38:09 GMT, Aleksey Shipilev wrote: > Currently compiled `hsdis-riscv64.so` binary complains: > > > hsdis: bad native mach=architecture not set in Makefile!; please port hsdis to this platform > > > It seems to be as simple as point to the right BFD arch. > > Additional testing: > - [x] Linux RISC-V port, `-XX:+PrintAssembly` works Marked as reviewed by ihse (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5558 From simonis at openjdk.java.net Mon Sep 20 12:23:07 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Mon, 20 Sep 2021 12:23:07 GMT Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow [v2] In-Reply-To: References: Message-ID: <4aME71kyj1wnVLbosGZMtSpFNHTIOYPR_uIlYLoi5RM=.108a01cf-7072-40f6-b304-242414ea1f7c@github.com> > If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (i.e. AIOOBE, NullPointerExceptions,..) and replace them by a static, pre-allocated exception without any stacktrace. > > However, we can actually do better. Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. nmethod) and fill them with at least one stack frame with the method /line-number information of the currently compiled method. If the method in question is being inlined (which often happens), we can add stackframes for all callers up to the inlining depth of the method in question. 
> > For the attached JTreg test, we get the following exception in interpreter mode: > > java.lang.NullPointerException: Cannot read the array length because "" is null > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) > at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) > at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233) > > Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows: > > java.lang.NullPointerException > > After this change, if `StackFrameInFastThrow.throwImplicitException()` will be compiled stand alone, we will get: > > java.lang.NullPointerException > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > > and if `StackFrameInFastThrow.throwImplicitException()` will be inlined into `level2()` and `level2()` into `level1()` we will get the following exception (altough we're still running with `-XX:+OmitStackTraceInFastThrow`): > > java.lang.NullPointerException > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) > at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) > > The new functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). Notice that the optimization comes at no run-time costs because all the extra work will be done at compile time. > > ## Implementation details > > - Already the current implementation of `-XX:+OmitStackTraceInFastThrow` potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`. 
With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`). > - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them. > - In order to avoid a memory leak we have to release these global JNI handles once a nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles. > - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()` and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but missed to add it to the corresponding stats and printing routines so I've added that section as well. > - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds. > - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions. > - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed. > - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues. Volker Simonis has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Move the '_implicit_exceptions' GrowableArray into the compiler arena and correctly initialize '_implicit_excepts_offset' for native wrappers - 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5392/files - new: https://git.openjdk.java.net/jdk/pull/5392/files/906fe7f2..07ebd638 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5392&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5392&range=00-01 Stats: 14444 lines in 647 files changed: 8880 ins; 3313 del; 2251 mod Patch: https://git.openjdk.java.net/jdk/pull/5392.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5392/head:pull/5392 PR: https://git.openjdk.java.net/jdk/pull/5392 From chagedorn at openjdk.java.net Mon Sep 20 12:56:51 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 20 Sep 2021 12:56:51 GMT Subject: Integrated: 8273895: compiler/ciReplay/TestVMNoCompLevel.java fails due to wrong data size with TieredStopAtLevel=2, 3 In-Reply-To: References: Message-ID: On Thu, 16 Sep 2021 14:43:43 GMT, Christian Hagedorn wrote: > `TestVMNoCompLevel.java` is first letting the VM crash with `-XX:CICrashAt=1` (method `TestMain::test()`), then removes the compilation level information from the compile entry in the replay file and then replay compiles with and without tiered compilation. When running the replay file with tiered compilation off, it results in the assertion failure. > > When letting the VM crash with `-XX:TieredStopAtLevel=2,3` (C1 only), we get a slightly different MDO size for `TestMain::test()` compared to letting the VM crash with `-XX:-TieredCompilation` (C2 only). The C1 reported MDO for `TestMain::test()` is slightly smaller than the C2 MDO. 
The reason for that can be traced back to JDK-8251462 which changed [this code](https://github.com/openjdk/jdk/commit/15196325#diff-74ba139c0d6ec44945f1fc6d18d63e0d0fe0da5d38a5e347c6d4d38e0f7b112bL788-L790) in `is_speculative_trap_bytecode()`. This now only returns true if C2 is enabled. `is_speculative_trap_bytecode()` is called when initializing an MDO here: > https://github.com/openjdk/jdk/blob/2ef6871118109b294e3148c8f15d4335039dd121/src/hotspot/share/oops/methodData.cpp#L1235 > > If C2 is enabled, then we reserve some extra data space for trap data. But when running with C1 only, this is no longer done since JDK-8251462, so we allocate no extra data space in the MDO for the crashing method `TestMain::test()`. > > When now crashing first with `-XX:TieredStopAtLevel=2,3`, we collect an MDO without the extra trap data for the replay file. But when replay compiling afterwards with `-XX:-TieredCompilation`, we try to compile it with C2 (we removed the compilation level from the compile entry). We initialize the MDO of `TestMain::test()` with the extra trap data which is a mismatch to the reported C1 only MDO without extra trap data and we fail with the assertion. > > I suggest to just not run this test with `-XX:TieredStopAtLevel=2,3` to not try to compile a C1 method with profiling data with C2 in order to avoid this MDO mismatch assertion failure. I'm not sure either of the value of this test as this old format is not created anymore. But we might still want to keep this test around. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: a561eac9 Author: Christian Hagedorn URL: https://git.openjdk.java.net/jdk/commit/a561eac912740da6a5982c47558e13f34481219f Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8273895: compiler/ciReplay/TestVMNoCompLevel.java fails due to wrong data size with TieredStopAtLevel=2,3 Reviewed-by: kvn, dlong ------------- PR: https://git.openjdk.java.net/jdk/pull/5548 From chagedorn at openjdk.java.net Mon Sep 20 13:00:03 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 20 Sep 2021 13:00:03 GMT Subject: Integrated: 8273825: TestIRMatching.java fails after JDK-8266550 In-Reply-To: References: Message-ID: <-D8PB08Sah82AIKe8t5l7ZDsnGM6pbyVJNH0vJp3FYg=.63c07fba-e6c1-459e-b263-efc15d30fe38@github.com> On Fri, 17 Sep 2021 06:24:54 GMT, Christian Hagedorn wrote: > [JDK-8266550](https://bugs.openjdk.java.net/browse/JDK-8266550) changed the class hierarchy of types and thus had to adapt some default IR regexes. In this process, an additional "klass" + ";" was missed to remove in `CHECKCAST_ARRAY_OF` for the part matching platforms such as aarch64 or PPC. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 6f3e40c1 Author: Christian Hagedorn URL: https://git.openjdk.java.net/jdk/commit/6f3e40c16db899f1add67b997dede17eeb83418f Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8273825: TestIRMatching.java fails after JDK-8266550 Reviewed-by: thartmann, roland ------------- PR: https://git.openjdk.java.net/jdk/pull/5555 From nils.eliasson at oracle.com Mon Sep 20 14:06:28 2021 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Mon, 20 Sep 2021 16:06:28 +0200 Subject: RFR: 8273933: [TESTBUG] Test must run without preallocated exceptions [v2] In-Reply-To: References: Message-ID: On 2021-09-20 08:11, Tobias Hartmann wrote: > On Fri, 17 Sep 2021 14:20:07 GMT, Nils Eliasson wrote: > >>> Executing vmTestbase/jit/t/t105/t105.java with the fix for (JDK-8273277) makes the test fail when run with the following arguments: >>> >>> -XX:+TieredCompilation >>> -XX:Tier0BackedgeNotifyFreqLog=0 >>> -XX:Tier2BackedgeNotifyFreqLog=0 >>> -XX:Tier3BackedgeNotifyFreqLog=0 >>> -XX:Tier2BackEdgeThreshold=1 >>> -XX:Tier3BackEdgeThreshold=1 >>> -XX:Tier4BackEdgeThreshold=1 >>> -Xbatch >>> >>> The problem is that the tests expects a detailed message from ArrayIndexOutOfBoundsException, but this test will trigger the optimization that reuses preallocated exceptions that have an empty detailed exceptions. >>> >>> It is wrong for the test to assume exceptions messages. >>> >>> Solution disable preallocated exceptions with the flag -XX:-ProfileTraps. >> Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: >> >> Update copyright year > Just wondering, why does `-XX:-OmitStackTraceInFastThrow` not help? That flag should work too - both are needed: if (treat_throw_as_hot && (!StackTraceInThrowable || OmitStackTraceInFastThrow)) { Where treat_throw_as_hot is dependent on the ProfileTraps flag and the trap count.
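The condition quoted above can be modelled as a plain boolean predicate to see which flag combinations lead to a preallocated exception. This is only a sketch: the class and method names are made up for illustration, and `treatThrowAsHot` is reduced to a simple boolean parameter, whereas in the VM it is derived from ProfileTraps and the trap count.

```java
public class FastThrowGuard {
    // Hypothetical model of the quoted C2 condition. The three parameters
    // stand in for treat_throw_as_hot, StackTraceInThrowable and
    // OmitStackTraceInFastThrow; this is not HotSpot code.
    static boolean usesPreallocatedException(boolean treatThrowAsHot,
                                             boolean stackTraceInThrowable,
                                             boolean omitStackTraceInFastThrow) {
        return treatThrowAsHot
            && (!stackTraceInThrowable || omitStackTraceInFastThrow);
    }

    public static void main(String[] args) {
        // Hot throw with both flags at their default (true): preallocated.
        System.out.println(usesPreallocatedException(true, true, true));   // true
        // -XX:-OmitStackTraceInFastThrow: a regular exception is allocated.
        System.out.println(usesPreallocatedException(true, true, false));  // false
        // Throw not treated as hot: the other flags no longer matter.
        System.out.println(usesPreallocatedException(false, true, true));  // false
    }
}
```

Under this model, making the first operand false or turning off OmitStackTraceInFastThrow (while StackTraceInThrowable stays on) is each enough to avoid the preallocated exception.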
> > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/5560 From neliasso at openjdk.java.net Mon Sep 20 14:14:01 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Mon, 20 Sep 2021 14:14:01 GMT Subject: Integrated: 8273934: Remove unused perfcounters In-Reply-To: References: Message-ID: On Wed, 11 Aug 2021 20:48:46 GMT, Nils Eliasson wrote: > This patch removes two unused PerfCounters. > > Please review, > Regards, > Nils Eliasson This pull request has now been integrated. Changeset: 9aa12daa Author: Nils Eliasson URL: https://git.openjdk.java.net/jdk/commit/9aa12daa15944910c7b29d1715e8aee66cca5315 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod 8273934: Remove unused perfcounters Reviewed-by: chagedorn ------------- PR: https://git.openjdk.java.net/jdk/pull/5093 From neliasso at openjdk.java.net Mon Sep 20 14:26:40 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Mon, 20 Sep 2021 14:26:40 GMT Subject: RFR: 8273933: [TESTBUG] Test must run without preallocated exceptions [v3] In-Reply-To: References: Message-ID: <-1bHPTiy3h2jY2UStzNhJV1M7DitZvzCpL5wD7JdzEM=.1001774e-e023-45cc-ac38-8fc4f5f8abb8@github.com> > Executing vmTestbase/jit/t/t105/t105.java with the fix for (JDK-8273277) makes the test fail when run with the following arguments: > > -XX:+TieredCompilation > -XX:Tier0BackedgeNotifyFreqLog=0 > -XX:Tier2BackedgeNotifyFreqLog=0 > -XX:Tier3BackedgeNotifyFreqLog=0 > -XX:Tier2BackEdgeThreshold=1 > -XX:Tier3BackEdgeThreshold=1 > -XX:Tier4BackEdgeThreshold=1 > -Xbatch > > The problem is that the tests expects a detailed message from ArrayIndexOutOfBoundsException, but this test will trigger the optimization that reuses preallocated exceptions that have an empty detailed exceptions. > > It is wrong for the test to assume exceptions messages. > > Solution disable preallocated exceptions with the flag -XX:-ProfileTraps. 
Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: Update t105.java ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5560/files - new: https://git.openjdk.java.net/jdk/pull/5560/files/80187cfa..9c70227b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5560&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5560&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5560.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5560/head:pull/5560 PR: https://git.openjdk.java.net/jdk/pull/5560 From thartmann at openjdk.java.net Mon Sep 20 14:49:25 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 20 Sep 2021 14:49:25 GMT Subject: RFR: 8273933: [TESTBUG] Test must run without preallocated exceptions [v3] In-Reply-To: <-1bHPTiy3h2jY2UStzNhJV1M7DitZvzCpL5wD7JdzEM=.1001774e-e023-45cc-ac38-8fc4f5f8abb8@github.com> References: <-1bHPTiy3h2jY2UStzNhJV1M7DitZvzCpL5wD7JdzEM=.1001774e-e023-45cc-ac38-8fc4f5f8abb8@github.com> Message-ID: On Mon, 20 Sep 2021 14:26:40 GMT, Nils Eliasson wrote: >> Executing vmTestbase/jit/t/t105/t105.java with the fix for (JDK-8273277) makes the test fail when run with the following arguments: >> >> -XX:+TieredCompilation >> -XX:Tier0BackedgeNotifyFreqLog=0 >> -XX:Tier2BackedgeNotifyFreqLog=0 >> -XX:Tier3BackedgeNotifyFreqLog=0 >> -XX:Tier2BackEdgeThreshold=1 >> -XX:Tier3BackEdgeThreshold=1 >> -XX:Tier4BackEdgeThreshold=1 >> -Xbatch >> >> The problem is that the tests expects a detailed message from ArrayIndexOutOfBoundsException, but this test will trigger the optimization that reuses preallocated exceptions that have an empty detailed exceptions. >> >> It is wrong for the test to assume exceptions messages. >> >> Solution disable preallocated exceptions with the flag -XX:-ProfileTraps. 
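As a side note on the remark that a test should not assume exception messages: a check that only looks at the exception type keeps working whether or not the VM hands out a preallocated exception. The following is a minimal sketch with made-up names; it is not taken from t105.java.

```java
public class ExceptionTypeCheck {
    // Returns true if storing at idx raises ArrayIndexOutOfBoundsException.
    static boolean storeIsOutOfBounds(int[] array, int idx) {
        try {
            array[idx] = 1;
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            // Deliberately ignore e.getMessage(): a preallocated exception
            // may carry no detail message, so only the type is checked.
            return true;
        }
    }

    public static void main(String[] args) {
        int[] data = new int[4];
        System.out.println(storeIsOutOfBounds(data, 2));   // false
        System.out.println(storeIsOutOfBounds(data, 4));   // true
        System.out.println(storeIsOutOfBounds(data, -1));  // true
    }
}
```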
> > Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: > > Update t105.java Looks good to me! ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5560 From neliasso at openjdk.java.net Mon Sep 20 14:50:21 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Mon, 20 Sep 2021 14:50:21 GMT Subject: RFR: 8273933: [TESTBUG] Test must run without preallocated exceptions [v3] In-Reply-To: References: <-1bHPTiy3h2jY2UStzNhJV1M7DitZvzCpL5wD7JdzEM=.1001774e-e023-45cc-ac38-8fc4f5f8abb8@github.com> Message-ID: On Mon, 20 Sep 2021 14:43:37 GMT, Tobias Hartmann wrote: >> Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: >> >> Update t105.java > > Looks good to me! I changed the patch to disable OmitStackTraceInFastThrow instead of ProfileTraps, after a suggestion from @TobiHartmann. Disabling ProfileTraps has more side effects - so just disabling OmitStackTraceInFastThrow should keeps the test environment closer to default. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5560 From chagedorn at openjdk.java.net Mon Sep 20 14:56:57 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 20 Sep 2021 14:56:57 GMT Subject: RFR: 8273933: [TESTBUG] Test must run without preallocated exceptions [v3] In-Reply-To: <-1bHPTiy3h2jY2UStzNhJV1M7DitZvzCpL5wD7JdzEM=.1001774e-e023-45cc-ac38-8fc4f5f8abb8@github.com> References: <-1bHPTiy3h2jY2UStzNhJV1M7DitZvzCpL5wD7JdzEM=.1001774e-e023-45cc-ac38-8fc4f5f8abb8@github.com> Message-ID: On Mon, 20 Sep 2021 14:26:40 GMT, Nils Eliasson wrote: >> Executing vmTestbase/jit/t/t105/t105.java with the fix for (JDK-8273277) makes the test fail when run with the following arguments: >> >> -XX:+TieredCompilation >> -XX:Tier0BackedgeNotifyFreqLog=0 >> -XX:Tier2BackedgeNotifyFreqLog=0 >> -XX:Tier3BackedgeNotifyFreqLog=0 >> -XX:Tier2BackEdgeThreshold=1 >> -XX:Tier3BackEdgeThreshold=1 >> -XX:Tier4BackEdgeThreshold=1 >> -Xbatch >> >> The problem is that the tests expects a detailed message from ArrayIndexOutOfBoundsException, but this test will trigger the optimization that reuses preallocated exceptions that have an empty detailed exceptions. >> >> It is wrong for the test to assume exceptions messages. >> >> Solution disable preallocated exceptions with the flag -XX:-ProfileTraps. > > Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: > > Update t105.java Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5560 From neliasso at openjdk.java.net Mon Sep 20 15:02:53 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Mon, 20 Sep 2021 15:02:53 GMT Subject: RFR: 8273933: [TESTBUG] Test must run without preallocated exceptions [v3] In-Reply-To: References: <-1bHPTiy3h2jY2UStzNhJV1M7DitZvzCpL5wD7JdzEM=.1001774e-e023-45cc-ac38-8fc4f5f8abb8@github.com> Message-ID: On Mon, 20 Sep 2021 14:43:37 GMT, Tobias Hartmann wrote: >> Nils Eliasson has updated the pull request incrementally with one additional commit since the last revision: >> >> Update t105.java > > Looks good to me! @TobiHartmann and @chhagedorn - Thanks for the review! ------------- PR: https://git.openjdk.java.net/jdk/pull/5560 From neliasso at openjdk.java.net Mon Sep 20 15:02:54 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Mon, 20 Sep 2021 15:02:54 GMT Subject: Integrated: 8273933: [TESTBUG] Test must run without preallocated exceptions In-Reply-To: References: Message-ID: On Fri, 17 Sep 2021 09:21:40 GMT, Nils Eliasson wrote: > Executing vmTestbase/jit/t/t105/t105.java with the fix for (JDK-8273277) makes the test fail when run with the following arguments: > > -XX:+TieredCompilation > -XX:Tier0BackedgeNotifyFreqLog=0 > -XX:Tier2BackedgeNotifyFreqLog=0 > -XX:Tier3BackedgeNotifyFreqLog=0 > -XX:Tier2BackEdgeThreshold=1 > -XX:Tier3BackEdgeThreshold=1 > -XX:Tier4BackEdgeThreshold=1 > -Xbatch > > The problem is that the tests expects a detailed message from ArrayIndexOutOfBoundsException, but this test will trigger the optimization that reuses preallocated exceptions that have an empty detailed exceptions. > > It is wrong for the test to assume exceptions messages. > > Solution disable preallocated exceptions with the flag -XX:-ProfileTraps. This pull request has now been integrated. 
Changeset: 4d95a5d6 Author: Nils Eliasson URL: https://git.openjdk.java.net/jdk/commit/4d95a5d6dc7cc3d2b239c554a1302ac647807bd6 Stats: 5 lines in 1 file changed: 3 ins; 0 del; 2 mod 8273933: [TESTBUG] Test must run without preallocated exceptions Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/5560 From roland at openjdk.java.net Mon Sep 20 15:39:59 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 20 Sep 2021 15:39:59 GMT Subject: RFR: 8259609: C2: optimize long range checks in long counted loops [v6] In-Reply-To: <-d_SITgIi5SDUnvKHBFsuBPs05QNn8FxhS35iIIZTNg=.e38ea6bc-e07d-425d-8a7a-c550380c1f13@github.com> References: <-d_SITgIi5SDUnvKHBFsuBPs05QNn8FxhS35iIIZTNg=.e38ea6bc-e07d-425d-8a7a-c550380c1f13@github.com> Message-ID: On Mon, 23 Aug 2021 15:35:53 GMT, Roland Westrelin wrote: >> JDK-8255150 makes it possible for java code to explicitly perform a >> range check on long values. JDK-8223051 provides a transformation of >> long counted loops into loop nests with an inner int counted >> loop. With this change I propose transforming long range checks that >> operate on the iv of a long counted loop into range checks that >> operate on the iv of the int inner loop once it has been >> created. Existing range check eliminations can then kick in. >> >> Transformation of range checks is piggy backed on the loop nest >> creation for 2 reasons: >> >> - pattern matching range checks is easier right before the loop nest >> is created >> >> - the number of iterations of the inner loop is adjusted so scale * >> inner_iv doesn't overflow >> >> C2 has logic to delay some split if transformations so they don't >> break the scale * iv + offset pattern. I reused that logic for long >> range checks and had to relax what's considered a range check because >> initially a range check from Object.checkIndex() may include a test >> for range > 0 that needs a round of loop opts to be hoisted. 
I realize >> there's some code duplication but I didn't see a way to share logic >> between IdealLoopTree::may_have_range_check() and >> IdealLoopTree::policy_range_check() that would feel right. >> >> I realize the comment in PhaseIdealLoop::transform_long_range_checks() >> is scary. FWIW, it's not as complicated as it looks. I found that drawing >> the range covered by the entire long loop and the range covered by the >> inner loop helps to see how range checks can be transformed. Then the >> comment helps make sure all cases are covered and verify the generated >> code actually covers all of them. >> >> One issue is overflow. I think the fact that inner_iv * scale doesn't >> overflow helps simplify things. One possible overflow is that of scale >> * upper + offset which is handled by forcing all range checks in that >> case to deoptimize. I don't think other cases of overflow need special >> handling. >> >> This was tested with a Memory Segment micro benchmark (and patched >> Memory Segment support to take advantage of the new checkIndex >> intrinsic, both provided by Maurizio). Range checks in the micro >> benchmark are properly optimized (and performance increases >> significantly).
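For readers following the thread, here is a minimal sketch of the loop shape the description above talks about: a long counted loop whose body performs a long range check on `scale * i + offset`. It is only an illustration and not code from the PR; the class and method names are hypothetical, and it assumes `Objects.checkIndex(long, long)`, which is available since JDK 16.

```java
import java.util.Objects;

class LongRangeCheckDemo {
    // Sums data[scale * i + offset] for i in [0, count).
    // The loop variable is a long, and the index is validated with a long
    // range check, which is the pattern the PR description refers to.
    static long sum(byte[] data, long scale, long offset, long count) {
        long total = 0;
        for (long i = 0; i < count; i++) {               // long counted loop
            long index = scale * i + offset;             // scale * iv + offset
            Objects.checkIndex(index, (long) data.length); // long range check
            total += data[(int) index];
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 4, 5, 6};
        // Visits indices 1, 3 and 5.
        System.out.println(sum(data, 2L, 1L, 3L));
    }
}
```

Whether C2 actually turns such a loop into a nest with an inner int counted loop depends on the JDK build and flags; the sketch only shows the source-level pattern.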
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > whitespace Another comment to keep this alive ------------- PR: https://git.openjdk.java.net/jdk/pull/2045 From aoqi at openjdk.java.net Mon Sep 20 17:01:21 2021 From: aoqi at openjdk.java.net (Ao Qi) Date: Mon, 20 Sep 2021 17:01:21 GMT Subject: RFR: 8273965: some testlibrary_tests/ir_framework tests fail when c1 disabled In-Reply-To: <55dp9r7-ES76GC2UP3kWy2M6Ffxc7xK6uRP-0AtIuaU=.3ace3772-2d1b-4aab-bca0-9838f8ce4655@github.com> References: <55dp9r7-ES76GC2UP3kWy2M6Ffxc7xK6uRP-0AtIuaU=.3ace3772-2d1b-4aab-bca0-9838f8ce4655@github.com> Message-ID: On Mon, 20 Sep 2021 06:42:09 GMT, Christian Hagedorn wrote: >> These tests failed with c2-only build: >> >> test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestCompLevels.java >> test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestControls.java >> test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestRunTests.java > > Looks good! Thanks, @chhagedorn ! Should I get a second reviewer? ------------- PR: https://git.openjdk.java.net/jdk/pull/5576 From github.com+7535718+rsmogura at openjdk.java.net Mon Sep 20 21:23:45 2021 From: github.com+7535718+rsmogura at openjdk.java.net (Radoslaw Smogura) Date: Mon, 20 Sep 2021 21:23:45 GMT Subject: RFR: 8259609: C2: optimize long range checks in long counted loops [v6] In-Reply-To: References: <-d_SITgIi5SDUnvKHBFsuBPs05QNn8FxhS35iIIZTNg=.e38ea6bc-e07d-425d-8a7a-c550380c1f13@github.com> Message-ID: On Mon, 20 Sep 2021 15:36:47 GMT, Roland Westrelin wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> whitespace > > Another comment to keep this alive Hi @rwestrel I've been pointed here from other PR. I wonder if I can help a little bit. I tried to do something similar and I think that this could be interesting. 
First I started by using Overflow nodes to check whether long values can overflow; I link some experimental work here [1] (minus a few commits). However, I realized that if we focus only on int loops with long range checks, and we put some constraints: * on scale (i.e. `-(1 << 24) < scale < (1<<24)`), and * `0 <= range < max_jlong` * both lower and upper bounds are inserted then _probably_ we don't even have to try to catch long overflow. All we would need here is to tune `rc_predicate` a little, to better handle long typed nodes, and make some adjustments in other places. This could be helpful for the Panama project, as memory segments and vectors could use this approach. I wonder if this would be a good direction, and if I can help somehow? [1] https://github.com/openjdk/panama-vector/pull/128/files ------------- PR: https://git.openjdk.java.net/jdk/pull/2045 From dholmes at openjdk.java.net Tue Sep 21 05:12:50 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 21 Sep 2021 05:12:50 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v2] In-Reply-To: References: Message-ID: <4VZM07gGacQUgxJj5QM6m7LqhHK1Ehw3AyxcSrSQF8U=.15a62f8a-ef46-4c65-b8ad-f6530b74fc33@github.com> On Thu, 16 Sep 2021 17:00:20 GMT, Volker Simonis wrote: >> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. >> >> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance.
A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274): >> >> public static boolean isAlpha(int c) { >> try { >> return IS_ALPHA[c]; >> } catch (ArrayIndexOutOfBoundsException ex) { >> return false; >> } >> } >> >> >> ### Solution >> >> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a tenfold performance improvement for the above code: >> >> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.430 ± 0.353 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 3563.038 ± 77.358 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 8609.693 ± 1205.104 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 12842.401 ± 1022.728 ns/op >> >> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.432 ± 0.352 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 355.723 ± 16.641 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 887.068 ± 166.728 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 1274.418 ± 88.235 ns/op >> >> >> ### Implementation details >> >> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default. >> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`. >> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. >> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Minor updates as requested by @TheRealMDoerr I think I must have a basic misunderstanding of the problem here, as the described problem seems to be the opposite of what the intent of OmitStackTraceInFastThrow actually is. From the comment in graphKit: // If this throw happens frequently, an uncommon trap might cause // a performance pothole. If there is a local exception handler, // and if this particular bytecode appears to be deoptimizing often, // let us handle the throw inline, with a preconstructed instance. so OmitStackTraceInFastThrow actually allows us to use an optimized fastpath because we can replace a heavyweight stack-full exception with a preallocated stackless one and so avoid the uncommon trap to create the exception. What am I missing? 
Thanks, David ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From david.holmes at oracle.com Tue Sep 21 05:37:55 2021 From: david.holmes at oracle.com (David Holmes) Date: Tue, 21 Sep 2021 15:37:55 +1000 Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v2] In-Reply-To: <4VZM07gGacQUgxJj5QM6m7LqhHK1Ehw3AyxcSrSQF8U=.15a62f8a-ef46-4c65-b8ad-f6530b74fc33@github.com> References: <4VZM07gGacQUgxJj5QM6m7LqhHK1Ehw3AyxcSrSQF8U=.15a62f8a-ef46-4c65-b8ad-f6530b74fc33@github.com> Message-ID: Please ignore - I deleted this comment as I realized what I was missing (the '-' sign) as soon as I posted it. :( David On 21/09/2021 3:12 pm, David Holmes wrote: > On Thu, 16 Sep 2021 17:00:20 GMT, Volker Simonis wrote: > >>> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. >>> >>> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274): >>> >>> public static boolean isAlpha(int c) { >>> try { >>> return IS_ALPHA[c]; >>> } catch (ArrayIndexOutOfBoundsException ex) { >>> return false; >>> } >>> } >>> >>> >>> ### Solution >>> >>> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. 
This results in a tenfold performance improvement for the above code: >>> >>> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions >>> Benchmark (exceptionProbability) Mode Cnt Score Error Units >>> ImplicitExceptions.bench 0.0 avgt 5 1.430 ± 0.353 ns/op >>> ImplicitExceptions.bench 0.33 avgt 5 3563.038 ± 77.358 ns/op >>> ImplicitExceptions.bench 0.66 avgt 5 8609.693 ± 1205.104 ns/op >>> ImplicitExceptions.bench 1.00 avgt 5 12842.401 ± 1022.728 ns/op >>> >>> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions >>> Benchmark (exceptionProbability) Mode Cnt Score Error Units >>> ImplicitExceptions.bench 0.0 avgt 5 1.432 ± 0.352 ns/op >>> ImplicitExceptions.bench 0.33 avgt 5 355.723 ± 16.641 ns/op >>> ImplicitExceptions.bench 0.66 avgt 5 887.068 ± 166.728 ns/op >>> ImplicitExceptions.bench 1.00 avgt 5 1274.418 ± 88.235 ns/op >>> >>> >>> ### Implementation details >>> >>> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default. >>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception. >>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`. >>> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
>>> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. >> >> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: >> >> Minor updates as requested by @TheRealMDoerr > > I think I must have a basic misunderstanding of the problem here, as the described problem seems to be the opposite of what the intent of OmitStackTraceInFastThrow actually is. From the comment in graphKit: > > // If this throw happens frequently, an uncommon trap might cause > // a performance pothole. If there is a local exception handler, > // and if this particular bytecode appears to be deoptimizing often, > // let us handle the throw inline, with a preconstructed instance. > > so OmitStackTraceInFastThrow actually allows us to use an optimized fastpath because we can replace a heavyweight stack-full exception with a preallocated stackless one and so avoid the uncommon trap to create the exception. > > What am I missing? > > Thanks, > David > > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/5488 > From shade at openjdk.java.net Tue Sep 21 06:00:53 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 21 Sep 2021 06:00:53 GMT Subject: Integrated: 8273927: Enable hsdis for riscv64 In-Reply-To: References: Message-ID: On Fri, 17 Sep 2021 08:38:09 GMT, Aleksey Shipilev wrote: > Currently compiled `hsdis-riscv64.so` binary complains: > > > hsdis: bad native mach=architecture not set in Makefile!; please port hsdis to this platform > > > It seems to be as simple as point to the right BFD arch. 
> > Additional testing: > - [x] Linux RISC-V port, `-XX:+PrintAssembly` works This pull request has now been integrated. Changeset: 240fa6ef Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/240fa6efa266bc9c9f269c6215aa9df469b6eaa8 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod 8273927: Enable hsdis for riscv64 Reviewed-by: ihse ------------- PR: https://git.openjdk.java.net/jdk/pull/5558 From shade at openjdk.java.net Tue Sep 21 06:00:52 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 21 Sep 2021 06:00:52 GMT Subject: RFR: 8273927: Enable hsdis for riscv64 In-Reply-To: References: Message-ID: On Fri, 17 Sep 2021 08:38:09 GMT, Aleksey Shipilev wrote: > Currently compiled `hsdis-riscv64.so` binary complains: > > > hsdis: bad native mach=architecture not set in Makefile!; please port hsdis to this platform > > > It seems to be as simple as point to the right BFD arch. > > Additional testing: > - [x] Linux RISC-V port, `-XX:+PrintAssembly` works Thanks! ------------- PR: https://git.openjdk.java.net/jdk/pull/5558 From roland at openjdk.java.net Tue Sep 21 07:04:57 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Tue, 21 Sep 2021 07:04:57 GMT Subject: RFR: 8259609: C2: optimize long range checks in long counted loops [v6] In-Reply-To: References: <-d_SITgIi5SDUnvKHBFsuBPs05QNn8FxhS35iIIZTNg=.e38ea6bc-e07d-425d-8a7a-c550380c1f13@github.com> Message-ID: On Mon, 20 Sep 2021 21:20:57 GMT, Radoslaw Smogura wrote: > I wonder if this would be a good direction, and if I can help somehow? Do I understand right that you're suggesting modifying c2 so it handles long range checks with the existing range check optimizations (rc_predicate etc.)? The approach this PR has taken is to convert long range checks into int range checks so existing range check optimizations work as is. Ignoring the merit of either approach, what you're suggesting, if I understand right, is different so it can't help this particular PR. 
If you get what you're suggesting working then I expect you'll hit the same roadblock that this PR hits: someone will have to validate the logic before it integrates. FWIW, John and I both spent quite a bit of time on this. John has validated most of the logic and I updated the PR according to his latest comments. It's unfortunate it's taking that much time, but I doubt going a different road makes a lot of sense at this point. ------------- PR: https://git.openjdk.java.net/jdk/pull/2045 From thartmann at openjdk.java.net Tue Sep 21 07:11:45 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 21 Sep 2021 07:11:45 GMT Subject: RFR: 8273965: some testlibrary_tests/ir_framework tests fail when c1 disabled In-Reply-To: References: Message-ID: <8XXXuksHdjJgJm4C2MFmPdO4IR4W_VSIo10ISw0-D2o=.ba9dae25-4d0f-4a17-86ab-f4925e0d73ab@github.com> On Sat, 18 Sep 2021 15:52:11 GMT, Ao Qi wrote: > These tests failed with c2-only build: > > test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestCompLevels.java > test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestControls.java > test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestRunTests.java Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/5576 From aoqi at openjdk.java.net Tue Sep 21 07:18:47 2021 From: aoqi at openjdk.java.net (Ao Qi) Date: Tue, 21 Sep 2021 07:18:47 GMT Subject: RFR: 8273965: some testlibrary_tests/ir_framework tests fail when c1 disabled In-Reply-To: <8XXXuksHdjJgJm4C2MFmPdO4IR4W_VSIo10ISw0-D2o=.ba9dae25-4d0f-4a17-86ab-f4925e0d73ab@github.com> References: <8XXXuksHdjJgJm4C2MFmPdO4IR4W_VSIo10ISw0-D2o=.ba9dae25-4d0f-4a17-86ab-f4925e0d73ab@github.com> Message-ID: On Tue, 21 Sep 2021 07:08:36 GMT, Tobias Hartmann wrote: >> These tests failed with c2-only build: >> >> test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestCompLevels.java >> test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestControls.java >> test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestRunTests.java > > Looks good to me too. Thanks, @TobiHartmann ! ------------- PR: https://git.openjdk.java.net/jdk/pull/5576 From github.com+741251+turbanoff at openjdk.java.net Tue Sep 21 07:27:15 2021 From: github.com+741251+turbanoff at openjdk.java.net (Andrey Turbanov) Date: Tue, 21 Sep 2021 07:27:15 GMT Subject: RFR: 8274048 IGV: Replace usages of Collections.sort with List.sort call Message-ID: <8KIga70tt6TEH9vMuxl8OuDQWDOX3qCIeMeIC8OsfdU=.22a04314-43c6-48b0-b87f-224d2fd8060b@github.com> Collections.sort is just a wrapper, so it is better to use an instance method directly. 
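As a small illustration of the replacement proposed here (not taken from the IGV sources; the class and method names are hypothetical), the two calls below sort a copy of the same list and give the same result, since `Collections.sort(list, comparator)` delegates to `list.sort(comparator)`:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class ListSortDemo {
    // Old style: static helper that forwards to List.sort.
    static List<String> sortedWithCollections(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        Collections.sort(copy, Comparator.comparingInt(String::length));
        return copy;
    }

    // New style: call the instance method directly.
    static List<String> sortedWithListSort(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(Comparator.comparingInt(String::length));
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("ccc", "a", "bb");
        System.out.println(sortedWithCollections(names));
        System.out.println(sortedWithListSort(names));
    }
}
```

For natural ordering, `list.sort(null)` corresponds to `Collections.sort(list)`.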
------------- Commit messages: - [PATCH] Replace usages of Collections.sort with List.sort call in IGV Changes: https://git.openjdk.java.net/jdk/pull/5228/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5228&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8274048 Stats: 24 lines in 8 files changed: 0 ins; 2 del; 22 mod Patch: https://git.openjdk.java.net/jdk/pull/5228.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5228/head:pull/5228 PR: https://git.openjdk.java.net/jdk/pull/5228 From kizune at openjdk.java.net Tue Sep 21 09:02:53 2021 From: kizune at openjdk.java.net (Alexander Zuev) Date: Tue, 21 Sep 2021 09:02:53 GMT Subject: RFR: 8268764: Use Long.hashCode() instead of int-cast where applicable [v4] In-Reply-To: References: Message-ID: On Thu, 1 Jul 2021 12:19:53 GMT, ?????? ??????? wrote: >> In some JDK classes there's still the following hashCode() implementation: >> >> long objNum; >> >> public int hashCode() { >> return (int) objNum; >> } >> >> This outdated expression should be replaced with Long.hashCode(long) as it >> >> - uses all bits of the original value, does not discard any information upfront. For example, depending on how you are generating the IDs, the upper bits could change more frequently (or the opposite). >> >> - does not introduce any bias towards values with more ones (zeros), as it would be the case if the two halves were combined with an OR (AND) operation. >> >> See https://stackoverflow.com/a/4045083 >> >> This is related to https://github.com/openjdk/jdk/pull/4309 > > ?????? ??????? has updated the pull request incrementally with one additional commit since the last revision: > > 8268764: Update copy-right year Marked as reviewed by kizune (Reviewer). 
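To make the point about discarded bits concrete, here is a minimal sketch (hypothetical class and method names, not code from the PR). Two values that differ only in their upper 32 bits collide under the int cast but not under `Long.hashCode(long)`, which folds the upper half into the lower half with XOR:

```java
class ObjNumHashDemo {
    // Outdated variant: keeps only the low 32 bits.
    static int castHash(long objNum) {
        return (int) objNum;
    }

    // Suggested variant: (int) (value ^ (value >>> 32)).
    static int longHash(long objNum) {
        return Long.hashCode(objNum);
    }

    public static void main(String[] args) {
        long a = 1L;
        long b = (1L << 32) | 1L; // same low bits as a, different high bits
        System.out.println(castHash(a) == castHash(b)); // true: collision
        System.out.println(longHash(a) == longHash(b)); // false: 1 vs 0
    }
}
```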
------------- PR: https://git.openjdk.java.net/jdk/pull/4491 From simonis at openjdk.java.net Tue Sep 21 10:09:11 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Tue, 21 Sep 2021 10:09:11 GMT Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow [v3] In-Reply-To: References: Message-ID: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> > If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (i.e. AIOOBE, NullPointerExceptions,..) and replace them by a static, pre-allocated exception without any stacktrace. > > However, we can actually do better. Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. nmethod) and fill them with at least one stack frame with the method /line-number information of the currently compiled method. If the method in question is being inlined (which often happens), we can add stackframes for all callers up to the inlining depth of the method in question. 
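A rough way to observe the existing behaviour described above is sketched below. It is an illustration under assumptions, not the JTreg test from the PR: the class and method names are hypothetical, and whether (and at which iteration) a stack-less exception shows up depends on the JVM, the JIT compiler in use and flags such as `-XX:+/-OmitStackTraceInFastThrow`.

```java
class FastThrowDemo {
    // Triggers an implicit NullPointerException when arr is null.
    static int length(int[] arr) {
        return arr.length;
    }

    // Returns the first iteration at which the caught NPE carries no stack
    // trace, or -1 if every exception had one.
    static int firstStackless(int iterations) {
        for (int i = 0; i < iterations; i++) {
            try {
                length(null);
            } catch (NullPointerException e) {
                if (e.getStackTrace().length == 0) {
                    return i;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int at = firstStackless(200_000);
        System.out.println(at < 0
                ? "every NPE carried a stack trace"
                : "first stack-less NPE at iteration " + at);
    }
}
```

Running it once with the default settings and once with `-XX:-OmitStackTraceInFastThrow` is one way to compare the two modes discussed in this thread.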
> > For the attached JTreg test, we get the following exception in interpreter mode: > > java.lang.NullPointerException: Cannot read the array length because "" is null > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) > at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) > at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233) > > Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows: > > java.lang.NullPointerException > > After this change, if `StackFrameInFastThrow.throwImplicitException()` will be compiled stand alone, we will get: > > java.lang.NullPointerException > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > > and if `StackFrameInFastThrow.throwImplicitException()` will be inlined into `level2()` and `level2()` into `level1()` we will get the following exception (altough we're still running with `-XX:+OmitStackTraceInFastThrow`): > > java.lang.NullPointerException > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) > at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) > > The new functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). Notice that the optimization comes at no run-time costs because all the extra work will be done at compile time. > > ## Implementation details > > - Already the current implementation of `-XX:+OmitStackTraceInFastThrow` potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`. 
With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`). > - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them. > - In order to avoid a memory leak we have to release these global JNI handles once a nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles. > - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()` and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but missed to add it to the corresponding stats and printing routines so I've added that section as well. > - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds. > - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions. > - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed. > - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues. Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: Create implcit exceptions with an array of StackTraceElements right away instead of creating a backtrace. 
This prevents that implicit exceptions will keep classes alive due to Java mirrors in the backtrace. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5392/files - new: https://git.openjdk.java.net/jdk/pull/5392/files/07ebd638..f4a205b1 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5392&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5392&range=01-02 Stats: 34 lines in 3 files changed: 13 ins; 5 del; 16 mod Patch: https://git.openjdk.java.net/jdk/pull/5392.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5392/head:pull/5392 PR: https://git.openjdk.java.net/jdk/pull/5392 From simonis at openjdk.java.net Tue Sep 21 10:17:46 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Tue, 21 Sep 2021 10:17:46 GMT Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow [v3] In-Reply-To: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> References: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> Message-ID: On Tue, 21 Sep 2021 10:09:11 GMT, Volker Simonis wrote: >> If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (i.e. AIOOBE, NullPointerExceptions,..) and replace them by a static, pre-allocated exception without any stacktrace. >> >> However, we can actually do better. Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. nmethod) and fill them with at least one stack frame with the method /line-number information of the currently compiled method. If the method in question is being inlined (which often happens), we can add stackframes for all callers up to the inlining depth of the method in question. 
>> >> For the attached JTreg test, we get the following exception in interpreter mode: >> >> java.lang.NullPointerException: Cannot read the array length because "" is null >> at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) >> at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) >> at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) >> at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233) >> >> Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows: >> >> java.lang.NullPointerException >> >> After this change, if `StackFrameInFastThrow.throwImplicitException()` will be compiled stand alone, we will get: >> >> java.lang.NullPointerException >> at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) >> >> and if `StackFrameInFastThrow.throwImplicitException()` will be inlined into `level2()` and `level2()` into `level1()` we will get the following exception (altough we're still running with `-XX:+OmitStackTraceInFastThrow`): >> >> java.lang.NullPointerException >> at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) >> at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) >> at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) >> >> The new functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). Notice that the optimization comes at no run-time costs because all the extra work will be done at compile time. 
>> >> ## Implementation details >> >> - Already the current implementation of `-XX:+OmitStackTraceInFastThrow` potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`. With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`). >> - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them. >> - In order to avoid a memory leak we have to release these global JNI handles once a nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles. >> - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()` and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but missed to add it to the corresponding stats and printing routines so I've added that section as well. >> - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds. >> - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions. >> - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed. >> - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues. 
> > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Create implcit exceptions with an array of StackTraceElements right away instead of creating a backtrace. This prevents that implicit exceptions will keep classes alive due to Java mirrors in the backtrace. Hi Richard, thanks one more time for pointing out the issue with the class mirrors in the backtrace which keep classes alive and potentially prevents them from being unloaded. Fortunately, I think the solution is pretty simple. I don't think we need the backtrace at all. In the end it is just an optimization to save some space and not construct the full StackTraceElement[] right at the creation time of an exception. But the implicit exceptions which we are creating here are "nmethod-singletons" and as such I don't think we loose much if we create the array of StackTraceElements right away instead of creating a backtrace (see my last push). The StackTraceElements only contain Strings and therefore don't keep any classes unnecessarily alive. What do you think? And once you're on it, would you mind reviewing the whole PR :) Thank you and best regards, Volker ------------- PR: https://git.openjdk.java.net/jdk/pull/5392 From github.com+71546117+tobiasholenstein at openjdk.java.net Tue Sep 21 11:25:54 2021 From: github.com+71546117+tobiasholenstein at openjdk.java.net (Tobias Holenstein) Date: Tue, 21 Sep 2021 11:25:54 GMT Subject: RFR: JDK-8272703: StressSeed should be set via FLAG_SET_ERGO Message-ID: Set the `StressSeed` (added by JDK-8252219) via `FLAG_SET_ERGO` if the seed is generated by the VM (i.e., not set via the command line). This way, `StressSeed` will be added to the "[Global flags]" section of the hs_err file on crash and can be used to reproduce the issue. If `RepeatCompilation` is on and no `StressSeed` is set, a new `StressSeed` is generated for every compilation. 
The reason for this is that some bugs are only triggered intermittently, because they depend on the `StressSeed`.

-------------

Commit messages:
 - JDK-8272703: RepeatCompilation check added
 - JDK-8272703: RepeatCompilation
 - JDK-8272703: StressSeed should be set via FLAG_SET_ERGO

Changes: https://git.openjdk.java.net/jdk/pull/5599/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5599&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8272703
  Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5599.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5599/head:pull/5599

PR: https://git.openjdk.java.net/jdk/pull/5599

From phedlin at openjdk.java.net  Tue Sep 21 11:38:02 2021
From: phedlin at openjdk.java.net (Patric Hedlin)
Date: Tue, 21 Sep 2021 11:38:02 GMT
Subject: RFR: 8274039: codestrings gtest fails when hsdis is present
Message-ID:

Please review this change to the (g)test-case "codestrings".

Adding missing regexp pattern to remove trailing **hsdis** printout due to padding on x64.

-------------

Commit messages:
 - 8274039: codestrings gtest fails when hsdis is present

Changes: https://git.openjdk.java.net/jdk/pull/5606/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5606&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8274039
  Stats: 6 lines in 1 file changed: 3 ins; 0 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5606.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5606/head:pull/5606

PR: https://git.openjdk.java.net/jdk/pull/5606

From phedlin at openjdk.java.net  Tue Sep 21 11:38:03 2021
From: phedlin at openjdk.java.net (Patric Hedlin)
Date: Tue, 21 Sep 2021 11:38:03 GMT
Subject: RFR: 8274039: codestrings gtest fails when hsdis is present
In-Reply-To:
References:
Message-ID:

On Tue, 21 Sep 2021 11:29:49 GMT, Patric Hedlin wrote:

> Please review this change to the (g)test-case "codestrings".
> > Adding missing regexp pattern to remove trailing **hsdis** printout due to padding on x64. Disassembly diff is only used/attempted when **hsdis** is available, e.g. through LD_LIBRARY_PATH, but the approach is rather brittle. Same issue should be expected on other platforms (in particular platforms I do not have access to). Should the test-case **ifdef** on known platforms? ------------- PR: https://git.openjdk.java.net/jdk/pull/5606 From aph at openjdk.java.net Tue Sep 21 12:11:52 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 21 Sep 2021 12:11:52 GMT Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v5] In-Reply-To: References: <6SkOgskSfXuMp1XarC2BO9zBUw_Zj1pcUMKNHffiCQs=.66c156b1-ef01-48a4-8f28-8351089a5646@github.com> Message-ID: On Fri, 17 Sep 2021 09:59:14 GMT, Ningsheng Jian wrote: >> Wang Huang has updated the pull request incrementally with one additional commit since the last revision: >> >> fix bugs > > src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2831: > >> 2829: f(0b000001, 15, 10), rf(Vn, 5), rf(Vd, 0); >> 2830: } >> 2831: > > What's the difference between dups and the dup right above? One is _Advanced SIMD copy_, the other _Advanced SIMD scalar copy_. The _Advanced SIMD copy_ version is missing a comment to that effect. Maybe one day someone could go through assembler_aarch64.hpp and add the missing titles from the A64 instruction set encoding section of the ARM. ------------- PR: https://git.openjdk.java.net/jdk/pull/4839 From chagedorn at openjdk.java.net Tue Sep 21 13:03:16 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 21 Sep 2021 13:03:16 GMT Subject: RFR: JDK-8272703: StressSeed should be set via FLAG_SET_ERGO In-Reply-To: References: Message-ID: On Tue, 21 Sep 2021 08:38:57 GMT, Tobias Holenstein wrote: > Set the `StressSeed` (added by JDK-8252219) via `FLAG_SET_ERGO` if the seed is generated by the VM (i.e., not set via the command line). 
This way, `StressSeed` will be added to the "[Global flags]" section of the hs_err file on crash and can be used to reproduce the issue.
>
> If `RepeatCompilation` is on and no `StressSeed` is set, a new `StressSeed` is generated for every compilation. The reason for this is that some bugs are only triggered intermittently, because they depend on the `StressSeed`.

That looks good to me. I think it's valuable to still generate new seeds when running with `RepeatCompilation`, which has helped quite a lot in the past to reproduce bugs that were otherwise hard to trigger.

-------------

Marked as reviewed by chagedorn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/5599

From shade at openjdk.java.net  Tue Sep 21 13:25:01 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Tue, 21 Sep 2021 13:25:01 GMT
Subject: RFR: 8274060: C2: Incorrect computation after JDK-8273454
Message-ID:

A Fuzzer test caught a serious regression after [JDK-8273454](https://bugs.openjdk.java.net/browse/JDK-8273454): the results are different in (interpreter, C1) vs C2. See the original test cases in the bug. I believe the trouble is due to `And*Node`-s sharing code with `MulNode`, which means we enter the new transformation here:

    Node *AndINode::Ideal(PhaseGVN *phase, bool can_reshape) {
      // Special case constant AND mask
      const TypeInt *t2 = phase->type( in(2) )->isa_int();
      if( !t2 || !t2->is_con() ) return MulNode::Ideal(phase, can_reshape); // <--- calls new code through here

So while the new optimization `((-x) * (-y)) => (x * y)` is correct, doing the same for `((-x) & (-y)) => (x & y)` is not!

I opted to test the opcodes directly instead of introducing virtual methods in `MulNode`. Let me know if you prefer otherwise.
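To make the difference between the two rewrites concrete, here is a minimal Java sketch. The class and method names are hypothetical and are not part of the JDK or of the regression test in this PR; it only checks the arithmetic claim at the Java level, not the C2 node transformation itself.

```java
// Hypothetical demo class; the names are illustrative and not part of the JDK.
final class NegatedOperandsDemo {

    // (-x) * (-y) == x * y for every int pair: Java int arithmetic wraps
    // modulo 2^32, and the two negations cancel under multiplication.
    static boolean mulRewriteHolds(int x, int y) {
        return (-x) * (-y) == x * y;
    }

    // The same rewrite applied to bitwise AND; this is not an identity.
    static boolean andRewriteHolds(int x, int y) {
        return ((-x) & (-y)) == (x & y);
    }

    public static void main(String[] args) {
        System.out.println("mul, x=1, y=2: " + mulRewriteHolds(1, 2));
        System.out.println("and, x=1, y=2: " + andRewriteHolds(1, 2));
    }
}
```

For `x = 1, y = 2`, `(-1) & (-2)` is `-2` while `1 & 2` is `0`, so a single small pair is enough to show that the `&` form cannot reuse the multiplication rewrite.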

Additional testing:
 - [x] Original tests now pass
 - [x] New regression test duplicates the one for JDK-8273454, but verifies that `&` is not broken
 - [ ] `tier1` tests

-------------

Commit messages:
 - Fix

Changes: https://git.openjdk.java.net/jdk/pull/5612/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5612&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8274060
  Stats: 85 lines in 2 files changed: 82 ins; 1 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5612.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5612/head:pull/5612

PR: https://git.openjdk.java.net/jdk/pull/5612

From thartmann at openjdk.java.net  Tue Sep 21 13:50:33 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Tue, 21 Sep 2021 13:50:33 GMT
Subject: RFR: 8274060: C2: Incorrect computation after JDK-8273454
In-Reply-To:
References:
Message-ID:

On Tue, 21 Sep 2021 13:13:37 GMT, Aleksey Shipilev wrote:

> A Fuzzer test caught a serious regression after [JDK-8273454](https://bugs.openjdk.java.net/browse/JDK-8273454): the results are different in (interpreter, C1) vs C2. See the original test cases in the bug. I believe the trouble is due to `And*Node`-s sharing code with `MulNode` (for [reasons](https://github.com/openjdk/jdk/blob/42d5d2abaad8a88a5e1326ea8b4494aeb8b5748b/src/hotspot/share/opto/mulnode.hpp#L168-L169)), which means we enter the new transformation here:
>
>
>     Node *AndINode::Ideal(PhaseGVN *phase, bool can_reshape) {
>       // Special case constant AND mask
>       const TypeInt *t2 = phase->type( in(2) )->isa_int();
>       if( !t2 || !t2->is_con() ) return MulNode::Ideal(phase, can_reshape); // <--- calls new code through here
>
>
> So while new optimization `((-x) * (-y)) => (x * y)` is correct, doing the same for `((-x) & (-y)) => (x & y)` is not!
>
> I opted to test the opcodes directly instead of introducing virtual methods in `MulNode`. Let me know if you prefer otherwise.
> > Additional testing: > - [x] Original tests now pass > - [x] New regression test is copied from original for JDK-8273454, but new copy verifies that `&` operate the same (fails without the C2 fix) > - [ ] `tier1` tests Ouh, good catch. The fix looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5612 From neliasso at openjdk.java.net Tue Sep 21 13:54:35 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 21 Sep 2021 13:54:35 GMT Subject: RFR: 8274060: C2: Incorrect computation after JDK-8273454 In-Reply-To: References: Message-ID: <_dTkGkwgye2gmLnVzhKU0XrxVPYMXR1Do8PqejOI6CE=.ddc47ece-5f02-4409-b4e2-12d5ff9a4d4d@github.com> On Tue, 21 Sep 2021 13:13:37 GMT, Aleksey Shipilev wrote: > A Fuzzer test caught a serious regression after [JDK-8273454](https://bugs.openjdk.java.net/browse/JDK-8273454): the results are different in (interpreter, C1) vs C2. See the original test cases in the bug. I believe the trouble is due to `And*Node`-s sharing code with `MulNode` (for [reasons](https://github.com/openjdk/jdk/blob/42d5d2abaad8a88a5e1326ea8b4494aeb8b5748b/src/hotspot/share/opto/mulnode.hpp#L168-L169)), which means we enter the new transformation here: > > > Node *AndINode::Ideal(PhaseGVN *phase, bool can_reshape) { > // Special case constant AND mask > const TypeInt *t2 = phase->type( in(2) )->isa_int(); > if( !t2 || !t2->is_con() ) return MulNode::Ideal(phase, can_reshape); // <--- calls new code through here > > > So while new optimization `((-x) * (-y)) => (x * y)` is correct, doing the same for `((-x) & (-y)) => (x & y)` is not! > > I opted to test the opcodes directly instead of introducing virtual methods in `MulNode`. Let me know if you prefer otherwise. > > Additional testing: > - [x] Original tests now pass > - [x] New regression test is copied from original for JDK-8273454, but new copy verifies that `&` operate the same (fails without the C2 fix) > - [ ] `tier1` tests Looks good. 
Thanks for fixing! ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5612 From chagedorn at openjdk.java.net Tue Sep 21 13:59:37 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Tue, 21 Sep 2021 13:59:37 GMT Subject: RFR: 8274060: C2: Incorrect computation after JDK-8273454 In-Reply-To: References: Message-ID: <5vHbs1cNSuH7kxA140rHOWjVwmkMZLEaXYJ-GvJigIQ=.ceaaa5b9-01fb-4914-96ff-59201d755117@github.com> On Tue, 21 Sep 2021 13:13:37 GMT, Aleksey Shipilev wrote: > A Fuzzer test caught a serious regression after [JDK-8273454](https://bugs.openjdk.java.net/browse/JDK-8273454): the results are different in (interpreter, C1) vs C2. See the original test cases in the bug. I believe the trouble is due to `And*Node`-s sharing code with `MulNode` (for [reasons](https://github.com/openjdk/jdk/blob/42d5d2abaad8a88a5e1326ea8b4494aeb8b5748b/src/hotspot/share/opto/mulnode.hpp#L168-L169)), which means we enter the new transformation here: > > > Node *AndINode::Ideal(PhaseGVN *phase, bool can_reshape) { > // Special case constant AND mask > const TypeInt *t2 = phase->type( in(2) )->isa_int(); > if( !t2 || !t2->is_con() ) return MulNode::Ideal(phase, can_reshape); // <--- calls new code through here > > > So while new optimization `((-x) * (-y)) => (x * y)` is correct, doing the same for `((-x) & (-y)) => (x & y)` is not! > > I opted to test the opcodes directly instead of introducing virtual methods in `MulNode`. Let me know if you prefer otherwise. > > Additional testing: > - [x] Original tests now pass > - [x] New regression test is copied from original for JDK-8273454, but new copy verifies that `&` operate the same (fails without the C2 fix) > - [ ] `tier1` tests Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5612 From shade at openjdk.java.net Tue Sep 21 14:01:38 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 21 Sep 2021 14:01:38 GMT Subject: RFR: 8274039: codestrings gtest fails when hsdis is present In-Reply-To: References: Message-ID: On Tue, 21 Sep 2021 11:29:49 GMT, Patric Hedlin wrote: > Please review this change to the (g)test-case "codestrings". > > Adding missing regexp pattern to remove trailing **hsdis** printout due to padding on x64. This fixes the test failure for my x86_64. Since this is a test bug, I think it is fine to add more platform-specific fixes as needed. Alternatively, we might want to compare that the disassembly *prefixes* are the same? test/hotspot/gtest/code/test_codestrings.cpp line 46: > 44: std::basic_string tmp2 = std::regex_replace(tmp1, std::regex("\\s+:\\s+\\.inst\\t ; undefined"), ""); > 45: // Padding: x64 > 46: std::basic_string red = std::regex_replace(tmp2, std::regex(" : hlt \\n"), ""); Can we do `\\s+` here as well, like in `aarch64` case? Pretty sure this whitespace can be subtly different. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5606 From github.com+71546117+tobiasholenstein at openjdk.java.net Tue Sep 21 14:15:32 2021 From: github.com+71546117+tobiasholenstein at openjdk.java.net (Tobias Holenstein) Date: Tue, 21 Sep 2021 14:15:32 GMT Subject: RFR: JDK-8272703: StressSeed should be set via FLAG_SET_ERGO In-Reply-To: References: Message-ID: On Tue, 21 Sep 2021 12:58:00 GMT, Christian Hagedorn wrote: > That looks good to me. I think it's valuable to still generate new seeds when running with `RepeatCompilation` which helped quite a lot to reproduce bugs in the past which were otherwise hard to trigger. Thanks! 
@chhagedorn ------------- PR: https://git.openjdk.java.net/jdk/pull/5599 From github.com+71546117+tobiasholenstein at openjdk.java.net Tue Sep 21 14:27:03 2021 From: github.com+71546117+tobiasholenstein at openjdk.java.net (Tobias Holenstein) Date: Tue, 21 Sep 2021 14:27:03 GMT Subject: RFR: JDK-8270156: Add randomness and stress keys to JTreg tests which use StressGCM, StressLCM and/or StressIGVN Message-ID: Added the keys "randomness" and "stress" to the JTreg tests which use StressGCM, StressLCM and/or StressIGVN and did not use the keys before. ------------- Commit messages: - JDK-8270156: Add randomness and stress keys to JTreg tests which use StressGCM, StressLCM and/or StressIGVN Changes: https://git.openjdk.java.net/jdk/pull/5614/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5614&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8270156 Stats: 10 lines in 8 files changed: 8 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5614.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5614/head:pull/5614 PR: https://git.openjdk.java.net/jdk/pull/5614 From phedlin at openjdk.java.net Tue Sep 21 14:40:42 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Tue, 21 Sep 2021 14:40:42 GMT Subject: RFR: 8274039: codestrings gtest fails when hsdis is present In-Reply-To: References: Message-ID: <2o9A6guIu9zZjIQDskjW2E2fiRhixH4sE4_W75ACcTY=.ae6c10d3-c5d1-45a7-b476-49089622b8f2@github.com> On Tue, 21 Sep 2021 13:41:29 GMT, Aleksey Shipilev wrote: >> Please review this change to the (g)test-case "codestrings". >> >> Adding missing regexp pattern to remove trailing **hsdis** printout due to padding on x64. 
> > test/hotspot/gtest/code/test_codestrings.cpp line 46: > >> 44: std::basic_string tmp2 = std::regex_replace(tmp1, std::regex("\\s+:\\s+\\.inst\\t ; undefined"), ""); >> 45: // Padding: x64 >> 46: std::basic_string red = std::regex_replace(tmp2, std::regex(" : hlt \\n"), ""); > > Can we do `\\s+` here as well, like in `aarch64` case? Pretty sure this whitespace can be subtly different. We could (and I did start in that end) but I decided to be as specific as possible since the pattern is such a "plain" one (unlike the pattern for aarch64 which is more distinct). But you do have a point of course. I will update the regexp. ------------- PR: https://git.openjdk.java.net/jdk/pull/5606 From phedlin at openjdk.java.net Tue Sep 21 15:26:57 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Tue, 21 Sep 2021 15:26:57 GMT Subject: RFR: 8274039: codestrings gtest fails when hsdis is present [v2] In-Reply-To: References: Message-ID: > Please review this change to the (g)test-case "codestrings". > > Adding missing regexp pattern to remove trailing **hsdis** printout due to padding on x64. Patric Hedlin has updated the pull request incrementally with one additional commit since the last revision: Changed x64 padding pattern to a more general match. 
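
The whitespace concern in this review thread can be illustrated with a small sketch. It uses Java's `java.util.regex` rather than the C++ `std::regex` of the actual test, and both patterns as well as the sample disassembly text are assumptions made for illustration; the exact pattern committed in the PR is not shown in this thread.

```java
// Hypothetical demo class; patterns and input are assumed, not taken from the PR.
final class PaddingPatternDemo {

    // Leading "\s+" also matches newlines, so it can consume the line break
    // that terminates the last real instruction line.
    static String stripWithAnyWhitespace(String disasm) {
        return disasm.replaceAll("\\s+:\\s+hlt[ \\t]*\\n", "");
    }

    // Leading "[ \t]+" only matches blanks on the padding line itself,
    // so the preceding line keeps its terminating newline.
    static String stripWithBlanksOnly(String disasm) {
        return disasm.replaceAll("[ \\t]+:\\s+hlt[ \\t]*\\n", "");
    }

    public static void main(String[] args) {
        String disasm = "  : ret \n  : hlt \n  : hlt \n";
        System.out.println("[" + stripWithAnyWhitespace(disasm) + "]");
        System.out.println("[" + stripWithBlanksOnly(disasm) + "]");
    }
}
```

With the assumed input, the first variant leaves `"  : ret"` without its trailing newline, while the second leaves `"  : ret \n"` intact, which matters when the reduced strings are compared afterwards.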
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5606/files - new: https://git.openjdk.java.net/jdk/pull/5606/files/7fe2b291..597e779b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5606&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5606&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5606.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5606/head:pull/5606 PR: https://git.openjdk.java.net/jdk/pull/5606 From shade at openjdk.java.net Tue Sep 21 16:21:30 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 21 Sep 2021 16:21:30 GMT Subject: RFR: 8274039: codestrings gtest fails when hsdis is present [v2] In-Reply-To: References: Message-ID: On Tue, 21 Sep 2021 15:26:57 GMT, Patric Hedlin wrote: >> Please review this change to the (g)test-case "codestrings". >> >> Adding missing regexp pattern to remove trailing **hsdis** printout due to padding on x64. > > Patric Hedlin has updated the pull request incrementally with one additional commit since the last revision: > > Changed x64 padding pattern to a more general match. Not sure why the first group is `[ \\t]+` in x86_64 match, but it still works on my machine. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5606 From phedlin at openjdk.java.net Tue Sep 21 16:31:29 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Tue, 21 Sep 2021 16:31:29 GMT Subject: RFR: 8274039: codestrings gtest fails when hsdis is present [v2] In-Reply-To: References: Message-ID: On Tue, 21 Sep 2021 15:26:57 GMT, Patric Hedlin wrote: >> Please review this change to the (g)test-case "codestrings". >> >> Adding missing regexp pattern to remove trailing **hsdis** printout due to padding on x64. 
> > Patric Hedlin has updated the pull request incrementally with one additional commit since the last revision: > > Changed x64 padding pattern to a more general match. Just in order not to eat the last "\n" before the padded output. ------------- PR: https://git.openjdk.java.net/jdk/pull/5606 From simonis at openjdk.java.net Tue Sep 21 17:11:32 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Tue, 21 Sep 2021 17:11:32 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v2] In-Reply-To: References: Message-ID: On Thu, 16 Sep 2021 08:14:41 GMT, Martin Doerr wrote: >> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: >> >> Minor updates as requested by @TheRealMDoerr > > This looks like a great idea. I have a few minor remarks / suggestions. @TheRealMDoerr, are you fine with this change now? ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From mdoerr at openjdk.java.net Tue Sep 21 18:12:34 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Tue, 21 Sep 2021 18:12:34 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v2] In-Reply-To: References: Message-ID: On Thu, 16 Sep 2021 17:00:20 GMT, Volker Simonis wrote: >> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. >> >> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. 
A prominent example of such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>>
>>     public static boolean isAlpha(int c) {
>>         try {
>>             return IS_ALPHA[c];
>>         } catch (ArrayIndexOutOfBoundsException ex) {
>>             return false;
>>         }
>>     }
>>
>>
>> ### Solution
>>
>> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a roughly tenfold performance improvement for the above code:
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>>
>>
>> ### Implementation details
>>
>> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`. >> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. >> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Minor updates as requested by @TheRealMDoerr Thanks for the update! I haven't looked into every detail, yet, but it basically looks good to me. I think we should disable OmitStackTraceInFastThrow and run a substantial amount of tests. Otherwise, test coverage could be poor. Did you do that already? Running without OmitStackTraceInFastThrow is indeed relevant for us. Would be interesting to know what else benefits from it. Maybe startup performance (class loading may use many Exceptions). 
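
For readers who want to try the quoted `isAlpha` pattern outside of Tomcat, the following is a minimal sketch. The class name and the contents of the lookup table are hypothetical; it only shows the exception-as-control-flow shape next to an explicit range check that yields the same result without raising an implicit exception. It makes no claim about the relative performance of the two, which depends on the VM flags discussed above.

```java
// Hypothetical demo class; the table contents are assumed for illustration.
final class ImplicitExceptionDemo {

    // Lookup table: true for ASCII letters only.
    private static final boolean[] IS_ALPHA = new boolean[128];
    static {
        for (int c = 'a'; c <= 'z'; c++) IS_ALPHA[c] = true;
        for (int c = 'A'; c <= 'Z'; c++) IS_ALPHA[c] = true;
    }

    // Same shape as the quoted method: an out-of-range index raises an
    // implicit ArrayIndexOutOfBoundsException that is used as control flow.
    static boolean isAlphaViaException(int c) {
        try {
            return IS_ALPHA[c];
        } catch (ArrayIndexOutOfBoundsException ex) {
            return false;
        }
    }

    // Equivalent result with an explicit range check and no exception.
    static boolean isAlphaViaRangeCheck(int c) {
        return c >= 0 && c < IS_ALPHA.length && IS_ALPHA[c];
    }

    public static void main(String[] args) {
        int[] samples = {'a', 'Z', '1', -1, 1000};
        for (int c : samples) {
            System.out.println(c + ": " + isAlphaViaException(c)
                    + " / " + isAlphaViaRangeCheck(c));
        }
    }
}
```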
------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From xliu at openjdk.java.net Tue Sep 21 22:32:03 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 21 Sep 2021 22:32:03 GMT Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow [v3] In-Reply-To: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> References: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> Message-ID: On Tue, 21 Sep 2021 10:09:11 GMT, Volker Simonis wrote: >> If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (i.e. AIOOBE, NullPointerExceptions,..) and replace them by a static, pre-allocated exception without any stacktrace. >> >> However, we can actually do better. Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. nmethod) and fill them with at least one stack frame with the method /line-number information of the currently compiled method. If the method in question is being inlined (which often happens), we can add stackframes for all callers up to the inlining depth of the method in question. 
>>
>> For the attached JTreg test, we get the following exception in interpreter mode:
>>
>>     java.lang.NullPointerException: Cannot read the array length because "" is null
>>         at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>         at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
>>         at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)
>>         at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233)
>>
>> Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows:
>>
>>     java.lang.NullPointerException
>>
>> After this change, if `StackFrameInFastThrow.throwImplicitException()` is compiled standalone, we will get:
>>
>>     java.lang.NullPointerException
>>         at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>
>> and if `StackFrameInFastThrow.throwImplicitException()` is inlined into `level2()` and `level2()` into `level1()` we will get the following exception (although we're still running with `-XX:+OmitStackTraceInFastThrow`):
>>
>>     java.lang.NullPointerException
>>         at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>         at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
>>         at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)
>>
>> The new functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). Notice that the optimization comes at no run-time cost because all the extra work will be done at compile time.
>>
>> ## Implementation details
>>
>> - The current implementation of `-XX:+OmitStackTraceInFastThrow` already potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`. With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`).
>> - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them.
>> - In order to avoid a memory leak we have to release these global JNI handles once an nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles.
>> - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()`) and printing routines (`nmethod::print()`), I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but had not added it to the corresponding stats and printing routines, so I've added that section as well.
>> - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds.
>> - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions.
>> - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed.
>> - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues.
>
> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
>
>   Create implicit exceptions with an array of StackTraceElements right away instead of creating a backtrace. This prevents implicit exceptions from keeping classes alive due to Java mirrors in the backtrace.

> for me it's not an enhancement, but a bug fix, in production an exception with no stacktrace is useless and result in hours lost trying to figure out the issue

If we treat it as a bug, shall we remove `StackFrameInFastThrow`? We can just make this the default behavior of `OmitStackTraceInFastThrow`.

Why isn't `OmitStackTraceInFastThrow` a C2-exclusive option? I think it only affects C2. That flag overrides the existing flag `StackTraceInThrowable`. Let's not introduce another flag. No one would like an exception without a pointer.

test/hotspot/jtreg/compiler/exceptions/StackFrameInFastThrow.java line 26:

> 24: /*
> 25:  * @test
> 26:  * @bug 9999999

Should this be 8273392?

'requires' supports the boolean operator `|`, so we don't need two annotations.

test/hotspot/jtreg/compiler/exceptions/StackFrameInFastThrow.java line 110:

> 108:     private static void unload(Method m) {
> 109:         Asserts.assertEQ(WB.getMethodCompilationLevel(m), 4, "Method should be compiled at level 4.");
> 110:         if (DEBUG) System.console().readLine();

Is this a "press any key" prompt reading from stdin? I ran into a problem with it when I set DEBUG=1. We had better remove these statements in case they trip the test up.

TEST RESULT: Failed.
Execution failed: `main' threw exception: java.lang.NullPointerException: Cannot invoke "java.io.Console.readLine()" because the return value of "java.lang.System.console()" is null ------------- PR: https://git.openjdk.java.net/jdk/pull/5392 From xliu at openjdk.java.net Tue Sep 21 23:36:02 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 21 Sep 2021 23:36:02 GMT Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow [v3] In-Reply-To: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> References: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> Message-ID: On Tue, 21 Sep 2021 10:09:11 GMT, Volker Simonis wrote: >> If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (i.e. AIOOBE, NullPointerExceptions,..) and replace them by a static, pre-allocated exception without any stacktrace. >> >> However, we can actually do better. Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. nmethod) and fill them with at least one stack frame with the method /line-number information of the currently compiled method. If the method in question is being inlined (which often happens), we can add stackframes for all callers up to the inlining depth of the method in question. 
>>
>> For the attached JTreg test, we get the following exception in interpreter mode:
>>
>>     java.lang.NullPointerException: Cannot read the array length because "" is null
>>         at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>         at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
>>         at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)
>>         at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233)
>>
>> Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows:
>>
>>     java.lang.NullPointerException
>>
>> After this change, if `StackFrameInFastThrow.throwImplicitException()` is compiled standalone, we will get:
>>
>>     java.lang.NullPointerException
>>         at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>
>> and if `StackFrameInFastThrow.throwImplicitException()` is inlined into `level2()` and `level2()` into `level1()` we will get the following exception (although we're still running with `-XX:+OmitStackTraceInFastThrow`):
>>
>>     java.lang.NullPointerException
>>         at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>         at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
>>         at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)
>>
>> The new functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). Notice that the optimization comes at no run-time cost because all the extra work will be done at compile time.
>> >> ## Implementation details >> >> - Already the current implementation of `-XX:+OmitStackTraceInFastThrow` potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`. With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`). >> - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them. >> - In order to avoid a memory leak we have to release these global JNI handles once a nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles. >> - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()` and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but missed to add it to the corresponding stats and printing routines so I've added that section as well. >> - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds. >> - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions. >> - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed. >> - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues. 
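
Editorial sketch (not part of the original thread): the behaviour described in the quoted PR text can be observed with a small standalone program. This is a minimal sketch, assuming a HotSpot JVM with C2 available. The class and method names (`FastThrowDemo`, `lengthOf`, `firstStacklessIteration`) are made up for illustration and are unrelated to the JTreg test in the PR. Whether and when a stack-less exception shows up depends on JIT heuristics, so no particular output is guaranteed.

```java
public class FastThrowDemo {
    // Reading the length of a null array raises an *implicit* NullPointerException
    // (there is no explicit `throw` in the source).
    static int lengthOf(int[] a) {
        return a.length;
    }

    // Repeatedly provokes the implicit NPE and returns the index of the first
    // iteration whose exception carried no stack frames, or -1 if none did.
    static int firstStacklessIteration(int iterations) {
        for (int i = 0; i < iterations; i++) {
            try {
                lengthOf(null);
            } catch (NullPointerException e) {
                if (e.getStackTrace().length == 0) {
                    return i;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int hit = firstStacklessIteration(200_000);
        System.out.println(hit < 0
                ? "every NPE carried a stack trace"
                : "first stack-less NPE at iteration " + hit);
    }
}
```

Running it once with default flags and once with `-XX:-OmitStackTraceInFastThrow` should show the difference on most C2-enabled builds; the `-XX:+/-StackFrameInFastThrow` option exists only with the patch under review applied.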
> > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Create implcit exceptions with an array of StackTraceElements right away instead of creating a backtrace. This prevents that implicit exceptions will keep classes alive due to Java mirrors in the backtrace. src/hotspot/share/classfile/javaClasses.cpp line 2581: > 2579: assert(ik != NULL, "must be loaded in 1.4+"); > 2580: > 2581: // Determin the number of available frames typo ------------- PR: https://git.openjdk.java.net/jdk/pull/5392 From xliu at openjdk.java.net Tue Sep 21 23:40:05 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 21 Sep 2021 23:40:05 GMT Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow [v3] In-Reply-To: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> References: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> Message-ID: On Tue, 21 Sep 2021 10:09:11 GMT, Volker Simonis wrote: >> If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (i.e. AIOOBE, NullPointerExceptions,..) and replace them by a static, pre-allocated exception without any stacktrace. >> >> However, we can actually do better. Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. nmethod) and fill them with at least one stack frame with the method /line-number information of the currently compiled method. If the method in question is being inlined (which often happens), we can add stackframes for all callers up to the inlining depth of the method in question. 
>>
>> For the attached JTreg test, we get the following exception in interpreter mode:
>>
>>     java.lang.NullPointerException: Cannot read the array length because "" is null
>>         at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>         at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
>>         at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)
>>         at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233)
>>
>> Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows:
>>
>>     java.lang.NullPointerException
>>
>> After this change, if `StackFrameInFastThrow.throwImplicitException()` is compiled standalone, we will get:
>>
>>     java.lang.NullPointerException
>>         at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>
>> and if `StackFrameInFastThrow.throwImplicitException()` is inlined into `level2()` and `level2()` into `level1()` we will get the following exception (although we're still running with `-XX:+OmitStackTraceInFastThrow`):
>>
>>     java.lang.NullPointerException
>>         at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>         at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
>>         at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)
>>
>> The new functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). Notice that the optimization comes at no run-time cost because all the extra work will be done at compile time.
>> >> ## Implementation details >> >> - Already the current implementation of `-XX:+OmitStackTraceInFastThrow` potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`. With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`). >> - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them. >> - In order to avoid a memory leak we have to release these global JNI handles once a nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles. >> - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()` and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but missed to add it to the corresponding stats and printing routines so I've added that section as well. >> - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds. >> - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions. >> - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed. >> - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues. 
> > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Create implcit exceptions with an array of StackTraceElements right away instead of creating a backtrace. This prevents that implicit exceptions will keep classes alive due to Java mirrors in the backtrace. src/hotspot/share/ci/ciEnv.cpp line 407: > 405: if (!HAS_PENDING_EXCEPTION) { > 406: Handle handle = Handle(THREAD, obj); > 407: java_lang_Throwable::allocate_fill_stack_trace_of_implicit_exception(handle, gk); IIHO, hotspot fills stacktrace when StackTraceInThrowable is true. ------------- PR: https://git.openjdk.java.net/jdk/pull/5392 From aoqi at openjdk.java.net Wed Sep 22 01:19:59 2021 From: aoqi at openjdk.java.net (Ao Qi) Date: Wed, 22 Sep 2021 01:19:59 GMT Subject: RFR: 8273965: some testlibrary_tests/ir_framework tests fail when c1 disabled In-Reply-To: References: Message-ID: On Sat, 18 Sep 2021 15:52:11 GMT, Ao Qi wrote: > These tests failed with c2-only build: > > test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestCompLevels.java > test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestControls.java > test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestRunTests.java Can anyone sponsor this? 
------------- PR: https://git.openjdk.java.net/jdk/pull/5576 From aoqi at openjdk.java.net Wed Sep 22 02:33:57 2021 From: aoqi at openjdk.java.net (Ao Qi) Date: Wed, 22 Sep 2021 02:33:57 GMT Subject: RFR: 8273965: some testlibrary_tests/ir_framework tests fail when c1 disabled In-Reply-To: References: Message-ID: <9IcyjSm5fL2ssEJPz7Q5AoWu79lJ5gTvY97RbIXsJ9c=.75f1c829-5dbd-4b87-bb55-1ea2394586df@github.com> On Sat, 18 Sep 2021 15:52:11 GMT, Ao Qi wrote: > These tests failed with c2-only build: > > test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestCompLevels.java > test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestControls.java > test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestRunTests.java Thanks, @kelthuzadx ! ------------- PR: https://git.openjdk.java.net/jdk/pull/5576 From aoqi at openjdk.java.net Wed Sep 22 02:33:58 2021 From: aoqi at openjdk.java.net (Ao Qi) Date: Wed, 22 Sep 2021 02:33:58 GMT Subject: Integrated: 8273965: some testlibrary_tests/ir_framework tests fail when c1 disabled In-Reply-To: References: Message-ID: <6WE6EA7D7CQvxw4WK6PIbHo0mMTY4bp2_kLyJfayiBM=.7e7e0858-8ef9-463b-a0c7-29ff72d530a6@github.com> On Sat, 18 Sep 2021 15:52:11 GMT, Ao Qi wrote: > These tests failed with c2-only build: > > test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestCompLevels.java > test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestControls.java > test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestRunTests.java This pull request has now been integrated. 
Changeset: 517405e4 Author: Ao Qi Committer: Yi Yang URL: https://git.openjdk.java.net/jdk/commit/517405e462dc6104c33471c58242ea7b244c6218 Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod 8273965: some testlibrary_tests/ir_framework tests fail when c1 disabled Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/5576 From njian at openjdk.java.net Wed Sep 22 03:12:58 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Wed, 22 Sep 2021 03:12:58 GMT Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v5] In-Reply-To: References: <6SkOgskSfXuMp1XarC2BO9zBUw_Zj1pcUMKNHffiCQs=.66c156b1-ef01-48a4-8f28-8351089a5646@github.com> Message-ID: On Tue, 21 Sep 2021 12:08:25 GMT, Andrew Haley wrote: > One is _Advanced SIMD copy_, the other _Advanced SIMD scalar copy_. The _Advanced SIMD copy_ version is missing a comment to that effect. Maybe one day someone could go through assembler_aarch64.hpp and add the missing titles from the A64 instruction set encoding section of the ARM. Yes, a comment will be helpful. But for the scalar copy, we should use SIMD_RegVariant instead of SIMD_Arrangement? ------------- PR: https://git.openjdk.java.net/jdk/pull/4839 From ngasson at openjdk.java.net Wed Sep 22 03:24:11 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Wed, 22 Sep 2021 03:24:11 GMT Subject: RFR: 8267356: AArch64: Vector API SVE codegen support [v7] In-Reply-To: References: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> Message-ID: <053YSRWJjndpXnMFcY0hOx9SUHWrSJDtSbAcLRs_QLc=.f2914915-321e-45a8-bf1a-ead879600bca@github.com> On Fri, 17 Sep 2021 06:53:06 GMT, Ningsheng Jian wrote: >> This is the integration of current SVE work done in panama-vector/vectorIntrinscs, which includes: >> >> 1. Code generation for Vector API c2 IR nodes with SVE. >> 2. Non-max vector size support with SVE, e.g. 
using *128Vector (and *64Vector) APIs on 256-bit SVE environment could also generate optimized SVE instructions with predicate feature. >> 3. Some more SVE assemblers (and tests) used by the codegen part. >> >> Note: VectorMask is still represented in vector register, a further improvement to map mask to predicate register is under development at https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask >> >> >> Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware with MaxVectorSize=16/32/64. > > Ningsheng Jian has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge with master > - Merge with master > - More comments from Andrew. > - Add missing part > - Address Andrew's comments > - 8267356: AArch64: Vector API SVE codegen support > > This is the integration of current SVE work done in > panama-vector/vectorIntrinscs, which includes: > > 1. Code generation for Vector API c2 IR nodes with SVE. > 2. Non-max vector size support with SVE, e.g. using *128Vector APIs on > 256-bit SVE environment could also generate optimized SVE > instructions with predicate feature. > 3. Some more SVE assemblers (and tests) used by the codegen part. > > Note: VectorMask is still represented in vector register, a further > improvement to map mask to predicate register is under development at > https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask > > Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware > with MaxVectorSize=16/32/64. Marked as reviewed by ngasson (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/4122 From njian at openjdk.java.net Wed Sep 22 03:24:11 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Wed, 22 Sep 2021 03:24:11 GMT Subject: RFR: 8267356: AArch64: Vector API SVE codegen support [v7] In-Reply-To: <053YSRWJjndpXnMFcY0hOx9SUHWrSJDtSbAcLRs_QLc=.f2914915-321e-45a8-bf1a-ead879600bca@github.com> References: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> <053YSRWJjndpXnMFcY0hOx9SUHWrSJDtSbAcLRs_QLc=.f2914915-321e-45a8-bf1a-ead879600bca@github.com> Message-ID: On Wed, 22 Sep 2021 03:17:47 GMT, Nick Gasson wrote: >> Ningsheng Jian has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: >> >> - Merge with master >> - Merge with master >> - More comments from Andrew. >> - Add missing part >> - Address Andrew's comments >> - 8267356: AArch64: Vector API SVE codegen support >> >> This is the integration of current SVE work done in >> panama-vector/vectorIntrinscs, which includes: >> >> 1. Code generation for Vector API c2 IR nodes with SVE. >> 2. Non-max vector size support with SVE, e.g. using *128Vector APIs on >> 256-bit SVE environment could also generate optimized SVE >> instructions with predicate feature. >> 3. Some more SVE assemblers (and tests) used by the codegen part. >> >> Note: VectorMask is still represented in vector register, a further >> improvement to map mask to predicate register is under development at >> https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask >> >> Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware >> with MaxVectorSize=16/32/64. > > Marked as reviewed by ngasson (Reviewer). Thank you @nick-arm for the review! 
------------- PR: https://git.openjdk.java.net/jdk/pull/4122 From thartmann at openjdk.java.net Wed Sep 22 06:27:54 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 22 Sep 2021 06:27:54 GMT Subject: RFR: JDK-8270156: Add "randomness" and "stress" keys to JTreg tests which use StressGCM, StressLCM and/or StressIGVN In-Reply-To: References: Message-ID: On Tue, 21 Sep 2021 14:17:28 GMT, Tobias Holenstein wrote: > Added the keys "randomness" and "stress" to the JTreg tests which use StressGCM, StressLCM and/or StressIGVN and did not use the keys before. Looks good and trivial to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5614 From chagedorn at openjdk.java.net Wed Sep 22 06:27:54 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 22 Sep 2021 06:27:54 GMT Subject: RFR: JDK-8270156: Add "randomness" and "stress" keys to JTreg tests which use StressGCM, StressLCM and/or StressIGVN In-Reply-To: References: Message-ID: On Tue, 21 Sep 2021 14:17:28 GMT, Tobias Holenstein wrote: > Added the keys "randomness" and "stress" to the JTreg tests which use StressGCM, StressLCM and/or StressIGVN and did not use the keys before. Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5614 From thartmann at openjdk.java.net Wed Sep 22 06:30:53 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 22 Sep 2021 06:30:53 GMT Subject: RFR: JDK-8272703: StressSeed should be set via FLAG_SET_ERGO In-Reply-To: References: Message-ID: On Tue, 21 Sep 2021 08:38:57 GMT, Tobias Holenstein wrote: > Set the `StressSeed` (added by JDK-8252219) via `FLAG_SET_ERGO` if the seed is generated by the VM (i.e., not set via the command line). This way, `StressSeed` will be added to the "[Global flags]" section of the hs_err file on crash and can be used to reproduce the issue. 
>
> If `RepeatCompilation` is on and no `StressSeed` is set, a new `StressSeed` is generated for every compilation. The reason for this is that some bugs are only triggered intermittently, because they depend on the `StressSeed`.

Looks good to me.

-------------

Marked as reviewed by thartmann (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/5599

From never at openjdk.java.net Wed Sep 22 06:37:12 2021
From: never at openjdk.java.net (Tom Rodriguez)
Date: Wed, 22 Sep 2021 06:37:12 GMT
Subject: RFR: 8274120: [JVMCI] CompileBroker should resolve parameter types for JVMCI compiles
Message-ID:

8274120: [JVMCI] CompileBroker should resolve parameter types for JVMCI compiles

-------------

Commit messages:
 - 8274120: [JVMCI] CompileBroker should resolve parameter types for JVMCI compiles

Changes: https://git.openjdk.java.net/jdk/pull/5627/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5627&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8274120
Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.java.net/jdk/pull/5627.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/5627/head:pull/5627

PR: https://git.openjdk.java.net/jdk/pull/5627

From aph at openjdk.java.net Wed Sep 22 08:02:05 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Wed, 22 Sep 2021 08:02:05 GMT
Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v5]
In-Reply-To:
References: <6SkOgskSfXuMp1XarC2BO9zBUw_Zj1pcUMKNHffiCQs=.66c156b1-ef01-48a4-8f28-8351089a5646@github.com>
Message-ID:

On Wed, 22 Sep 2021 03:10:14 GMT, Ningsheng Jian wrote:

>> One is _Advanced SIMD copy_, the other _Advanced SIMD scalar copy_. The _Advanced SIMD copy_ version is missing a comment to that effect. Maybe one day someone could go through assembler_aarch64.hpp and add the missing titles from the A64 instruction set encoding section of the ARM.
>
>> One is _Advanced SIMD copy_, the other _Advanced SIMD scalar copy_.
The _Advanced SIMD copy_ version is missing a comment to that effect. Maybe one day someone could go through assembler_aarch64.hpp and add the missing titles from the A64 instruction set encoding section of the ARM. > > Yes, a comment will be helpful. But for the scalar copy, we should use SIMD_RegVariant instead of SIMD_Arrangement? It looks like the official doc uses _SIMD_RegVariant_ for the scalar copy, so we should do the same. I don't think you'll ever find me suggesting we should deviate from what the Arm reference manuals do, unless we can't avoid it. ------------- PR: https://git.openjdk.java.net/jdk/pull/4839 From aph at openjdk.java.net Wed Sep 22 08:06:11 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 22 Sep 2021 08:06:11 GMT Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v5] In-Reply-To: <6SkOgskSfXuMp1XarC2BO9zBUw_Zj1pcUMKNHffiCQs=.66c156b1-ef01-48a4-8f28-8351089a5646@github.com> References: <6SkOgskSfXuMp1XarC2BO9zBUw_Zj1pcUMKNHffiCQs=.66c156b1-ef01-48a4-8f28-8351089a5646@github.com> Message-ID: On Tue, 7 Sep 2021 10:11:08 GMT, Wang Huang wrote: >> * In this issue, we plan to complete all missing implementation for aarch64 neon backend. For example, cast from Byte to Long, cast from Long to Byte, and so on. >> * It may be a solver of JDK-8269866, or part of it. > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > fix bugs src/hotspot/cpu/aarch64/aarch64_neon_ad.m4 line 233: > 231: match(Set dst (VectorCastB2X src)); > 232: format %{ "sxtl $dst, T8H, $src, T8B\n\t" > 233: "sxtl $dst, T4S, $dst, T4H\n\t" This whitespace change looks odd. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4839 From chagedorn at openjdk.java.net Wed Sep 22 08:08:15 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 22 Sep 2021 08:08:15 GMT Subject: RFR: 8274048 IGV: Replace usages of Collections.sort with List.sort call In-Reply-To: <8KIga70tt6TEH9vMuxl8OuDQWDOX3qCIeMeIC8OsfdU=.22a04314-43c6-48b0-b87f-224d2fd8060b@github.com> References: <8KIga70tt6TEH9vMuxl8OuDQWDOX3qCIeMeIC8OsfdU=.22a04314-43c6-48b0-b87f-224d2fd8060b@github.com> Message-ID: <0OfM5xUaSMIoqbxGwccO9l_NzLKswWtu-SwdzI674_Y=.626050b8-2c96-4681-99ce-40a744505599@github.com> On Mon, 23 Aug 2021 20:52:22 GMT, Andrey Turbanov wrote: > Collections.sort is just a wrapper, so it is better to use an instance method directly. Looks good and trivial. ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5228 From shade at openjdk.java.net Wed Sep 22 08:43:06 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 22 Sep 2021 08:43:06 GMT Subject: RFR: 8274060: C2: Incorrect computation after JDK-8273454 In-Reply-To: References: Message-ID: On Tue, 21 Sep 2021 13:13:37 GMT, Aleksey Shipilev wrote: > A Fuzzer test caught a serious regression after [JDK-8273454](https://bugs.openjdk.java.net/browse/JDK-8273454): the results are different in (interpreter, C1) vs C2. See the original test cases in the bug. 
I believe the trouble is due to `And*Node`-s sharing code with `MulNode` (for [reasons](https://github.com/openjdk/jdk/blob/42d5d2abaad8a88a5e1326ea8b4494aeb8b5748b/src/hotspot/share/opto/mulnode.hpp#L168-L169)), which means we enter the new transformation here:
>
>     Node *AndINode::Ideal(PhaseGVN *phase, bool can_reshape) {
>       // Special case constant AND mask
>       const TypeInt *t2 = phase->type( in(2) )->isa_int();
>       if( !t2 || !t2->is_con() ) return MulNode::Ideal(phase, can_reshape); // <--- calls new code through here
>
> So while new optimization `((-x) * (-y)) => (x * y)` is correct, doing the same for `((-x) & (-y)) => (x & y)` is not!
>
> I opted to test the opcodes directly instead of introducing virtual methods in `MulNode`. Let me know if you prefer otherwise.
>
> Additional testing:
> - [x] Original tests now pass
> - [x] New regression test is copied from original for JDK-8273454, but new copy verifies that `&` operate the same (fails without the C2 fix)
> - [ ] `tier1` tests

GHA checks ran into MacOS infra trouble, resubmitted, those complete green. I am integrating.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5612

From shade at openjdk.java.net Wed Sep 22 08:43:08 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Wed, 22 Sep 2021 08:43:08 GMT
Subject: Integrated: 8274060: C2: Incorrect computation after JDK-8273454
In-Reply-To:
References:
Message-ID:

On Tue, 21 Sep 2021 13:13:37 GMT, Aleksey Shipilev wrote:

> A Fuzzer test caught a serious regression after [JDK-8273454](https://bugs.openjdk.java.net/browse/JDK-8273454): the results are different in (interpreter, C1) vs C2. See the original test cases in the bug.
I believe the trouble is due to `And*Node`-s sharing code with `MulNode` (for [reasons](https://github.com/openjdk/jdk/blob/42d5d2abaad8a88a5e1326ea8b4494aeb8b5748b/src/hotspot/share/opto/mulnode.hpp#L168-L169)), which means we enter the new transformation here:
>
>     Node *AndINode::Ideal(PhaseGVN *phase, bool can_reshape) {
>       // Special case constant AND mask
>       const TypeInt *t2 = phase->type( in(2) )->isa_int();
>       if( !t2 || !t2->is_con() ) return MulNode::Ideal(phase, can_reshape); // <--- calls new code through here
>
> So while new optimization `((-x) * (-y)) => (x * y)` is correct, doing the same for `((-x) & (-y)) => (x & y)` is not!
>
> I opted to test the opcodes directly instead of introducing virtual methods in `MulNode`. Let me know if you prefer otherwise.
>
> Additional testing:
> - [x] Original tests now pass
> - [x] New regression test is copied from original for JDK-8273454, but new copy verifies that `&` operate the same (fails without the C2 fix)
> - [ ] `tier1` tests

This pull request has now been integrated.

Changeset: c77ebe88
Author: Aleksey Shipilev
URL: https://git.openjdk.java.net/jdk/commit/c77ebe88748b0a55f1fc7a5497314a752eab1e2a
Stats: 85 lines in 2 files changed: 82 ins; 1 del; 2 mod

8274060: C2: Incorrect computation after JDK-8273454

Reviewed-by: thartmann, neliasso, chagedorn

-------------

PR: https://git.openjdk.java.net/jdk/pull/5612

From dnsimon at openjdk.java.net Wed Sep 22 09:33:59 2021
From: dnsimon at openjdk.java.net (Doug Simon)
Date: Wed, 22 Sep 2021 09:33:59 GMT
Subject: RFR: 8274120: [JVMCI] CompileBroker should resolve parameter types for JVMCI compiles
In-Reply-To:
References:
Message-ID:

On Wed, 22 Sep 2021 06:26:42 GMT, Tom Rodriguez wrote:

> 8274120: [JVMCI] CompileBroker should resolve parameter types for JVMCI compiles

Marked as reviewed by dnsimon (Committer).
------------- PR: https://git.openjdk.java.net/jdk/pull/5627 From thartmann at openjdk.java.net Wed Sep 22 09:40:58 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 22 Sep 2021 09:40:58 GMT Subject: RFR: 8274120: [JVMCI] CompileBroker should resolve parameter types for JVMCI compiles In-Reply-To: References: Message-ID: On Wed, 22 Sep 2021 06:26:42 GMT, Tom Rodriguez wrote: > 8274120: [JVMCI] CompileBroker should resolve parameter types for JVMCI compiles Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5627 From shade at openjdk.java.net Wed Sep 22 10:21:43 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 22 Sep 2021 10:21:43 GMT Subject: RFR: 8274130: C2: MulNode::Ideal chained transformations may act on wrong nodes Message-ID: I was puzzled by it when fixing JDK-8274060. It looks that new optimizations added by [JDK-8273454](https://bugs.openjdk.java.net/browse/JDK-8273454) and [JDK-8263006](https://bugs.openjdk.java.net/browse/JDK-8263006) rewire `in(1)` and `in(2)` in `MulNode::Ideal`, which means the chained transformations should see them? Yet, both inputs and their `Type`-s are cached locally and not refreshed. I have not seen failures due to this yet, but it looks that the current code is subtly incorrect because of this. I thought about doing `return this` instead of `progress = true`, so that we leave `MulNode::Ideal` once we hit any transform and hope to return back, but I wondered if that would expose us to different graph shapes in-between successive `MulNode::Ideal` calls, which might have other unintended consequences. Therefore, I opted to a more conservative patch. 
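
Editorial sketch (not part of the original thread): the concern above, locally cached inputs going stale after an earlier transformation rewires them, can be illustrated with a toy model. This is only a sketch under stated assumptions: `StaleInputsDemo`, its `Node` class and the `refresh` switch are hypothetical names invented here, and the code is plain Java, not HotSpot's `MulNode::Ideal`.

```java
import java.util.ArrayList;
import java.util.List;

public class StaleInputsDemo {
    /** Toy IR node: an operation name plus up to two inputs. */
    static final class Node {
        final String op;
        Node in1, in2;

        Node(String op, Node in1, Node in2) {
            this.op = op;
            this.in1 = in1;
            this.in2 = in2;
        }

        @Override
        public String toString() {
            if (in1 == null) return op;                       // leaf
            if (in2 == null) return op + "(" + in1 + ")";     // unary
            return op + "(" + in1 + ", " + in2 + ")";         // binary
        }
    }

    static Node leaf(String name) { return new Node(name, null, null); }
    static Node neg(Node x)       { return new Node("Neg", x, null); }

    /**
     * Step 1 rewires Mul(Neg(x), Neg(y)) to Mul(x, y).
     * Step 2 stands in for any chained transformation and simply records
     * which inputs it looked at (here: what it would push to a worklist).
     */
    static List<Node> ideal(Node mul, boolean refresh) {
        Node in1 = mul.in1; // cached once, before any rewiring
        Node in2 = mul.in2;

        if (in1.op.equals("Neg") && in2.op.equals("Neg")) {
            mul.in1 = in1.in1; // rewire the node's real inputs
            mul.in2 = in2.in1;
            if (refresh) {     // without this, the locals still point at the Neg nodes
                in1 = mul.in1;
                in2 = mul.in2;
            }
        }

        List<Node> worklist = new ArrayList<>();
        worklist.add(in1);
        worklist.add(in2);
        return worklist;
    }

    public static void main(String[] args) {
        // prints [Neg(x), Neg(y)]: the chained step saw the stale inputs
        System.out.println(ideal(new Node("Mul", neg(leaf("x")), neg(leaf("y"))), false));
        // prints [x, y]: the chained step saw the rewired inputs
        System.out.println(ideal(new Node("Mul", neg(leaf("x")), neg(leaf("y"))), true));
    }
}
```

In the toy, both variants leave the `Mul` node itself correctly rewired; only the follow-up step differs, which matches the observation that no failures had been seen yet.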
Additional testing: - [x] `compiler/` tests - [ ] `tier1` tests - [ ] Fuzzer tests ------------- Commit messages: - Fix Changes: https://git.openjdk.java.net/jdk/pull/5631/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5631&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8274130 Stats: 11 lines in 1 file changed: 7 ins; 2 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/5631.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5631/head:pull/5631 PR: https://git.openjdk.java.net/jdk/pull/5631 From adinn at openjdk.java.net Wed Sep 22 11:22:57 2021 From: adinn at openjdk.java.net (Andrew Dinn) Date: Wed, 22 Sep 2021 11:22:57 GMT Subject: RFR: 8274130: C2: MulNode::Ideal chained transformations may act on wrong nodes In-Reply-To: References: Message-ID: On Wed, 22 Sep 2021 10:12:16 GMT, Aleksey Shipilev wrote: > I was puzzled by it when fixing JDK-8274060. It looks that new optimizations added by [JDK-8273454](https://bugs.openjdk.java.net/browse/JDK-8273454) and [JDK-8263006](https://bugs.openjdk.java.net/browse/JDK-8263006) rewire `in(1)` and `in(2)` in `MulNode::Ideal`, which means the chained transformations should see them? Yet, both inputs and their `Type`-s are cached locally and not refreshed. I have not seen failures due to this yet, but it looks that the current code is subtly incorrect because of this. > > I thought about doing `return this` instead of `progress = true`, so that we leave `MulNode::Ideal` once we hit any transform and hope to return back, but I wondered if that would expose us to different graph shapes in-between successive `MulNode::Ideal` calls, which might have other unintended consequences. Therefore, I opted to a more conservative patch. > > Additional testing: > - [x] `compiler/` tests > - [x] `tier1` tests > - [ ] Fuzzer tests This is actually cleaner but I'm not sure the change is strictly needed. 
In these specific transforms I think the types of the operands and the operation ought never to differ. E.g., for the multiply rule we transform (MulF (MinusF f1) (MinusF f2)) ==> (MulF f1 f2). The types of the MinusF terms input to the MulF both have to be float. So do the types of the inputs f1 and f2. We should never get an input graph that has, say, a float for one arg and a double for another.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5631

From shade at openjdk.java.net Wed Sep 22 11:26:57 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Wed, 22 Sep 2021 11:26:57 GMT
Subject: RFR: 8274130: C2: MulNode::Ideal chained transformations may act on wrong nodes
In-Reply-To:
References:
Message-ID: <5KkjcA-AO-wUVSe2uijBohSjdEai6EgzCmtH68BZAlk=.91573743-154d-445d-a39e-e7baa8feb0c8@github.com>

On Wed, 22 Sep 2021 11:19:45 GMT, Andrew Dinn wrote:

> This is actually cleaner but I'm not sure the change is strictly needed. In these specific transforms I think the types of the operands and the operation ought never to differ.

For types and current transforms, that might be true. It might not hold true in future. The patch, however, also makes sure these lines refer to the most up-to-date nodes:

    igvn->_worklist.push(in1);
    igvn->_worklist.push(in2);

-------------

PR: https://git.openjdk.java.net/jdk/pull/5631

From shade at openjdk.java.net Wed Sep 22 11:30:02 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Wed, 22 Sep 2021 11:30:02 GMT
Subject: RFR: 8273380: ARM32: Default to {ldrexd, strexd} in StubRoutines::atomic_{load|store}_long
In-Reply-To:
References:
Message-ID:

On Mon, 6 Sep 2021 11:28:58 GMT, Aleksey Shipilev wrote:

> Current ARM32 is one of few remaining uses of `os::is_MP`, the rest is removed by JDK-8188764. There are some interesting bugs in OS/libc that might give incorrect `os::is_MP` sometimes, e.g. in containers.
Instead of risking it, we can default to {ldrexd,strexd} for ARMv7 (which always have them), and leave the `os::is_MP` path for ARMv6 (for which this is the only remaining way to load the 64-bit long). > > @mychris, you might want to take a look and do light performance testing for it? > > Additional testing: > - [x] Linux ARM32 cross-compiled build completes I guess there are no takers for ARM code :) ------------- PR: https://git.openjdk.java.net/jdk/pull/5379 From phedlin at openjdk.java.net Wed Sep 22 11:42:34 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Wed, 22 Sep 2021 11:42:34 GMT Subject: RFR: 8274039: codestrings gtest fails when hsdis is present [v3] In-Reply-To: References: Message-ID: > Please review this change to the (g)test-case "codestrings". > > Adding missing regexp pattern to remove trailing **hsdis** printout due to padding on x64. Patric Hedlin has updated the pull request incrementally with one additional commit since the last revision: Update test_codestrings.cpp Alternative pattern. (Just trying the "in review" editing work-flow.) 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5606/files - new: https://git.openjdk.java.net/jdk/pull/5606/files/597e779b..83539c99 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5606&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5606&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5606.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5606/head:pull/5606 PR: https://git.openjdk.java.net/jdk/pull/5606 From shade at openjdk.java.net Wed Sep 22 13:30:57 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 22 Sep 2021 13:30:57 GMT Subject: RFR: 8274039: codestrings gtest fails when hsdis is present [v2] In-Reply-To: References: Message-ID: On Tue, 21 Sep 2021 16:28:10 GMT, Patric Hedlin wrote: > Just in order not to eat the last "\n" before the padded output (newlines are consumed at the end of the pattern). > If you think consuming the newlines in the prefix is better (more obvious or just in line with the aarch64 pattern) we can use: > > `std::regex("\\s+:\\s+hlt[ \\t]+(?!\\n\\s+;;)")` This works on my machine too. But honestly, I don't care. Just pick one of these variants and unbreak the `tier1` :) ------------- PR: https://git.openjdk.java.net/jdk/pull/5606 From rrich at openjdk.java.net Wed Sep 22 14:03:55 2021 From: rrich at openjdk.java.net (Richard Reingruber) Date: Wed, 22 Sep 2021 14:03:55 GMT Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow [v3] In-Reply-To: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> References: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> Message-ID: On Tue, 21 Sep 2021 10:09:11 GMT, Volker Simonis wrote: >> If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (i.e. 
AIOOBE, NullPointerExceptions, ...) and replace them by a static, pre-allocated exception without any stacktrace.
>>
>> However, we can actually do better. Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. nmethod) and fill them with at least one stack frame with the method/line-number information of the currently compiled method. If the method in question is being inlined (which often happens), we can add stack frames for all callers up to the inlining depth of the method in question.
>>
>> For the attached JTreg test, we get the following exception in interpreter mode:
>>
>>     java.lang.NullPointerException: Cannot read the array length because "" is null
>>         at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>         at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
>>         at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)
>>         at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233)
>>
>> Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows:
>>
>>     java.lang.NullPointerException
>>
>> After this change, if `StackFrameInFastThrow.throwImplicitException()` is compiled standalone, we will get:
>>
>>     java.lang.NullPointerException
>>         at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>
>> and if `StackFrameInFastThrow.throwImplicitException()` is inlined into `level2()` and `level2()` into `level1()` we will get the following exception (although we're still running with `-XX:+OmitStackTraceInFastThrow`):
>>
>>     java.lang.NullPointerException
>>         at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>         at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
>>
at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)
>>
>> The new functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). Notice that the optimization comes at no run-time cost because all the extra work will be done at compile time.
>>
>> ## Implementation details
>>
>> - The current implementation of `-XX:+OmitStackTraceInFastThrow` already potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`. With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`).
>> - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them.
>> - In order to avoid a memory leak we have to release these global JNI handles once an nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles.
>> - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()`) and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but failed to add it to the corresponding stats and printing routines, so I've added that section as well.
>> - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds.
>> - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot".
This makes the test simpler and at the same time provokes the allocation of more implicit exceptions.
>> - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed.
>> - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues.
>
> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
>
>   Create implicit exceptions with an array of StackTraceElements right away instead of creating a backtrace. This prevents implicit exceptions from keeping classes alive due to Java mirrors in the backtrace.

Hi Volker,

> Fortunately, I think the solution is pretty simple. I don't think we need the backtrace at all. In the end it is just an optimization to save some space and not construct the full StackTraceElement[] right at the creation time of an exception. But the implicit exceptions which we are creating here are "nmethod-singletons" and as such I don't think we lose much if we create the array of StackTraceElements right away instead of creating a backtrace (see my last push). The StackTraceElements only contain Strings and therefore don't keep any classes unnecessarily alive. What do you think?

I agree. Thanks for fixing!

> And while you're at it, would you mind reviewing the whole PR :)

:) I'll be out of office next week. Maybe I'll get to review the related https://github.com/openjdk/jdk/pull/5488 after that if needed.

Cheers, Richard.
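
To make the point about `StackTraceElement`s holding only Strings concrete, here is a minimal Java-level sketch. It is only an illustration under assumptions: the class `ImplicitExceptionSketch` and the method `preallocated()` are hypothetical names, and the PR itself builds these objects inside the VM at compile time, not from Java code. The frame data is copied from the example stack trace in the PR description.

```java
// Hypothetical illustration only; not code from the PR.
final class ImplicitExceptionSketch {

    // Builds an NPE whose stack trace is a fixed array of StackTraceElements.
    // Each element is constructed from plain Strings and an int, so it holds
    // no reference to a java.lang.Class mirror.
    static NullPointerException preallocated() {
        NullPointerException npe = new NullPointerException();
        npe.setStackTrace(new StackTraceElement[] {
            new StackTraceElement("compiler.exceptions.StackFrameInFastThrow",
                                  "throwImplicitException",
                                  "StackFrameInFastThrow.java", 76),
            new StackTraceElement("compiler.exceptions.StackFrameInFastThrow",
                                  "level2",
                                  "StackFrameInFastThrow.java", 95)
        });
        return npe;
    }

    public static void main(String[] args) {
        // Prints the exception with the two statically known frames.
        preallocated().printStackTrace();
    }
}
```

The sketch only shows the shape of the data; whether the VM-side allocation is acceptable is a separate question for the review.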
------------- PR: https://git.openjdk.java.net/jdk/pull/5392 From phedlin at openjdk.java.net Wed Sep 22 14:23:59 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Wed, 22 Sep 2021 14:23:59 GMT Subject: RFR: 8274039: codestrings gtest fails when hsdis is present [v3] In-Reply-To: References: Message-ID: <8ehLN-CqZHabSAZ_vOY-yXSkKZmgGovd5JaLT0hiDlE=.cd2e4910-c2e8-4bbb-b3a6-4c0b04b77218@github.com> On Wed, 22 Sep 2021 11:42:34 GMT, Patric Hedlin wrote: >> Please review this change to the (g)test-case "codestrings". >> >> Adding missing regexp pattern to remove trailing **hsdis** printout due to padding on x64. > > Patric Hedlin has updated the pull request incrementally with one additional commit since the last revision: > > Update test_codestrings.cpp > > Alternative pattern. > > (Just trying the "in review" editing work-flow.) Thanks for reviewing. ------------- PR: https://git.openjdk.java.net/jdk/pull/5606 From phedlin at openjdk.java.net Wed Sep 22 14:23:59 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Wed, 22 Sep 2021 14:23:59 GMT Subject: Integrated: 8274039: codestrings gtest fails when hsdis is present In-Reply-To: References: Message-ID: On Tue, 21 Sep 2021 11:29:49 GMT, Patric Hedlin wrote: > Please review this change to the (g)test-case "codestrings". > > Adding missing regexp pattern to remove trailing **hsdis** printout due to padding on x64. This pull request has now been integrated. 
Changeset: c9de8063 Author: Patric Hedlin URL: https://git.openjdk.java.net/jdk/commit/c9de80635e25badbb5410e22b6619379598a9a57 Stats: 6 lines in 1 file changed: 3 ins; 0 del; 3 mod 8274039: codestrings gtest fails when hsdis is present Reviewed-by: shade ------------- PR: https://git.openjdk.java.net/jdk/pull/5606 From github.com+7535718+rsmogura at openjdk.java.net Wed Sep 22 16:06:07 2021 From: github.com+7535718+rsmogura at openjdk.java.net (Radoslaw Smogura) Date: Wed, 22 Sep 2021 16:06:07 GMT Subject: RFR: 8259609: C2: optimize long range checks in long counted loops [v6] In-Reply-To: <-d_SITgIi5SDUnvKHBFsuBPs05QNn8FxhS35iIIZTNg=.e38ea6bc-e07d-425d-8a7a-c550380c1f13@github.com> References: <-d_SITgIi5SDUnvKHBFsuBPs05QNn8FxhS35iIIZTNg=.e38ea6bc-e07d-425d-8a7a-c550380c1f13@github.com> Message-ID: On Mon, 23 Aug 2021 15:35:53 GMT, Roland Westrelin wrote: >> JDK-8255150 makes it possible for java code to explicitly perform a >> range check on long values. JDK-8223051 provides a transformation of >> long counted loops into loop nests with an inner int counted >> loop. With this change I propose transforming long range checks that >> operate on the iv of a long counted loop into range checks that >> operate on the iv of the int inner loop once it has been >> created. Existing range check eliminations can then kick in. >> >> Transformation of range checks is piggy backed on the loop nest >> creation for 2 reasons: >> >> - pattern matching range checks is easier right before the loop nest >> is created >> >> - the number of iterations of the inner loop is adjusted so scale * >> inner_iv doesn't overflow >> >> C2 has logic to delay some split if transformations so they don't >> break the scale * iv + offset pattern. I reused that logic for long >> range checks and had to relax what's considered a range check because >> initially a range check from Object.checkIndex() may include a test >> for range > 0 that needs a round of loop opts to be hoisted. 
I realize
>> there's some code duplication but I didn't see a way to share logic
>> between IdealLoopTree::may_have_range_check() and
>> IdealLoopTree::policy_range_check() that would feel right.
>>
>> I realize the comment in PhaseIdealLoop::transform_long_range_checks()
>> is scary. FWIW, it's not as complicated as it looks. I found drawing
>> the range covered by the entire long loop and the range covered by the
>> inner loop helps to see how range checks can be transformed. Then the
>> comment helps make sure all cases are covered and verify the generated
>> code actually covers all of them.
>>
>> One issue is overflow. I think the fact that inner_iv * scale doesn't
>> overflow helps simplify things. One possible overflow is that of scale
>> * upper + offset which is handled by forcing all range checks in that
>> case to deoptimize. I don't think other cases of overflow need special
>> handling.
>>
>> This was tested with a Memory Segment micro benchmark (and patched
>> Memory Segment support to take advantage of the new checkIndex
>> intrinsic, both provided by Maurizio). Range checks in the micro
>> benchmark are properly optimized (and performance increases
>> significantly).
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
>
>   whitespace

More or less, I thought that with the constraint `dist(lower_bound, upper_bound) < Long.MAX` it should not create a case where we read past the range, and the preconditions will be satisfied (if the two bounds overflow long and are in range, then, because scale & offset are loop invariant, it is by the developer's design). In any case I didn't want to interfere, and I hope it will be finished soon.
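
For readers following along, the kind of source loop the quoted description targets can be sketched roughly as below. This is only an illustration under assumptions: `LongRangeCheckSketch` and `sum` are hypothetical names, and I'm assuming `Objects.checkIndex(long, long)` (JDK 16 and later) is the long range check meant by JDK-8255150. Whether C2 actually eliminates the check for this exact shape depends on the PR.

```java
import java.util.Objects;

// Hypothetical illustration only; not code from the PR.
final class LongRangeCheckSketch {

    // Sums the bytes at positions scale * i + offset for i in [0, n).
    // The loop variable is a long, so this is a "long counted loop", and
    // the check on scale * i + offset is a long range check on its iv.
    static long sum(byte[] data, long n, long scale, long offset) {
        long total = 0;
        for (long i = 0; i < n; i++) {
            // Throws IndexOutOfBoundsException if the index is not in [0, length).
            long index = Objects.checkIndex(scale * i + offset, (long) data.length);
            total += data[(int) index];
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 4, 5, 6};
        // Visits indices 1, 3 and 5.
        System.out.println(sum(data, 3, 2, 1));
    }
}
```

Here `scale` and `offset` are loop invariant, which matches the `scale * iv + offset` pattern discussed above.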
------------- PR: https://git.openjdk.java.net/jdk/pull/2045 From never at openjdk.java.net Wed Sep 22 17:21:01 2021 From: never at openjdk.java.net (Tom Rodriguez) Date: Wed, 22 Sep 2021 17:21:01 GMT Subject: RFR: 8274120: [JVMCI] CompileBroker should resolve parameter types for JVMCI compiles In-Reply-To: References: Message-ID: On Wed, 22 Sep 2021 06:26:42 GMT, Tom Rodriguez wrote: > 8274120: [JVMCI] CompileBroker should resolve parameter types for JVMCI compiles Thanks! ------------- PR: https://git.openjdk.java.net/jdk/pull/5627 From never at openjdk.java.net Wed Sep 22 17:21:02 2021 From: never at openjdk.java.net (Tom Rodriguez) Date: Wed, 22 Sep 2021 17:21:02 GMT Subject: Integrated: 8274120: [JVMCI] CompileBroker should resolve parameter types for JVMCI compiles In-Reply-To: References: Message-ID: On Wed, 22 Sep 2021 06:26:42 GMT, Tom Rodriguez wrote: > 8274120: [JVMCI] CompileBroker should resolve parameter types for JVMCI compiles This pull request has now been integrated. Changeset: 57fe11c9 Author: Tom Rodriguez URL: https://git.openjdk.java.net/jdk/commit/57fe11c9a37c121a244b3d6c9c0a1413dc0484b7 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8274120: [JVMCI] CompileBroker should resolve parameter types for JVMCI compiles Reviewed-by: dnsimon, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/5627 From jrose at openjdk.java.net Wed Sep 22 21:18:55 2021 From: jrose at openjdk.java.net (John R Rose) Date: Wed, 22 Sep 2021 21:18:55 GMT Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow [v3] In-Reply-To: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> References: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> Message-ID: <3iOUpNM2PQuE_i52zjBx72fHyXWZdmkq-a0hAA7Omlc=.0ac26dff-990c-4604-8f53-da449f0e3d34@github.com> On Tue, 21 Sep 2021 10:09:11 GMT, Volker Simonis wrote: >> If running with 
`-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (e.g. AIOOBE, NullPointerExceptions, ...) and replace them by a static, pre-allocated exception without any stacktrace.
>>
>> However, we can actually do better. Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. nmethod) and fill them with at least one stack frame with the method/line-number information of the currently compiled method. If the method in question is being inlined (which often happens), we can add stack frames for all callers up to the inlining depth of the method in question.
>>
>> For the attached JTreg test, we get the following exception in interpreter mode:
>>
>>     java.lang.NullPointerException: Cannot read the array length because "" is null
>>         at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>         at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
>>         at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)
>>         at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233)
>>
>> Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows:
>>
>>     java.lang.NullPointerException
>>
>> After this change, if `StackFrameInFastThrow.throwImplicitException()` is compiled standalone, we will get:
>>
>>     java.lang.NullPointerException
>>         at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>
>> and if `StackFrameInFastThrow.throwImplicitException()` is inlined into `level2()` and `level2()` into `level1()` we will get the following exception (although we're still running with `-XX:+OmitStackTraceInFastThrow`):
>>
>>     java.lang.NullPointerException
>>         at
compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>         at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
>>         at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)
>>
>> The new functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). Notice that the optimization comes at no run-time cost because all the extra work will be done at compile time.
>>
>> ## Implementation details
>>
>> - The current implementation of `-XX:+OmitStackTraceInFastThrow` already potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`. With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`).
>> - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them.
>> - In order to avoid a memory leak we have to release these global JNI handles once an nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles.
>> - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()`) and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but failed to add it to the corresponding stats and printing routines, so I've added that section as well.
>> - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds.
>> - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions.
>> - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed.
>> - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues.
>
> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
>
>   Create implicit exceptions with an array of StackTraceElements right away instead of creating a backtrace. This prevents implicit exceptions from keeping classes alive due to Java mirrors in the backtrace.

To me this looks like a very clever mess. The mess comes from the trickiness (it's a tricky problem!) and even more from forcing various parts of the system that are usually isolated to come into contact. Adding reasons to GC during a JIT task is a smell. Adding objects which are pieced together at compile time is a smell. (And you can't run Java code from the JIT; it's an architectural limitation.) Having JavaClasses talk directly to C2 GraphKit (without even a CI class between) is a smell. Adding a new section to nmethods just to make a poorly-understood life cycle for an odd (non-Java-created) group of exceptions is a smell. If we need a new section on nmethods it should be something more like "Java structures the JIT has made", with clearly separated concerns from the rest of the system, rather than "my very special section for an intrusive RFE". This is not even close to being ready to integrate.
------------- PR: https://git.openjdk.java.net/jdk/pull/5392 From dlong at openjdk.java.net Wed Sep 22 23:00:55 2021 From: dlong at openjdk.java.net (Dean Long) Date: Wed, 22 Sep 2021 23:00:55 GMT Subject: RFR: 8273380: ARM32: Default to {ldrexd, strexd} in StubRoutines::atomic_{load|store}_long In-Reply-To: References: Message-ID: On Mon, 6 Sep 2021 11:28:58 GMT, Aleksey Shipilev wrote: > Current ARM32 is one of few remaining uses of `os::is_MP`, the rest is removed by JDK-8188764. There are some interesting bugs in OS/libc that might give incorrect `os::is_MP` sometimes, e.g. in containers. Instead of risking it, we can default to {ldrexd,strexd} for ARMv7 (which always have them), and leave the `os::is_MP` path for ARMv6 (for which this is the only remaining way to load the 64-bit long). > > @mychris, you might want to take a look and do light performance testing for it? > > Additional testing: > - [x] Linux ARM32 cross-compiled build completes Seems good to me. Maybe another recent arm32 contributor could also look at this? @mychris @bulasevich @dsamersoff @TheRealMDoerr ------------- Marked as reviewed by dlong (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5379 From njian at openjdk.java.net Thu Sep 23 03:03:00 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Thu, 23 Sep 2021 03:03:00 GMT Subject: Integrated: 8267356: AArch64: Vector API SVE codegen support In-Reply-To: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> References: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> Message-ID: On Thu, 20 May 2021 07:32:52 GMT, Ningsheng Jian wrote: > This is the integration of current SVE work done in panama-vector/vectorIntrinscs, which includes: > > 1. Code generation for Vector API c2 IR nodes with SVE. > 2. Non-max vector size support with SVE, e.g. 
using *128Vector (and *64Vector) APIs on 256-bit SVE environment could also generate optimized SVE instructions with predicate feature. > 3. Some more SVE assemblers (and tests) used by the codegen part. > > Note: VectorMask is still represented in vector register, a further improvement to map mask to predicate register is under development at https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask > > > Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware with MaxVectorSize=16/32/64. This pull request has now been integrated. Changeset: 9d3379b9 Author: Ningsheng Jian URL: https://git.openjdk.java.net/jdk/commit/9d3379b9755e9739f0b8f5c29deb1d28d0f3aa81 Stats: 5761 lines in 13 files changed: 4576 ins; 195 del; 990 mod 8267356: AArch64: Vector API SVE codegen support Co-authored-by: Xiaohong Gong Co-authored-by: Wang Huang Co-authored-by: Ningsheng Jian Co-authored-by: Xuejin He Co-authored-by: Ai Jiaming Co-authored-by: Eric Liu Reviewed-by: aph, ngasson ------------- PR: https://git.openjdk.java.net/jdk/pull/4122 From dlong at openjdk.java.net Thu Sep 23 03:28:57 2021 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 23 Sep 2021 03:28:57 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v2] In-Reply-To: References: Message-ID: <7v19NZqVzDlIpGm_JEGFW9ynn-7fz2xZ2kIkI2lelL8=.7eef3216-9700-4dc6-b267-5d3f7f172f16@github.com> On Thu, 16 Sep 2021 17:00:20 GMT, Volker Simonis wrote: >> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. >> >> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. 
A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>>
>>     public static boolean isAlpha(int c) {
>>         try {
>>             return IS_ALPHA[c];
>>         } catch (ArrayIndexOutOfBoundsException ex) {
>>             return false;
>>         }
>>     }
>>
>>
>> ### Solution
>>
>> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a roughly tenfold performance improvement for the above code:
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>>
>>
>> ### Implementation details
>>
>> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`. >> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. >> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Minor updates as requested by @TheRealMDoerr How about introduce a public rangeCheck() method that returns true/false and would be a compiler intrinsic. Then we don't have to create an exception at all. It could go some place like jdk/internal/util/ArraysSupport. ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From dlong at openjdk.java.net Thu Sep 23 03:50:54 2021 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 23 Sep 2021 03:50:54 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v2] In-Reply-To: References: Message-ID: On Thu, 16 Sep 2021 17:00:20 GMT, Volker Simonis wrote: >> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. 
in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter whenever such exceptions occur.
>>
>> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>>
>>     public static boolean isAlpha(int c) {
>>         try {
>>             return IS_ALPHA[c];
>>         } catch (ArrayIndexOutOfBoundsException ex) {
>>             return false;
>>         }
>>     }
>>
>>
>> ### Solution
>>
>> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a roughly tenfold performance improvement for the above code:
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>>
>>
>> ### Implementation details
>>
>> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
>> - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
>> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.
>
> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
>
>   Minor updates as requested by @TheRealMDoerr

OK, on second thought: change code like isAlpha() to do the range check inline. It would be nice if the compiler could do that automatically, but I don't think the spec would allow omitting the exception, even though it would be difficult to tell without exception logging or JVMTI turned on.
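
Something along these lines, as a minimal sketch and not a proposed patch to Tomcat. The class name `IsAlphaSketch`, the 128-entry table and the way it is filled are assumptions made for illustration; the real `IS_ALPHA` table may be built differently.

```java
// Hypothetical illustration only; not the Tomcat source.
final class IsAlphaSketch {

    // Assumed lookup table: true for ASCII letters, false otherwise.
    private static final boolean[] IS_ALPHA = new boolean[128];

    static {
        for (int c = 'a'; c <= 'z'; c++) {
            IS_ALPHA[c] = true;
        }
        for (int c = 'A'; c <= 'Z'; c++) {
            IS_ALPHA[c] = true;
        }
    }

    // Explicit range check instead of catching ArrayIndexOutOfBoundsException,
    // so no exception is created for out-of-range input.
    static boolean isAlpha(int c) {
        return c >= 0 && c < IS_ALPHA.length && IS_ALPHA[c];
    }

    public static void main(String[] args) {
        System.out.println(isAlpha('a'));   // letter, inside the table
        System.out.println(isAlpha(1000));  // outside the table, no exception
    }
}
```

With the check written out, the out-of-range case becomes an ordinary branch rather than an implicit exception.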
------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From bulasevich at openjdk.java.net Thu Sep 23 06:30:57 2021 From: bulasevich at openjdk.java.net (Boris Ulasevich) Date: Thu, 23 Sep 2021 06:30:57 GMT Subject: RFR: 8273380: ARM32: Default to {ldrexd, strexd} in StubRoutines::atomic_{load|store}_long In-Reply-To: References: Message-ID: On Mon, 6 Sep 2021 11:28:58 GMT, Aleksey Shipilev wrote: > Current ARM32 is one of few remaining uses of `os::is_MP`, the rest is removed by JDK-8188764. There are some interesting bugs in OS/libc that might give incorrect `os::is_MP` sometimes, e.g. in containers. Instead of risking it, we can default to {ldrexd,strexd} for ARMv7 (which always have them), and leave the `os::is_MP` path for ARMv6 (for which this is the only remaining way to load the 64-bit long). > > @mychris, you might want to take a look and do light performance testing for it? > > Additional testing: > - [x] Linux ARM32 cross-compiled build completes Good change! thank you ------------- PR: https://git.openjdk.java.net/jdk/pull/5379 From github.com+741251+turbanoff at openjdk.java.net Thu Sep 23 07:51:05 2021 From: github.com+741251+turbanoff at openjdk.java.net (Andrey Turbanov) Date: Thu, 23 Sep 2021 07:51:05 GMT Subject: Integrated: 8274048 IGV: Replace usages of Collections.sort with List.sort call In-Reply-To: <8KIga70tt6TEH9vMuxl8OuDQWDOX3qCIeMeIC8OsfdU=.22a04314-43c6-48b0-b87f-224d2fd8060b@github.com> References: <8KIga70tt6TEH9vMuxl8OuDQWDOX3qCIeMeIC8OsfdU=.22a04314-43c6-48b0-b87f-224d2fd8060b@github.com> Message-ID: <6clCPGz3RnOAOCRWTSM2e5FS28k74vKLvmg0IwT6ing=.d8a496b2-121f-4d5c-b2aa-12e8f30eb8f1@github.com> On Mon, 23 Aug 2021 20:52:22 GMT, Andrey Turbanov wrote: > Collections.sort is just a wrapper, so it is better to use an instance method directly. This pull request has now been integrated. 
Changeset: 8b833bbe Author: Andrey Turbanov Committer: Christian Hagedorn URL: https://git.openjdk.java.net/jdk/commit/8b833bbea84829664f23d17c7f94c0379b48f365 Stats: 24 lines in 8 files changed: 0 ins; 2 del; 22 mod 8274048: IGV: Replace usages of Collections.sort with List.sort call Reviewed-by: chagedorn ------------- PR: https://git.openjdk.java.net/jdk/pull/5228 From dsamersoff at openjdk.java.net Thu Sep 23 10:46:50 2021 From: dsamersoff at openjdk.java.net (Dmitry Samersoff) Date: Thu, 23 Sep 2021 10:46:50 GMT Subject: RFR: 8273380: ARM32: Default to {ldrexd, strexd} in StubRoutines::atomic_{load|store}_long In-Reply-To: References: Message-ID: On Mon, 6 Sep 2021 11:28:58 GMT, Aleksey Shipilev wrote: > Current ARM32 is one of few remaining uses of `os::is_MP`, the rest is removed by JDK-8188764. There are some interesting bugs in OS/libc that might give incorrect `os::is_MP` sometimes, e.g. in containers. Instead of risking it, we can default to {ldrexd,strexd} for ARMv7 (which always have them), and leave the `os::is_MP` path for ARMv6 (for which this is the only remaining way to load the 64-bit long). > > @mychris, you might want to take a look and do light performance testing for it? > > Additional testing: > - [x] Linux ARM32 cross-compiled build completes src/hotspot/cpu/arm/stubGenerator_arm.cpp line 641: > 639: __ ldrexd(result_lo, Address(src)); > 640: __ clrex(); // FIXME: safe to remove? 
> 641: __ bx(LR); bx(LR) is common for all 3 branches, so it might be better to move it out ------------- PR: https://git.openjdk.java.net/jdk/pull/5379 From simonis at openjdk.java.net Thu Sep 23 11:16:56 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 23 Sep 2021 11:16:56 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v2] In-Reply-To: References: Message-ID: On Thu, 16 Sep 2021 17:00:20 GMT, Volker Simonis wrote: >> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. >> >> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274): >> >> public static boolean isAlpha(int c) { >> try { >> return IS_ALPHA[c]; >> } catch (ArrayIndexOutOfBoundsException ex) { >> return false; >> } >> } >> >> >> ### Solution >> >> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code: >> >> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.430 ± 0.353 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 3563.038 ± 77.358 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 8609.693 ± 1205.104 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 12842.401 ±
1022.728 ns/op >> >> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.432 ± 0.352 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 355.723 ± 16.641 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 887.068 ± 166.728 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 1274.418 ± 88.235 ns/op >> >> >> ### Implementation details >> >> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default. >> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception. >> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`. >> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. >> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.
> > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Minor updates as requested by @TheRealMDoerr Hi Dean, thank you for looking at this change. Unfortunately I don't completely understand your point. Are you saying we should change all user code which leads to "hot" exceptions such that they don't throw exceptions any more? That would certainly be a possibility, but I think it is neither practical nor customer obsessed. With that argument you wouldn't need the `-XX:+OmitStackTraceInFastThrow` optimization (which is turned on by default) in the first place :) I think it is a quite simple and pragmatic solution to also optimize implicit exceptions for users who run with `-XX:-OmitStackTraceInFastThrow` because they require full stack traces. Best regards, Volker ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From shade at openjdk.java.net Thu Sep 23 11:17:36 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 23 Sep 2021 11:17:36 GMT Subject: RFR: 8273380: ARM32: Default to {ldrexd, strexd} in StubRoutines::atomic_{load|store}_long [v2] In-Reply-To: References: Message-ID: <4HloD4y6X-zkMlSoxw7h7KG2_cwaYM13c44_X1n69rA=.20782f5e-1826-40cc-a8fa-9bfaac389e89@github.com> > Current ARM32 is one of few remaining uses of `os::is_MP`, the rest is removed by JDK-8188764. There are some interesting bugs in OS/libc that might give incorrect `os::is_MP` sometimes, e.g. in containers. Instead of risking it, we can default to {ldrexd,strexd} for ARMv7 (which always have them), and leave the `os::is_MP` path for ARMv6 (for which this is the only remaining way to load the 64-bit long). > > @mychris, you might want to take a look and do light performance testing for it? > > Additional testing: > - [x] Linux ARM32 cross-compiled build completes Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Move out bx - Merge branch 'master' into JDK-8273380-arm32-default-to-ldrex - First attempt ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5379/files - new: https://git.openjdk.java.net/jdk/pull/5379/files/24b814c8..ec04c50d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5379&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5379&range=00-01 Stats: 29777 lines in 1164 files changed: 19695 ins; 5627 del; 4455 mod Patch: https://git.openjdk.java.net/jdk/pull/5379.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5379/head:pull/5379 PR: https://git.openjdk.java.net/jdk/pull/5379 From shade at openjdk.java.net Thu Sep 23 11:17:41 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 23 Sep 2021 11:17:41 GMT Subject: RFR: 8273380: ARM32: Default to {ldrexd, strexd} in StubRoutines::atomic_{load|store}_long [v2] In-Reply-To: References: Message-ID: <_6NDp0jbAbmdkZ_7twnRYTMuyI8Sg51lL9OAv16M3cU=.b4d2b0e0-e5f8-4f60-98ae-d613331a3fe0@github.com> On Thu, 23 Sep 2021 10:43:41 GMT, Dmitry Samersoff wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Move out bx >> - Merge branch 'master' into JDK-8273380-arm32-default-to-ldrex >> - First attempt > > src/hotspot/cpu/arm/stubGenerator_arm.cpp line 641: > >> 639: __ ldrexd(result_lo, Address(src)); >> 640: __ clrex(); // FIXME: safe to remove? >> 641: __ bx(LR); > > bx(LR) is common for all 3 branches, so it might be better to move it out It seems that `bx(LR)` near `stop` is redundant. But I don't see why not move it outside the block. See new commit. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5379 From github.com+71546117+tobiasholenstein at openjdk.java.net Thu Sep 23 13:00:03 2021 From: github.com+71546117+tobiasholenstein at openjdk.java.net (Tobias Holenstein) Date: Thu, 23 Sep 2021 13:00:03 GMT Subject: Integrated: JDK-8272703: StressSeed should be set via FLAG_SET_ERGO In-Reply-To: References: Message-ID: On Tue, 21 Sep 2021 08:38:57 GMT, Tobias Holenstein wrote: > Set the `StressSeed` (added by JDK-8252219) via `FLAG_SET_ERGO` if the seed is generated by the VM (i.e., not set via the command line). This way, `StressSeed` will be added to the "[Global flags]" section of the hs_err file on crash and can be used to reproduce the issue. > > If `RepeatCompilation` is on and no `StressSeed` is set, a new `StressSeed` is generated for every compilation. The reason for this is that some Bugs are only triggered intermittent, because they depend on the `StressSeed`. This pull request has now been integrated. Changeset: 66ce09f5 Author: Tobias Holenstein Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/66ce09f51eb37029c8ba67a70f8c90a307dae1eb Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod 8272703: StressSeed should be set via FLAG_SET_ERGO Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/5599 From github.com+71546117+tobiasholenstein at openjdk.java.net Thu Sep 23 13:03:00 2021 From: github.com+71546117+tobiasholenstein at openjdk.java.net (Tobias Holenstein) Date: Thu, 23 Sep 2021 13:03:00 GMT Subject: Integrated: JDK-8270156: Add "randomness" and "stress" keys to JTreg tests which use StressGCM, StressLCM and/or StressIGVN In-Reply-To: References: Message-ID: On Tue, 21 Sep 2021 14:17:28 GMT, Tobias Holenstein wrote: > Added the keys "randomness" and "stress" to the JTreg tests which use StressGCM, StressLCM and/or StressIGVN and did not use the keys before. This pull request has now been integrated. 
Changeset: 653a612a Author: Tobias Holenstein Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/653a612a5aac266509f414c570871b7141b9347d Stats: 10 lines in 8 files changed: 8 ins; 1 del; 1 mod 8270156: Add "randomness" and "stress" keys to JTreg tests which use StressGCM, StressLCM and/or StressIGVN Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.java.net/jdk/pull/5614 From chagedorn at openjdk.java.net Thu Sep 23 13:11:28 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 23 Sep 2021 13:11:28 GMT Subject: RFR: 8274074: SIGFPE with C2 compiled code with -XX:+StressGCM Message-ID: In the testcase, the divisor input node of a `DivI` node is sunk out of a loop to a div by zero UCT and is pinned with a `CastII` node to ensure it's not floating back into the loop. The divisor is optimized by taking into account that it's only executed on the uncommon path. The `CastII`, however, is removed later and the division floats back into the loop which results in a SIGFPE crash. The relevant lines in the testcase are the following two divisions: static int iFld = 1; static int q = 0; ... y = iFld - q; // divisor y = (iArrFld[2] / y); // division 1 y = (5 / iFld); // division 2 After sinking the `divisor` of `division 1` in the testcase to the div by zero UCT of `division 2`, the graph looks like this: ![Screenshot from 2021-09-21 14-40-37](https://user-images.githubusercontent.com/17833009/134506632-9904da7b-8210-4301-85dc-04441324fe55.png) - `201 If` is the zero check of `division 2` (will always succeed because `iFld = 1`, i.e. UCT is never taken). - `193 DivI` (`division 1`) is not sunk because its `get_ctrl()` is `203 IfFalse` (outside the loop already because there is no use inside the loop since the local `y` is directly overwritten again). - `275 SubI` (`divisor`) was sunk out of the loop and is pinned by `276 CastII` (unconditional dependency). In IGVN, `CastII::Value()` is called for `276 CastII`.
It sees an `If/Cmp` (zero check of `division 2`) with the same `137 LoadI` input as for the `276 CastII`. Therefore, we set its type to [0,0] here: https://github.com/openjdk/jdk/blob/d0987513665def1b6b2981ab5932b6f1b8b310d8/src/hotspot/share/opto/castnode.cpp#L248-L252 As a result, we replace `276 CastII` with a constant zero in IGVN. But now we lost the pin to the uncommon path of the zero check of `division 2` for `275 SubI` and `193 DivI`. `193 DivI` is only used on the uncommon path but can now float around again, also inside the loop itself, which happens in the testcase. Inside the loop, we execute the division with the now optimized divisor `0 - q = 0` which is a division by zero and we crash. In summary, it's not a problem that a `Div` node floats above its zero check here but rather that we optimize an input node used as divisor by assuming that we only execute the division on the uncommon path when the zero check of `division 2` failed (which never happens). This divisor optimization would be wrong when the division is executed inside the loop. But due to losing the pin, we end up doing exactly that which results in a SIGFPE crash. The suggested fix is to extend the sinking algorithm to rewire data nodes with a control input inside a loop whose `get_ctrl()` is actually completely outside loops on uncommon paths. The control input is set to `get_ctrl()` to force the nodes out of loops. In the example above, the control input of `193 DivI` is set to `203 IfFalse`, ensuring that it is still pinned to the uncommon path after `276 CastII` is removed. This fix is also beneficial if we do not sink any nodes at all later. 
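To make the source-level picture concrete, here is a rough Java sketch built around the three statements quoted above. It is not the regression test from the PR: the class name, the loop shape, the iteration count and the array size are assumptions. Its only purpose is to show that, with `iFld = 1` and `q = 0`, neither division ever sees a zero divisor, so a correct compilation must not trap.

```java
// Sketch only: loop shape, iteration count and array size are assumptions,
// not the actual test attached to JDK-8274074.
final class DivSinkSketch {
    static int iFld = 1;
    static int q = 0;
    static int[] iArrFld = new int[10];

    static int test() {
        int y = 0;
        for (int i = 0; i < 1000; i++) {
            y = iFld - q;          // divisor: always 1 with the values above
            y = (iArrFld[2] / y);  // division 1: 0 / 1, result is overwritten below
            y = (5 / iFld);        // division 2: 5 / 1
        }
        return y;
    }
}
```

With the field values shown, `test()` returns 5 in the interpreter and in any correctly compiled version.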
Thanks, Christian ------------- Commit messages: - comment - 8274074: SIGFPE with C2 compiled code with -XX:+StressGCM Changes: https://git.openjdk.java.net/jdk/pull/5651/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5651&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8274074 Stats: 90 lines in 2 files changed: 90 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5651.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5651/head:pull/5651 PR: https://git.openjdk.java.net/jdk/pull/5651 From chagedorn at openjdk.java.net Thu Sep 23 13:35:24 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Thu, 23 Sep 2021 13:35:24 GMT Subject: RFR: 8271459: C2: Missing NegativeArraySizeException when creating StringBuilder with negative capacity Message-ID: <-_kFk0kfF5npDXL-qyMSIFfglZwDHlV-jyMgBc7GmXI=.178d9895-ce14-414a-b07e-c2060f8ab9b2@github.com> Stringopts does not take into account that a negative `int` argument for `StringBuilder(int)` results in a `NegativeArraySizeException` when optimizing away `StringBuilder` usages into single strings. The suggested fix does the following: - Bailout of Stringopts if C2 knows that an `int` argument is always negative. - Apply stringopts but insert an additional runtime check with an UCT if C2 cannot tell if an `int` argument is positive or negative. I added some IR tests to verify the fix and also ran some standard benchmarks. I also updated `TestIRMatching` to test the new and updated default regexes. 
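As a hedged illustration of the Java-level behaviour the fix has to preserve, the following sketch builds a string through `StringBuilder(int)` and reports whether a `NegativeArraySizeException` was thrown. The class and method names are invented for this example; it is not one of the IR tests added by the PR.

```java
// Illustrative only: names are assumptions, this is not a test from the PR.
final class NegativeCapacity {
    // Returns true if constructing the builder with the given capacity
    // throws NegativeArraySizeException, false if the string is built.
    static boolean throwsOnBuild(int capacity) {
        try {
            String s = new StringBuilder(capacity).append("a").append("b").toString();
            return s.isEmpty(); // never true here; keeps 's' observably used
        } catch (NegativeArraySizeException e) {
            return true;
        }
    }
}
```

A negative capacity must throw regardless of whether the builder chain is later collapsed into a single string by the compiler.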
Thanks, Christian ------------- Commit messages: - 8271459: C2: Missing NegativeArraySizeException when creating StringBuilder with negative capacity Changes: https://git.openjdk.java.net/jdk/pull/5652/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5652&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8271459 Stats: 276 lines in 4 files changed: 269 ins; 0 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/5652.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5652/head:pull/5652 PR: https://git.openjdk.java.net/jdk/pull/5652 From roland at openjdk.java.net Thu Sep 23 15:57:58 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 23 Sep 2021 15:57:58 GMT Subject: RFR: 8274074: SIGFPE with C2 compiled code with -XX:+StressGCM In-Reply-To: References: Message-ID: On Thu, 23 Sep 2021 13:00:22 GMT, Christian Hagedorn wrote: > In the testcase, the divisor input node of a `DivI` node is sunk out of a loop to a div by zero UCT and is pinned with a `CastII` node to ensure it's not floating back into the loop. The divisor is optimized by taking into account that it's only executed on the uncommon path. The `CastII`, however, is removed later and the division floats back into the loop which results in a SIGFPE crash. > > The relevant lines in the testcase are the following two divisions: > > static int iFld = 1; > static int q = 0; > ... > y = iFld - q; // divisor > y = (iArrFld[2] / y); // division 1 > y = (5 / iFld); // division 2 > > After sinking the `divisor` of `division 1` in the testcase to the div by zero UCT of `division 2`, the graph looks like this: > > ![Screenshot from 2021-09-21 14-40-37](https://user-images.githubusercontent.com/17833009/134506632-9904da7b-8210-4301-85dc-04441324fe55.png) > > - `201 If` is the zero check of `division 2` (will always succeed because `iFld = 1`, i.e. UCT is never taken).
> - `193 DivI` (`division 1`) is not sunk because its `get_ctrl()` is `203 IfFalse` (outside the loop already because there is no use inside the loop since the local `y` is directly overwritten again). > - `275 SubI` (`divisor`) was sunk out of the loop and is pinned by `276 CastII` (unconditional dependency). > > In IGVN, `CastII::Value()` is called for `276 CastII`. It sees an `If/Cmp` (zero check of `division 2`) with the same `137 LoadI` input as for the `276 CastII`. Therefore, we set its type to [0,0] here: > https://github.com/openjdk/jdk/blob/d0987513665def1b6b2981ab5932b6f1b8b310d8/src/hotspot/share/opto/castnode.cpp#L248-L252 > > As a result, we replace `276 CastII` with a constant zero in IGVN. But now we lost the pin to the uncommon path of the zero check of `division 2` for `275 SubI` and `193 DivI`. `193 DivI` is only used on the uncommon path but can now float around again, also inside the loop itself, which happens in the testcase. Inside the loop, we execute the division with the now optimized divisor `0 - q = 0` which is a division by zero and we crash. > > In summary, it's not a problem that a `Div` node floats above its zero check here but rather that we optimize an input node used as divisor by assuming that we only execute the division on the uncommon path when the zero check of `division 2` failed (which never happens). This divisor optimization would be wrong when the division is executed inside the loop. But due to losing the pin, we end up doing exactly that which results in a SIGFPE crash. > > The suggested fix is to extend the sinking algorithm to rewire data nodes with a control input inside a loop whose `get_ctrl()` is actually completely outside loops on uncommon paths. The control input is set to `get_ctrl()` to force the nodes out of loops. In the example above, the control input of `193 DivI` is set to `203 IfFalse`, ensuring that it is still pinned to the uncommon path after `276 CastII` is removed. 
This fix is also beneficial if we do not sink any nodes at all later. > > Thanks, > Christian Changes requested by roland (Reviewer). src/hotspot/share/opto/loopopts.cpp line 1561: > 1559: } > 1560: _dom_lca_tags_round = 0; > 1561: } else if (n_loop == _ltree_root && n->in(0) != NULL && get_loop(n->in(0)) != _ltree_root) { Couldn't the node be out of this loop but not necessarily out of all loops? ------------- PR: https://git.openjdk.java.net/jdk/pull/5651 From shade at openjdk.java.net Thu Sep 23 15:58:00 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 23 Sep 2021 15:58:00 GMT Subject: RFR: 8273380: ARM32: Default to {ldrexd, strexd} in StubRoutines::atomic_{load|store}_long [v2] In-Reply-To: References: Message-ID: <7wnio9k4gJNLVLl7JFji9TamqAxDR_t0kpmMYSrRNwc=.f54e1a82-151d-4a89-a3d9-7529130d865c@github.com> On Thu, 23 Sep 2021 10:43:41 GMT, Dmitry Samersoff wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Move out bx >> - Merge branch 'master' into JDK-8273380-arm32-default-to-ldrex >> - First attempt > > src/hotspot/cpu/arm/stubGenerator_arm.cpp line 641: > >> 639: __ ldrexd(result_lo, Address(src)); >> 640: __ clrex(); // FIXME: safe to remove? >> 641: __ bx(LR); > > bx(LR) is common for all 3 branches, so it might be better to move it out @dsamersoff, are you happy with new version? 
------------- PR: https://git.openjdk.java.net/jdk/pull/5379 From roland at openjdk.java.net Thu Sep 23 16:07:50 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Thu, 23 Sep 2021 16:07:50 GMT Subject: RFR: 8271459: C2: Missing NegativeArraySizeException when creating StringBuilder with negative capacity In-Reply-To: <-_kFk0kfF5npDXL-qyMSIFfglZwDHlV-jyMgBc7GmXI=.178d9895-ce14-414a-b07e-c2060f8ab9b2@github.com> References: <-_kFk0kfF5npDXL-qyMSIFfglZwDHlV-jyMgBc7GmXI=.178d9895-ce14-414a-b07e-c2060f8ab9b2@github.com> Message-ID: On Thu, 23 Sep 2021 13:25:51 GMT, Christian Hagedorn wrote: > Stringopts does not take into account that a negative `int` argument for `StringBuilder(int)` results in a `NegativeArraySizeException` when optimizing away `StringBuilder` usages into single strings. > > The suggested fix does the following: > - Bailout of Stringopts if C2 knows that an `int` argument is always negative. > - Apply stringopts but insert an additional runtime check with an UCT if C2 cannot tell if an `int` argument is positive or negative. > > I added some IR tests to verify the fix and also ran some standard benchmarks. > > I also updated `TestIRMatching` to test the new and updated default regexes. > > Thanks, > Christian Looks reasonable to me. ------------- Marked as reviewed by roland (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5652 From simonis at openjdk.java.net Thu Sep 23 16:34:53 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 23 Sep 2021 16:34:53 GMT Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow [v3] In-Reply-To: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> References: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> Message-ID: <4vVjDnZ2_joqYIMVwTm2p5ZIecp1i4t4o6_3o8BCtgY=.27940f77-156c-4e36-99a3-0a57ebb0914e@github.com> On Tue, 21 Sep 2021 10:09:11 GMT, Volker Simonis wrote: >> If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (i.e. AIOOBE, NullPointerExceptions,..) and replace them by a static, pre-allocated exception without any stacktrace. >> >> However, we can actually do better. Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. nmethod) and fill them with at least one stack frame with the method /line-number information of the currently compiled method. If the method in question is being inlined (which often happens), we can add stackframes for all callers up to the inlining depth of the method in question. 
>> >> For the attached JTreg test, we get the following exception in interpreter mode: >> >> java.lang.NullPointerException: Cannot read the array length because "" is null >> at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) >> at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) >> at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) >> at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233) >> >> Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows: >> >> java.lang.NullPointerException >> >> After this change, if `StackFrameInFastThrow.throwImplicitException()` is compiled standalone, we will get: >> >> java.lang.NullPointerException >> at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) >> >> and if `StackFrameInFastThrow.throwImplicitException()` is inlined into `level2()` and `level2()` into `level1()` we will get the following exception (although we're still running with `-XX:+OmitStackTraceInFastThrow`): >> >> java.lang.NullPointerException >> at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) >> at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) >> at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) >> >> The new functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). Notice that the optimization comes at no run-time costs because all the extra work will be done at compile time.
>> >> ## Implementation details >> >> - Already the current implementation of `-XX:+OmitStackTraceInFastThrow` potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`. With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`). >> - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them. >> - In order to avoid a memory leak we have to release these global JNI handles once a nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles. >> - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()` and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but missed to add it to the corresponding stats and printing routines so I've added that section as well. >> - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds. >> - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions. >> - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed. >> - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues. 
> > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Create implicit exceptions with an array of StackTraceElements right away instead of creating a backtrace. This prevents implicit exceptions from keeping classes alive due to Java mirrors in the backtrace. Hi Jon, thanks for looking at this PR. Let me reply to your comments inline: > To me this looks like a very clever mess. The mess comes from the trickiness (it's a tricky problem!) and even more from forcing various parts of the system that are usually isolated to come into contact. Adding reasons to GC during a JIT task is a smell. Adding objects which are pieced together at compile time is a smell. (And you can't run Java code from the JIT; it's an architectural limitation.) I agree. But we already have all this mess today with the current implementation of `-XX:+OmitStackTraceInFastThrow` which already lazily creates empty exceptions and introduces all the problems you describe (see `ciEnv::get_or_create_exception()`). It's to a much lesser extent compared to this change, but fundamentally it is not different. > Having JavaClasses talk directly to C2 GraphKit (without even a CI class between) is a smell. Adding a new section to nmethods just to make a poorly-understood life cycle for an odd (non-Java-created) group of exceptions is a smell. > > If we need a new section on nmethods it should be something more like "Java structures the JIT has made", with clearly separated concerns from the rest of the system, rather than "my very special section for an intrusive RFE". > That's a good proposal and I'm happy to work on such a solution. What do you mean exactly by ".._clearly separated concerns from the rest of the system_.."? > This is not even close to being ready to integrate.
As I said, I'm happy to invest more work and improve this PR based on your suggestion if there's a chance for this feature to be accepted (even if only in a heavily revised form). But in general I think the **biggest mess** is really that users still get empty exceptions without any information at all and I think it is time to fix that. Unfortunately I can't see into the history of this code before jdk 6, but from [JDK-4292742: NullPointerException with no stack trace](https://bugs.openjdk.java.net/browse/JDK-4292742) it looks like you already worked on this issue almost 20 years ago :) So what about removing JavaClasses' dependency on GraphKit and making the new nmethod section more generally usable as you suggested? Are there any other pain points before reconsidering this PR? Any other suggestions you like me to integrate? Thank you and best regards, Volker ------------- PR: https://git.openjdk.java.net/jdk/pull/5392 From dlong at openjdk.java.net Thu Sep 23 22:55:54 2021 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 23 Sep 2021 22:55:54 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v2] In-Reply-To: References: Message-ID: On Thu, 16 Sep 2021 17:00:20 GMT, Volker Simonis wrote: >> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. >> >> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. 
A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274): >> >> public static boolean isAlpha(int c) { >> try { >> return IS_ALPHA[c]; >> } catch (ArrayIndexOutOfBoundsException ex) { >> return false; >> } >> } >> >> >> ### Solution >> >> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code: >> >> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.430 ± 0.353 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 3563.038 ± 77.358 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 8609.693 ± 1205.104 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 12842.401 ± 1022.728 ns/op >> >> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions >> Benchmark (exceptionProbability) Mode Cnt Score Error Units >> ImplicitExceptions.bench 0.0 avgt 5 1.432 ± 0.352 ns/op >> ImplicitExceptions.bench 0.33 avgt 5 355.723 ± 16.641 ns/op >> ImplicitExceptions.bench 0.66 avgt 5 887.068 ± 166.728 ns/op >> ImplicitExceptions.bench 1.00 avgt 5 1274.418 ± 88.235 ns/op >> >> >> ### Implementation details >> >> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default. >> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
>> - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
>> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.
>
> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
>
>   Minor updates as requested by @TheRealMDoerr

OK, so user/customer wants or needs to run with -XX:-OmitStackTraceInFastThrow, and there is code like isAlpha() throwing a hot exception. Does the user really care about the stack trace and -XX:-OmitStackTraceInFastThrow setting for this method? If the compiler could eliminate the stack trace for this and similar methods, or even better, eliminate the exception too, like it does for other allocations through escape analysis, would that solve your use cases? Or are there examples where the hot exception escapes and we really need to create it with a stack trace and throw it? I guess the amount of effort the JVM makes to support "hot exceptions" (which seems like an oxymoron to me) surprises me, so the thought of adding even more complexity concerns me.
But I'm not an expert on this part of the code, so let's see what other JIT experts think. ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From chagedorn at openjdk.java.net Fri Sep 24 08:22:51 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 24 Sep 2021 08:22:51 GMT Subject: RFR: 8271459: C2: Missing NegativeArraySizeException when creating StringBuilder with negative capacity In-Reply-To: <-_kFk0kfF5npDXL-qyMSIFfglZwDHlV-jyMgBc7GmXI=.178d9895-ce14-414a-b07e-c2060f8ab9b2@github.com> References: <-_kFk0kfF5npDXL-qyMSIFfglZwDHlV-jyMgBc7GmXI=.178d9895-ce14-414a-b07e-c2060f8ab9b2@github.com> Message-ID: On Thu, 23 Sep 2021 13:25:51 GMT, Christian Hagedorn wrote: > Stringopts does not take into account that a negative `int` argument for `StringBuilder(int)` results in a `NegativeArraySizeException` when optimizing away `StringBuilder` usages into single strings. > > The suggested fix does the following: > - Bailout of Stringopts if C2 knows that an `int` argument is always negative. > - Apply stringopts but insert an additional runtime check with an UCT if C2 cannot tell if an `int` argument is positive or negative. > > I added some IR tests to verify the fix and also ran some standard benchmarks. > > I also updated `TestIRMatching` to test the new and updated default regexes. > > Thanks, > Christian Thanks Roland for your review! ------------- PR: https://git.openjdk.java.net/jdk/pull/5652 From chagedorn at openjdk.java.net Fri Sep 24 09:46:21 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 24 Sep 2021 09:46:21 GMT Subject: RFR: 8274074: SIGFPE with C2 compiled code with -XX:+StressGCM [v2] In-Reply-To: References: Message-ID: > In the testcase, the divisor input node of a `DivI` node is sunk out of a loop to a div by zero UCT and is pinned with a `CastII` node to ensure it's not floating back into the loop. The divisor is optimized by taking into account that it's only executed on the uncommon path. 
The `CastII`, however, is removed later and the division floats back into the loop which results in a SIGFPE crash.
>
> The relevant lines in the testcase are the following two divisions:
>
>     static int iFld = 1;
>     static int q = 0;
>     ...
>     y = iFld - q; // divisor
>     y = (iArrFld[2] / y); // division 1
>     y = (5 / iFld); // division 2
>
> After sinking the `divisor` of `division 1` in the testcase to the div by zero UCT of `division 2`, the graph looks like this:
>
> ![Screenshot from 2021-09-21 14-40-37](https://user-images.githubusercontent.com/17833009/134506632-9904da7b-8210-4301-85dc-04441324fe55.png)
>
> - `201 If` is the zero check of `division 2` (will always succeed because `iFld = 1`, i.e. UCT is never taken).
> - `193 DivI` (`division 1`) is not sunk because its `get_ctrl()` is `203 IfFalse` (outside the loop already because there is no use inside the loop since the local `y` is directly overwritten again).
> - `275 SubI` (`divisor`) was sunk out of the loop and is pinned by `276 CastII` (unconditional dependency).
>
> In IGVN, `CastII::Value()` is called for `276 CastII`. It sees an `If/Cmp` (zero check of `division 2`) with the same `137 LoadI` input as for the `276 CastII`. Therefore, we set its type to [0,0] here:
> https://github.com/openjdk/jdk/blob/d0987513665def1b6b2981ab5932b6f1b8b310d8/src/hotspot/share/opto/castnode.cpp#L248-L252
>
> As a result, we replace `276 CastII` with a constant zero in IGVN. But now we have lost the pin to the uncommon path of the zero check of `division 2` for `275 SubI` and `193 DivI`. `193 DivI` is only used on the uncommon path but can now float around again, also inside the loop itself, which happens in the testcase. Inside the loop, we execute the division with the now optimized divisor `0 - q = 0` which is a division by zero and we crash.
> > In summary, it's not a problem that a `Div` node floats above its zero check here but rather that we optimize an input node used as divisor by assuming that we only execute the division on the uncommon path when the zero check of `division 2` failed (which never happens). This divisor optimization would be wrong when the division is executed inside the loop. But due to losing the pin, we end up doing exactly that which results in a SIGFPE crash. > > The suggested fix is to extend the sinking algorithm to rewire data nodes with a control input inside a loop whose `get_ctrl()` is actually completely outside loops on uncommon paths. The control input is set to `get_ctrl()` to force the nodes out of loops. In the example above, the control input of `193 DivI` is set to `203 IfFalse`, ensuring that it is still pinned to the uncommon path after `276 CastII` is removed. This fix is also beneficial if we do not sink any nodes at all later. > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Extend fix to any loops ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5651/files - new: https://git.openjdk.java.net/jdk/pull/5651/files/e9599d90..941420c1 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5651&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5651&range=00-01 Stats: 15 lines in 1 file changed: 10 ins; 5 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5651.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5651/head:pull/5651 PR: https://git.openjdk.java.net/jdk/pull/5651 From chagedorn at openjdk.java.net Fri Sep 24 09:46:24 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 24 Sep 2021 09:46:24 GMT Subject: RFR: 8274074: SIGFPE with C2 compiled code with -XX:+StressGCM [v2] In-Reply-To: References: Message-ID: 
<_b31QYaKtp3pZnIEIqmjNvtBHX7WBXZa4JTqvNwH3ik=.c20e8832-59f2-4d94-b3d7-c4dea96e4cd3@github.com> On Thu, 23 Sep 2021 15:55:02 GMT, Roland Westrelin wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Extend fix to any loops > > src/hotspot/share/opto/loopopts.cpp line 1561: > >> 1559: } >> 1560: _dom_lca_tags_round = 0; >> 1561: } else if (n_loop == _ltree_root && n->in(0) != NULL && get_loop(n->in(0)) != _ltree_root) { > > Couldn't the node be out of this loop but not necessarily out of all loops? You're right, that's an unnecessary limitation. I've tried to find an example where this happens but could not come up with one. But I'm sure that situation will occur at some point. I pushed an update. ------------- PR: https://git.openjdk.java.net/jdk/pull/5651 From chagedorn at openjdk.java.net Fri Sep 24 10:46:18 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 24 Sep 2021 10:46:18 GMT Subject: RFR: 8273410: IR verification framework fails with "Should find method name in validIrRulesMap" Message-ID: The IR framework treated a `@Check` method as `@Test` method instead of the `@Test` method itself at IR matching time resulting in an internal framework exception. While writing some tests for checked test I've noticed that a missing `@Arguments` annotation is not reported as `TestFormatException` but with a `RuntimeException` later when invoking the method in question. I also added the missing check for it. 
Thanks,
Christian

-------------

Commit messages:
 - 8273410: IR verification framework fails with "Should find method name in validIrRulesMap"

Changes: https://git.openjdk.java.net/jdk/pull/5678/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5678&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8273410
  Stats: 233 lines in 3 files changed: 232 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5678.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5678/head:pull/5678

PR: https://git.openjdk.java.net/jdk/pull/5678

From github.com+39403138+fg1417 at openjdk.java.net  Fri Sep 24 11:44:03 2021
From: github.com+39403138+fg1417 at openjdk.java.net (Fei Gao)
Date: Fri, 24 Sep 2021 11:44:03 GMT
Subject: RFR: 8272968: AArch64: Remove redundant matching rules for commutative ops
Message-ID:

Match rules for commutative operations mnegI/mnegL/smnegL might
become redundant after function matchrule_clone_and_swap(),
and hence can be reduced.

In the adlc part, while parsing the contents of an instruction
definition, function instr_parse always checks
for commutative operations with subtree operands, then creates
clones and swaps operands via function matchrule_clone_and_swap.
It means that another operand-swapped and partially
symmetrical match rule should be generated automatically for
these commutative operations.

The pattern to construct mnegI, mnegL or smnegL consists of
a subtraction from zero and then a multiplication. In function
count_commutative_op, both mulI and mulL are recognized as
commutative opcodes. Therefore, we need only one match rule
to specify that a multiplication consists of a number and
a subtraction from zero for these three instructions and the
extra one can be deleted.

Take mnegL as an example.

Without my patch, four match rules will finally be created for
instruction selection.
Two of them are created by ad files:

Match Rule 1:
dst = MulL (SubL zero src1) src2
===>
dst = mnegl src1 src2

Match Rule 2:
dst = MulL src1 (SubL zero src2)
===>
dst = mnegl src1 src2

The other two are automatically generated by function
matchrule_clone_and_swap based on the two rules above:

Match Rule 3 (generated by match rule 1):
dst = MulL src2 (SubL zero src1)
===>
dst = mnegl src1 src2

Match Rule 4 (generated by match rule 2):
dst = MulL (SubL zero src2) src1
===>
dst = mnegl src1 src2

As mnegl is commutative, Rule 3 is equivalent to
Rule 2, and Rule 1 is equivalent to Rule 4. Also, if we only
keep the original Match Rule 1, as shown above, Rule 3 will
be generated automatically later. In this way, Rule 2 and Rule 4
are redundant and hence Rule 2 can be eliminated.

With my patch, Rule 2 is removed and Rule 4 won't be generated either.
Only Rule 1 and 3 are kept in the final rule chain. In my local release
build, as redundant code got removed, the size of libjvm.so decreased
from 23.30 MB to 23.27 MB, with a reduction of 33.11 KB (around 0.14%).
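
The redundancy argument above can be checked with a small toy model. This is only a sketch under stated assumptions: the `Node` class, `swapped()` and `shape()` are hypothetical helpers written for illustration and are not ADLC code. `swapped()` mimics what the message describes matchrule_clone_and_swap as doing for a commutative root, and `shape()` renames source operands by first appearance so that rules differing only in src1/src2 naming compare equal.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of match-rule trees; this is not ADLC's real data structure. */
public class MnegRuleSketch {

    /** A binary operator node, or a leaf operand when left and right are null. */
    public static final class Node {
        final String op;
        final Node left;
        final Node right;

        Node(String op, Node left, Node right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }

        static Node leaf(String name) {
            return new Node(name, null, null);
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    /** Swaps the two operands of the (commutative) root node. */
    public static Node swapped(Node rule) {
        return new Node(rule.op, rule.right, rule.left);
    }

    /** Renders a rule with source operands renamed in order of first appearance. */
    public static String shape(Node rule) {
        return shape(rule, new HashMap<>());
    }

    private static String shape(Node n, Map<String, String> names) {
        if (n.isLeaf()) {
            if (n.op.equals("zero")) {
                return "zero";
            }
            String alias = names.get(n.op);
            if (alias == null) {
                alias = "op" + (names.size() + 1);
                names.put(n.op, alias);
            }
            return alias;
        }
        String l = shape(n.left, names);
        String r = shape(n.right, names);
        return "(" + n.op + " " + l + " " + r + ")";
    }

    private static Node negated(String src) {
        return new Node("SubL", Node.leaf("zero"), Node.leaf(src));
    }

    /** Rule 1: MulL (SubL zero src1) src2 */
    public static Node rule1() {
        return new Node("MulL", negated("src1"), Node.leaf("src2"));
    }

    /** Rule 2: MulL src1 (SubL zero src2) */
    public static Node rule2() {
        return new Node("MulL", Node.leaf("src1"), negated("src2"));
    }

    /** Shapes covered by the given hand-written rules plus their swapped clones. */
    public static Set<String> shapesWithSwaps(Node... rules) {
        Set<String> shapes = new LinkedHashSet<>();
        for (Node rule : rules) {
            shapes.add(shape(rule));
            shapes.add(shape(swapped(rule)));
        }
        return shapes;
    }

    public static void main(String[] args) {
        Set<String> before = shapesWithSwaps(rule1(), rule2());
        Set<String> after = shapesWithSwaps(rule1());
        System.out.println("rules 1+2 and clones: " + before);
        System.out.println("rule 1 and its clone: " + after);
        System.out.println("same coverage: " + before.equals(after));
    }
}
```

In this model, keeping only Rule 1 and its swapped clone covers the same two tree shapes as keeping Rules 1 to 4, which is the claim made in the message.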

-------------

Commit messages:
 - 8272968: AArch64: Remove redundant matching rules for

Changes: https://git.openjdk.java.net/jdk/pull/5646/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5646&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8272968
  Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5646.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5646/head:pull/5646

PR: https://git.openjdk.java.net/jdk/pull/5646

From github.com+39403138+fg1417 at openjdk.java.net  Fri Sep 24 11:44:03 2021
From: github.com+39403138+fg1417 at openjdk.java.net (Fei Gao)
Date: Fri, 24 Sep 2021 11:44:03 GMT
Subject: RFR: 8272968: AArch64: Remove redundant matching rules for commutative ops
In-Reply-To:
References:
Message-ID:

On Thu, 23 Sep 2021 03:29:44 GMT, Fei Gao wrote:

> Match rules for commutative operations mnegI/mnegL/smnegL might
> become redundant after function matchrule_clone_and_swap(),
> and hence can be reduced.
>
> In the adlc part, while parsing the contents of an instruction
> definition, function instr_parse always checks
> for commutative operations with subtree operands, then creates
> clones and swaps operands via function matchrule_clone_and_swap.
> It means that another operand-swapped and partially
> symmetrical match rule should be generated automatically for
> these commutative operations.
>
> The pattern to construct mnegI, mnegL or smnegL consists of
> a subtraction from zero and then a multiplication. In function
> count_commutative_op, both mulI and mulL are recognized as
> commutative opcodes. Therefore, we need only one match rule
> to specify that a multiplication consists of a number and
> a subtraction from zero for these three instructions and the
> extra one can be deleted.
>
> Take mnegL as an example.
>
> Without my patch, four match rules will finally be created for
> instruction selection.
>
> Two of them are created by ad files:
>
> Match Rule 1:
> dst = MulL (SubL zero src1) src2
> ===>
> dst = mnegl src1 src2
>
> Match Rule 2:
> dst = MulL src1 (SubL zero src2)
> ===>
> dst = mnegl src1 src2
>
> The other two are automatically generated by function
> matchrule_clone_and_swap based on the two rules above:
>
> Match Rule 3 (generated by match rule 1):
> dst = MulL src2 (SubL zero src1)
> ===>
> dst = mnegl src1 src2
>
> Match Rule 4 (generated by match rule 2):
> dst = MulL (SubL zero src2) src1
> ===>
> dst = mnegl src1 src2
>
> As mnegl is commutative, Rule 3 is equivalent to
> Rule 2, and Rule 1 is equivalent to Rule 4. Also, if we only
> keep the original Match Rule 1, as shown above, Rule 3 will
> be generated automatically later. In this way, Rule 2 and Rule 4
> are redundant and hence Rule 2 can be eliminated.
>
> With my patch, Rule 2 is removed and Rule 4 won't be generated either.
> Only Rule 1 and 3 are kept in the final rule chain. In my local release
> build, as redundant code got removed, the size of libjvm.so decreased
> from 23.30 MB to 23.27 MB, with a reduction of 33.11 KB (around 0.14%).

The patch is the same as https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2021-September/049967.html. For an unclear reason, my old account was suspended and I had to create a new one and submit the patch again.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5646

From rrich at openjdk.java.net  Fri Sep 24 13:33:55 2021
From: rrich at openjdk.java.net (Richard Reingruber)
Date: Fri, 24 Sep 2021 13:33:55 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 21 Sep 2021 18:09:48 GMT, Martin Doerr wrote:

> Would be interesting to know what else benefits from it. Maybe startup performance (class loading may use many Exceptions).

That's true.
Many exceptions are thrown in classloading but these are not builtin exceptions and therefore not affected. ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From dsamersoff at openjdk.java.net Fri Sep 24 14:27:04 2021 From: dsamersoff at openjdk.java.net (Dmitry Samersoff) Date: Fri, 24 Sep 2021 14:27:04 GMT Subject: RFR: 8273380: ARM32: Default to {ldrexd, strexd} in StubRoutines::atomic_{load|store}_long [v2] In-Reply-To: <4HloD4y6X-zkMlSoxw7h7KG2_cwaYM13c44_X1n69rA=.20782f5e-1826-40cc-a8fa-9bfaac389e89@github.com> References: <4HloD4y6X-zkMlSoxw7h7KG2_cwaYM13c44_X1n69rA=.20782f5e-1826-40cc-a8fa-9bfaac389e89@github.com> Message-ID: On Thu, 23 Sep 2021 11:17:36 GMT, Aleksey Shipilev wrote: >> Current ARM32 is one of few remaining uses of `os::is_MP`, the rest is removed by JDK-8188764. There are some interesting bugs in OS/libc that might give incorrect `os::is_MP` sometimes, e.g. in containers. Instead of risking it, we can default to {ldrexd,strexd} for ARMv7 (which always have them), and leave the `os::is_MP` path for ARMv6 (for which this is the only remaining way to load the 64-bit long). >> >> @mychris, you might want to take a look and do light performance testing for it? >> >> Additional testing: >> - [x] Linux ARM32 cross-compiled build completes > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Move out bx > - Merge branch 'master' into JDK-8273380-arm32-default-to-ldrex > - First attempt Marked as reviewed by dsamersoff (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/5379 From dsamersoff at openjdk.java.net Fri Sep 24 14:27:05 2021 From: dsamersoff at openjdk.java.net (Dmitry Samersoff) Date: Fri, 24 Sep 2021 14:27:05 GMT Subject: RFR: 8273380: ARM32: Default to {ldrexd, strexd} in StubRoutines::atomic_{load|store}_long [v2] In-Reply-To: <7wnio9k4gJNLVLl7JFji9TamqAxDR_t0kpmMYSrRNwc=.f54e1a82-151d-4a89-a3d9-7529130d865c@github.com> References: <7wnio9k4gJNLVLl7JFji9TamqAxDR_t0kpmMYSrRNwc=.f54e1a82-151d-4a89-a3d9-7529130d865c@github.com> Message-ID: On Thu, 23 Sep 2021 15:54:33 GMT, Aleksey Shipilev wrote: >> src/hotspot/cpu/arm/stubGenerator_arm.cpp line 641: >> >>> 639: __ ldrexd(result_lo, Address(src)); >>> 640: __ clrex(); // FIXME: safe to remove? >>> 641: __ bx(LR); >> >> bx(LR) is common for all 3 branches, so it might be better to move it out > > @dsamersoff, are you happy with new version? Yes. Thank you! ------------- PR: https://git.openjdk.java.net/jdk/pull/5379 From shade at openjdk.java.net Fri Sep 24 15:25:07 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 24 Sep 2021 15:25:07 GMT Subject: RFR: 8273380: ARM32: Default to {ldrexd, strexd} in StubRoutines::atomic_{load|store}_long [v2] In-Reply-To: <4HloD4y6X-zkMlSoxw7h7KG2_cwaYM13c44_X1n69rA=.20782f5e-1826-40cc-a8fa-9bfaac389e89@github.com> References: <4HloD4y6X-zkMlSoxw7h7KG2_cwaYM13c44_X1n69rA=.20782f5e-1826-40cc-a8fa-9bfaac389e89@github.com> Message-ID: On Thu, 23 Sep 2021 11:17:36 GMT, Aleksey Shipilev wrote: >> Current ARM32 is one of few remaining uses of `os::is_MP`, the rest is removed by JDK-8188764. There are some interesting bugs in OS/libc that might give incorrect `os::is_MP` sometimes, e.g. in containers. Instead of risking it, we can default to {ldrexd,strexd} for ARMv7 (which always have them), and leave the `os::is_MP` path for ARMv6 (for which this is the only remaining way to load the 64-bit long). 
>> >> @mychris, you might want to take a look and do light performance testing for it? >> >> Additional testing: >> - [x] Linux ARM32 cross-compiled build completes > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Move out bx > - Merge branch 'master' into JDK-8273380-arm32-default-to-ldrex > - First attempt Thank you! ------------- PR: https://git.openjdk.java.net/jdk/pull/5379 From shade at openjdk.java.net Fri Sep 24 15:35:05 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 24 Sep 2021 15:35:05 GMT Subject: Integrated: 8273380: ARM32: Default to {ldrexd, strexd} in StubRoutines::atomic_{load|store}_long In-Reply-To: References: Message-ID: On Mon, 6 Sep 2021 11:28:58 GMT, Aleksey Shipilev wrote: > Current ARM32 is one of few remaining uses of `os::is_MP`, the rest is removed by JDK-8188764. There are some interesting bugs in OS/libc that might give incorrect `os::is_MP` sometimes, e.g. in containers. Instead of risking it, we can default to {ldrexd,strexd} for ARMv7 (which always have them), and leave the `os::is_MP` path for ARMv6 (for which this is the only remaining way to load the 64-bit long). > > @mychris, you might want to take a look and do light performance testing for it? > > Additional testing: > - [x] Linux ARM32 cross-compiled build completes This pull request has now been integrated. 
Changeset: 718eff2b
Author:    Aleksey Shipilev
URL:       https://git.openjdk.java.net/jdk/commit/718eff2bb6e938440df9f7b982ef6d2f4060a759
Stats:     20 lines in 1 file changed: 8 ins; 8 del; 4 mod

8273380: ARM32: Default to {ldrexd,strexd} in StubRoutines::atomic_{load|store}_long

Reviewed-by: dlong, dsamersoff

-------------

PR: https://git.openjdk.java.net/jdk/pull/5379

From redestad at openjdk.java.net  Fri Sep 24 16:04:08 2021
From: redestad at openjdk.java.net (Claes Redestad)
Date: Fri, 24 Sep 2021 16:04:08 GMT
Subject: RFR: 8274242: Implement fast-path for ASCII-compatible CharsetEncoders on x86
In-Reply-To:
References:
Message-ID:

On Tue, 21 Sep 2021 21:58:48 GMT, Claes Redestad wrote:

> This patch extends the `ISO_8859_1.implEncodeISOArray` intrinsic on x86 to work also for ASCII encoding, which makes for example the `UTF_8$Encoder` perform on par with (or outperform) similarly getting charset encoded bytes from a String. The former took a small performance hit in JDK 9, and the latter improved greatly in the same release.
>
> Extending the `EncodeIsoArray` intrinsics on other platforms should be possible, but I'm unfamiliar with the macro assembler in general and unlike the x86 intrinsic they don't use a simple vectorized mask to implement the latin-1 check. For example aarch64 seem to filter out the low bytes and then check if there's any bits set in the high bytes. Clever, but very different to the 0xFF80 2-byte mask that an ASCII test wants.

The current version (cef05f4) copies the ISO_8859_1.implEncodeISOArray intrinsic and adapts it to work on ASCII encoding, which makes the UTF_8$Encoder perform on par with (or outperform) encoding from a String.

Using microbenchmarks provided by @carterkozak here: https://github.com/carterkozak/stringbuilder-encoding-performance

Baseline:

Benchmark (charsetName) (message) (timesToAppend) Mode Cnt Score Error Units
EncoderBenchmarks.charsetEncoder UTF-8 This is a simple ASCII message 3 avgt 8 270.237 ± 10.504 ns/op
EncoderBenchmarks.charsetEncoder UTF-8 This is a message with unicode ?? 3 avgt 8 568.353 ± 2.331 ns/op
EncoderBenchmarks.charsetEncoderWithAllocation UTF-8 This is a simple ASCII message 3 avgt 8 324.889 ± 17.466 ns/op
EncoderBenchmarks.charsetEncoderWithAllocation UTF-8 This is a message with unicode ?? 3 avgt 8 633.720 ± 22.703 ns/op
EncoderBenchmarks.charsetEncoderWithAllocationWrappingBuilder UTF-8 This is a simple ASCII message 3 avgt 8 1132.436 ± 30.661 ns/op
EncoderBenchmarks.charsetEncoderWithAllocationWrappingBuilder UTF-8 This is a message with unicode ?? 3 avgt 8 1379.207 ± 66.982 ns/op
EncoderBenchmarks.toStringGetBytes UTF-8 This is a simple ASCII message 3 avgt 8 91.253 ± 3.848 ns/op
EncoderBenchmarks.toStringGetBytes UTF-8 This is a message with unicode ?? 3 avgt 8 519.489 ± 12.516 ns/op

Patch:

Benchmark (charsetName) (message) (timesToAppend) Mode Cnt Score Error Units
EncoderBenchmarks.charsetEncoder UTF-8 This is a simple ASCII message 3 avgt 4 82.535 ± 20.310 ns/op
EncoderBenchmarks.charsetEncoder UTF-8 This is a message with unicode ?? 3 avgt 4 522.679 ± 13.456 ns/op
EncoderBenchmarks.charsetEncoderWithAllocation UTF-8 This is a simple ASCII message 3 avgt 4 127.831 ± 32.612 ns/op
EncoderBenchmarks.charsetEncoderWithAllocation UTF-8 This is a message with unicode ?? 3 avgt 4 549.343 ± 59.899 ns/op
EncoderBenchmarks.charsetEncoderWithAllocationWrappingBuilder UTF-8 This is a simple ASCII message 3 avgt 4 1182.835 ± 153.735 ns/op
EncoderBenchmarks.charsetEncoderWithAllocationWrappingBuilder UTF-8 This is a message with unicode ?? 3 avgt 4 1416.407 ± 130.551 ns/op
EncoderBenchmarks.toStringGetBytes UTF-8 This is a simple ASCII message 3 avgt 4 97.770 ± 15.742 ns/op
EncoderBenchmarks.toStringGetBytes UTF-8 This is a message with unicode ?? 3 avgt 4 516.351 ± 58.580 ns/op

This can probably be simplified further, say by adding a flag to the intrinsic of whether we're encoding ASCII only or ISO-8859-1.
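
As a rough illustration of what such a flag would select, here is a scalar Java sketch. It is not the intrinsic or the JDK's actual code: `EncodeMaskSketch`, `encode` and the `ascii` parameter are hypothetical names, and the real x86 implementation is vectorized. It assumes only the two masks mentioned in this thread: 0xFF00 rejects chars above 0xFF (latin-1 check), 0xFF80 rejects chars above 0x7F (ASCII check).

```java
/** Scalar sketch of an encode loop where a flag picks the ASCII or latin-1 mask. */
public class EncodeMaskSketch {
    static final int LATIN1_MASK = 0xFF00; // any bit set means the char is above 0xFF
    static final int ASCII_MASK = 0xFF80;  // any bit set means the char is above 0x7F

    /**
     * Copies chars to bytes until the first char that does not fit the
     * selected charset, and returns the number of chars encoded.
     */
    public static int encode(char[] src, byte[] dst, int len, boolean ascii) {
        int mask = ascii ? ASCII_MASK : LATIN1_MASK;
        int i = 0;
        while (i < len && (src[i] & mask) == 0) {
            dst[i] = (byte) src[i];
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        char[] src = "abc\u00e9d".toCharArray();
        byte[] dst = new byte[src.length];
        // ASCII mode stops at U+00E9; latin-1 mode encodes the whole array.
        System.out.println(encode(src, dst, src.length, true));
        System.out.println(encode(src, dst, src.length, false));
    }
}
```

The only difference between the two modes in this sketch is the mask constant, which is why a single predicate on a shared routine is plausible where the check is a plain mask test.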
It also needs to be implemented and tested on all architectures.

(edit: accidentally edited rather than quote-replied; restored original comment)

On the JDK-included `CharsetEncodeDecode.encode` microbenchmark, I get these numbers in the baseline (18-b09):

Benchmark                   (size)       (type)  Mode  Cnt    Score    Error  Units
CharsetEncodeDecode.encode   16384        UTF-8  avgt   30   39.962 ±  1.703  us/op
CharsetEncodeDecode.encode   16384         BIG5  avgt   30  153.282 ±  4.521  us/op
CharsetEncodeDecode.encode   16384  ISO-8859-15  avgt   30  192.040 ±  4.543  us/op
CharsetEncodeDecode.encode   16384        ASCII  avgt   30   40.051 ±  1.210  us/op
CharsetEncodeDecode.encode   16384       UTF-16  avgt   30  302.815 ±  9.490  us/op

With the proposed patch:

Benchmark                   (size)       (type)  Mode  Cnt    Score    Error  Units
CharsetEncodeDecode.encode   16384        UTF-8  avgt   30    4.081 ±  0.182  us/op
CharsetEncodeDecode.encode   16384         BIG5  avgt   30  150.374 ±  3.579  us/op
CharsetEncodeDecode.encode   16384  ISO-8859-15  avgt   30    4.010 ±  0.179  us/op
CharsetEncodeDecode.encode   16384        ASCII  avgt   30    3.961 ±  0.176  us/op
CharsetEncodeDecode.encode   16384       UTF-16  avgt   30  302.235 ± 11.395  us/op

That is: on my system encoding 16K char ASCII data is 10x faster for UTF-8 and ASCII, and roughly 48x faster for ASCII-compatible charsets like ISO-8859-15. On 3rd party microbenchmarks we can assert that performance for non-ASCII input either doesn't change, or improves when messages have an ASCII prefix.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5621

From redestad at openjdk.java.net  Fri Sep 24 16:04:08 2021
From: redestad at openjdk.java.net (Claes Redestad)
Date: Fri, 24 Sep 2021 16:04:08 GMT
Subject: RFR: 8274242: Implement fast-path for ASCII-compatible CharsetEncoders on x86
Message-ID:

This patch extends the `ISO_8859_1.implEncodeISOArray` intrinsic on x86 to work also for ASCII encoding, which makes for example the `UTF_8$Encoder` perform on par with (or outperform) similarly getting charset encoded bytes from a String.
The former took a small performance hit in JDK 9, and the latter improved greatly in the same release. Extending the `EncodeIsoArray` intrinsics on other platforms should be possible, but I'm unfamiliar with the macro assembler in general and unlike the x86 intrinsic they don't use a simple vectorized mask to implement the latin-1 check. For example aarch64 seem to filter out the low bytes and then check if there's any bits set in the high bytes. Clever, but very different to the 0xFF80 2-byte mask that an ASCII test wants. ------------- Commit messages: - Freshen up CharsetEncodeDecode micro (still only tests ASCII), optimize ASCII-compatible SingleByte (e.g. ISO-8859-15) - Add @bug id to test - Move and adapt defunct Test6896617 test to test more Charsets without using internal APIs; remove adhoc performance tests - Exclude encodeAsciiArray intrinsic on all non-X86 platforms - Fix indentation - Remove EncodeAsciiArray node and instead extend EncodeISOArray with an is_ascii predicate - Merge MacroAssembler methods - Implement intrinsified ASCII fast-path by copying and adapting encodeISOArray intrinsic (currently x86 only) - Enhance UTF_8.Encoder by using StringUTF16.compress for ASCII Changes: https://git.openjdk.java.net/jdk/pull/5621/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5621&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8274242 Stats: 763 lines in 20 files changed: 375 ins; 355 del; 33 mod Patch: https://git.openjdk.java.net/jdk/pull/5621.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5621/head:pull/5621 PR: https://git.openjdk.java.net/jdk/pull/5621 From redestad at openjdk.java.net Fri Sep 24 16:04:08 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Fri, 24 Sep 2021 16:04:08 GMT Subject: RFR: 8274242: Implement fast-path for ASCII-compatible CharsetEncoders on x86 In-Reply-To: References: Message-ID: On Thu, 23 Sep 2021 00:03:55 GMT, Claes Redestad wrote: > This can probably be simplified further, 
say by adding a flag to the intrinsic of whether we're encoding ASCII only or ISO-8859-1. Done: Removed the addition of a new C2 Node, merged the macro assembler encode_iso_array and encode_ascii_array and added a predicate to select the behavior. > It also needs to be implemented and tested on all architectures. Implementing this on other hardware is Future Work. The non-x86 intrinsics for implEncodeISOArray all seem to use clever tricks rather than a simple mask that can be easily switched from detecting non-latin-1(0xFF00) to detecting ASCII (0xFF80). Clever tricks make it rather challenging to extend this like I could easily do in the x86 code (most all assembler is foreign to me) ------------- PR: https://git.openjdk.java.net/jdk/pull/5621 From neliasso at openjdk.java.net Fri Sep 24 17:45:53 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 24 Sep 2021 17:45:53 GMT Subject: RFR: 8271459: C2: Missing NegativeArraySizeException when creating StringBuilder with negative capacity In-Reply-To: <-_kFk0kfF5npDXL-qyMSIFfglZwDHlV-jyMgBc7GmXI=.178d9895-ce14-414a-b07e-c2060f8ab9b2@github.com> References: <-_kFk0kfF5npDXL-qyMSIFfglZwDHlV-jyMgBc7GmXI=.178d9895-ce14-414a-b07e-c2060f8ab9b2@github.com> Message-ID: <9RWR3YTXg2Nrek_NoYW5BIW927f8anF8a-FZU8BIBCk=.a61dd2d8-d7cf-4696-96e1-2236d61dc494@github.com> On Thu, 23 Sep 2021 13:25:51 GMT, Christian Hagedorn wrote: > Stringopts does not take into account that a negative `int` argument for `StringBuilder(int)` results in a `NegativeArraySizeException` when optimizing away `StringBuilder` usages into single strings. > > The suggested fix does the following: > - Bailout of Stringopts if C2 knows that an `int` argument is always negative. > - Apply stringopts but insert an additional runtime check with an UCT if C2 cannot tell if an `int` argument is positive or negative. > > I added some IR tests to verify the fix and also ran some standard benchmarks. 
> > I also updated `TestIRMatching` to test the new and updated default regexes. > > Thanks, > Christian If we hit the uncommontrap - should we recompile without applying strconcat? Or is this code so uncommon that we don't want to add complexity for that case? ------------- PR: https://git.openjdk.java.net/jdk/pull/5652 From naoto at openjdk.java.net Fri Sep 24 20:42:55 2021 From: naoto at openjdk.java.net (Naoto Sato) Date: Fri, 24 Sep 2021 20:42:55 GMT Subject: RFR: 8274242: Implement fast-path for ASCII-compatible CharsetEncoders on x86 In-Reply-To: References: Message-ID: On Tue, 21 Sep 2021 21:58:48 GMT, Claes Redestad wrote: > This patch extends the `ISO_8859_1.implEncodeISOArray` intrinsic on x86 to work also for ASCII encoding, which makes for example the `UTF_8$Encoder` perform on par with (or outperform) similarly getting charset encoded bytes from a String. The former took a small performance hit in JDK 9, and the latter improved greatly in the same release. > > Extending the `EncodeIsoArray` intrinsics on other platforms should be possible, but I'm unfamiliar with the macro assembler in general and unlike the x86 intrinsic they don't use a simple vectorized mask to implement the latin-1 check. For example aarch64 seem to filter out the low bytes and then check if there's any bits set in the high bytes. Clever, but very different to the 0xFF80 2-byte mask that an ASCII test wants. core library part of the changes looks good to me. ------------- Marked as reviewed by naoto (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5621 From sviswanathan at openjdk.java.net Fri Sep 24 23:19:17 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Fri, 24 Sep 2021 23:19:17 GMT Subject: RFR: 8274242: Implement fast-path for ASCII-compatible CharsetEncoders on x86 In-Reply-To: References: Message-ID: On Tue, 21 Sep 2021 21:58:48 GMT, Claes Redestad wrote: > This patch extends the `ISO_8859_1.implEncodeISOArray` intrinsic on x86 to work also for ASCII encoding, which makes for example the `UTF_8$Encoder` perform on par with (or outperform) similarly getting charset encoded bytes from a String. The former took a small performance hit in JDK 9, and the latter improved greatly in the same release. > > Extending the `EncodeIsoArray` intrinsics on other platforms should be possible, but I'm unfamiliar with the macro assembler in general and unlike the x86 intrinsic they don't use a simple vectorized mask to implement the latin-1 check. For example aarch64 seem to filter out the low bytes and then check if there's any bits set in the high bytes. Clever, but very different to the 0xFF80 2-byte mask that an ASCII test wants. x86 part of changes look good. ------------- PR: https://git.openjdk.java.net/jdk/pull/5621 From dlong at openjdk.java.net Sat Sep 25 00:08:53 2021 From: dlong at openjdk.java.net (Dean Long) Date: Sat, 25 Sep 2021 00:08:53 GMT Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow [v3] In-Reply-To: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> References: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> Message-ID: On Tue, 21 Sep 2021 10:09:11 GMT, Volker Simonis wrote: >> If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (i.e. AIOOBE, NullPointerExceptions,..) 
and replace them by a static, pre-allocated exception without any stacktrace. >> >> However, we can actually do better. Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. nmethod) and fill them with at least one stack frame with the method /line-number information of the currently compiled method. If the method in question is being inlined (which often happens), we can add stackframes for all callers up to the inlining depth of the method in question. >> >> For the attached JTreg test, we get the following exception in interpreter mode: >> >> java.lang.NullPointerException: Cannot read the array length because "" is null >> at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) >> at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) >> at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) >> at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233) >> >> Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows: >> >> java.lang.NullPointerException >> >> After this change, if `StackFrameInFastThrow.throwImplicitException()` will be compiled stand alone, we will get: >> >> java.lang.NullPointerException >> at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) >> >> and if `StackFrameInFastThrow.throwImplicitException()` will be inlined into `level2()` and `level2()` into `level1()` we will get the following exception (altough we're still running with `-XX:+OmitStackTraceInFastThrow`): >> >> java.lang.NullPointerException >> at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) >> at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) >> at 
compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) >> >> The new functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). Notice that the optimization comes at no run-time costs because all the extra work will be done at compile time. >> >> ## Implementation details >> >> - Already the current implementation of `-XX:+OmitStackTraceInFastThrow` potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`. With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`). >> - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them. >> - In order to avoid a memory leak we have to release these global JNI handles once a nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles. >> - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()` and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but missed to add it to the corresponding stats and printing routines so I've added that section as well. >> - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds. >> - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". 
This makes the test simpler and at the same time provokes the allocation of more implicit exceptions.
>> - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed.
>> - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues.
>
> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
>
>   Create implicit exceptions with an array of StackTraceElements right away instead of creating a backtrace. This prevents implicit exceptions from keeping classes alive due to Java mirrors in the backtrace.

If this was the interpreter, it seems like these exception objects could go into the class constant pool as condy objects. But because the ideal backtrace includes inlining information, it is a use case for something I'll call nmethod constant pools. GC would probably need to scan it as strong roots (I'm not a GC expert), but it would get rid of the need for the JNI global refs. And the exception objects could be allocated on a slow path when first needed rather than eagerly at compile time. I have other uses in mind for nmethod constant pools, so I'm showing my bias here, but I also think a condy approach extends the system in a more elegant way.
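For readers who want to observe the behavior this thread is about, here is a minimal Java sketch, not taken from the PR. It assumes a HotSpot JVM with C2 and the default `-XX:+OmitStackTraceInFastThrow`; the class and method names are hypothetical. It repeatedly triggers an implicit `NullPointerException` and counts how many of the caught exceptions carry no stack frames. Whether and when that count becomes non-zero depends on the JVM and on JIT timing, so no particular output is claimed.

```java
// Hypothetical illustration only; names are not from the PR under review.
class FastThrowSketch {
    // Dereferences the array; passing null raises an implicit NullPointerException.
    static int lengthOf(int[] a) {
        return a.length;
    }

    // Returns how many of the caught exceptions had an empty stack trace.
    static int countStackless(int iterations) {
        int stackless = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                lengthOf(null);
            } catch (NullPointerException e) {
                if (e.getStackTrace().length == 0) {
                    stackless++;
                }
            }
        }
        return stackless;
    }

    public static void main(String[] args) {
        int iterations = 200_000;
        int stackless = countStackless(iterations);
        System.out.println("stack-less NPEs: " + stackless + " of " + iterations);
    }
}
```

Running the same program with `-XX:-OmitStackTraceInFastThrow` should keep the count at zero, since every exception is then allocated with a full stack trace.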
-------------

PR: https://git.openjdk.java.net/jdk/pull/5392

From wuyan at openjdk.java.net Mon Sep 27 01:34:04 2021
From: wuyan at openjdk.java.net (Wu Yan)
Date: Mon, 27 Sep 2021 01:34:04 GMT
Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v5]
In-Reply-To:
References: <6SkOgskSfXuMp1XarC2BO9zBUw_Zj1pcUMKNHffiCQs=.66c156b1-ef01-48a4-8f28-8351089a5646@github.com>
Message-ID:

On Sat, 18 Sep 2021 03:47:39 GMT, Eric Liu wrote:

>> src/hotspot/cpu/aarch64/aarch64.ad line 2445:
>>
>>> 2443: case Op_VectorCastB2X:
>>> 2444: case Op_VectorCastS2X:
>>> 2445: if (vlen < 4) {
>>
>> The vector_size_supported() check should already cover this, so is there any need to check it here?
>
> I suggested testing this PR with the latest code, as some vector sizes which should not be supported for `VectorReinterpret` and `VectorCast*2X` have been fixed after https://github.com/openjdk/jdk/pull/5160. E.g. 128short => 64int.
>
> For the case above, as the species are different, it
> 1. reinterprets 8 short to 2 short
> 2. casts 2 short to 2 int
>
> Since we don't support short type with element size less than 4, this situation should be detected as unsupported when trying to generate a `VectorReinterpret` node with 2 short, which in the current branch is mistaken for 4 short.
>
> I think that workaround code for jdk17 (https://github.com/openjdk/jdk17/commit/52788702c11683fbf0e79dca07e75cf1fd8fc334) could be removed entirely in this work.

You are right, I'll fix it.
-------------

PR: https://git.openjdk.java.net/jdk/pull/4839

From wuyan at openjdk.java.net Mon Sep 27 01:34:05 2021
From: wuyan at openjdk.java.net (Wu Yan)
Date: Mon, 27 Sep 2021 01:34:05 GMT
Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v5]
In-Reply-To:
References: <6SkOgskSfXuMp1XarC2BO9zBUw_Zj1pcUMKNHffiCQs=.66c156b1-ef01-48a4-8f28-8351089a5646@github.com>
Message-ID:

On Fri, 17 Sep 2021 09:47:52 GMT, Ningsheng Jian wrote:

>> Wang Huang has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   fix bugs
>
> src/hotspot/cpu/aarch64/aarch64_neon_ad.m4 line 280:
>
>> 278: match(Set dst (VectorCast$2`'2X src));
>> 279: format %{ "fcvtzs $dst, T$6, $src\n\t"
>> 280: "xtn $dst, T$7, $dst, T$6\n\t# convert $1$2 to $1$3 vector"
>
> "\n\t" --> "\t" at the last line of the block.

OK, fix it.

> src/hotspot/cpu/aarch64/aarch64_neon_ad.m4 line 298:
>
>> 296: format %{ "fcvtzs $dst, T4S, $src\n\t"
>> 297: "xtn $dst, T4H, $dst, T4S\n\t"
>> 298: "xtn $dst, T8B, $dst, T8H\n\t# convert 4F to 4B vector"
>
> xtn $dst, T8B, $dst, T8H\n\t# convert 4F to 4B vector"
>
> => xtn $dst, T8B, $dst, T8H\t# convert 4F to 4B vector"

OK.
-------------

PR: https://git.openjdk.java.net/jdk/pull/4839

From wuyan at openjdk.java.net Mon Sep 27 01:39:16 2021
From: wuyan at openjdk.java.net (Wu Yan)
Date: Mon, 27 Sep 2021 01:39:16 GMT
Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v5]
In-Reply-To:
References: <6SkOgskSfXuMp1XarC2BO9zBUw_Zj1pcUMKNHffiCQs=.66c156b1-ef01-48a4-8f28-8351089a5646@github.com>
Message-ID: <0FhFvcvl_7J-fyhE7Vo8NrfcvTA1IDMjTyysvjbn2e4=.ca86124d-b05b-4ddc-9f16-2d8b8c918151@github.com>

On Wed, 22 Sep 2021 08:02:34 GMT, Andrew Haley wrote:

>> Wang Huang has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   fix bugs
>
> src/hotspot/cpu/aarch64/aarch64_neon_ad.m4 line 233:
>
>> 231: match(Set dst (VectorCastB2X src));
>> 232: format %{ "sxtl $dst, T8H, $src, T8B\n\t"
>> 233: "sxtl $dst, T4S, $dst, T4H\n\t"
>
> This whitespace change looks odd.

This whitespace is to align with the next instruction.

-------------

PR: https://git.openjdk.java.net/jdk/pull/4839

From thartmann at openjdk.java.net Mon Sep 27 06:15:58 2021
From: thartmann at openjdk.java.net (Tobias Hartmann)
Date: Mon, 27 Sep 2021 06:15:58 GMT
Subject: RFR: 8273410: IR verification framework fails with "Should find method name in validIrRulesMap"
In-Reply-To:
References:
Message-ID:

On Fri, 24 Sep 2021 10:36:34 GMT, Christian Hagedorn wrote:

> The IR framework treated a `@Check` method as `@Test` method instead of the `@Test` method itself at IR matching time resulting in an internal framework exception.
>
> While writing some tests for checked test I've noticed that a missing `@Arguments` annotation is not reported as `TestFormatException` but with a `RuntimeException` later when invoking the method in question. I also added the missing check for it.
>
> Thanks,
> Christian

Looks good to me.
test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestCheckedTests.java line 33: > 31: /* > 32: * @test > 33: * @requires vm.debug == true & vm.compMode != "Xint" & vm.compiler2.enabled & vm.flagless You might want to add an `@bug` tag. test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestCheckedTests.java line 201: > 199: } > 200: > 201: Double newline. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5678 From thartmann at openjdk.java.net Mon Sep 27 06:22:00 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 27 Sep 2021 06:22:00 GMT Subject: RFR: 8274130: C2: MulNode::Ideal chained transformations may act on wrong nodes In-Reply-To: References: Message-ID: On Wed, 22 Sep 2021 10:12:16 GMT, Aleksey Shipilev wrote: > I was puzzled by it when fixing JDK-8274060. It looks that new optimizations added by [JDK-8273454](https://bugs.openjdk.java.net/browse/JDK-8273454) and [JDK-8263006](https://bugs.openjdk.java.net/browse/JDK-8263006) rewire `in(1)` and `in(2)` in `MulNode::Ideal`, which means the chained transformations should see them? Yet, both inputs and their `Type`-s are cached locally and not refreshed. I have not seen failures due to this yet, but it looks that the current code is subtly incorrect because of this. > > I thought about doing `return this` instead of `progress = true`, so that we leave `MulNode::Ideal` once we hit any transform and hope to return back, but I wondered if that would expose us to different graph shapes in-between successive `MulNode::Ideal` calls, which might have other unintended consequences. Therefore, I opted to a more conservative patch. > > Additional testing: > - [x] `compiler/` tests > - [x] `tier1` tests > - [x] 100K Fuzzer tests (one unrelated failure) That looks good to me (I would prefer the pointer asterisk next to the type though). ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5631 From whuang at openjdk.java.net Mon Sep 27 06:25:00 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Mon, 27 Sep 2021 06:25:00 GMT Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v6] In-Reply-To: References: Message-ID: > * In this issue, we plan to complete all missing implementation for aarch64 neon backend. For example, cast from Byte to Long, cast from Long to Byte, and so on. > * It may be a solver of JDK-8269866, or part of it. Wang Huang has updated the pull request incrementally with one additional commit since the last revision: code refine ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4839/files - new: https://git.openjdk.java.net/jdk/pull/4839/files/4f0613cb..2fca8cd3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4839&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4839&range=04-05 Stats: 23 lines in 6 files changed: 1 ins; 4 del; 18 mod Patch: https://git.openjdk.java.net/jdk/pull/4839.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4839/head:pull/4839 PR: https://git.openjdk.java.net/jdk/pull/4839 From thartmann at openjdk.java.net Mon Sep 27 06:40:06 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 27 Sep 2021 06:40:06 GMT Subject: RFR: 8274242: Implement fast-path for ASCII-compatible CharsetEncoders on x86 In-Reply-To: References: Message-ID: <5lVTU7su0xmkfNlex-Pv_yoeuJpnoCyalI42oLSjGzg=.54d143bc-ede4-4ec8-a257-1538f43ea632@github.com> On Tue, 21 Sep 2021 21:58:48 GMT, Claes Redestad wrote: > This patch extends the `ISO_8859_1.implEncodeISOArray` intrinsic on x86 to work also for ASCII encoding, which makes for example the `UTF_8$Encoder` perform on par with (or outperform) similarly getting charset encoded bytes from a String. The former took a small performance hit in JDK 9, and the latter improved greatly in the same release. 
> > Extending the `EncodeIsoArray` intrinsics on other platforms should be possible, but I'm unfamiliar with the macro assembler in general and unlike the x86 intrinsic they don't use a simple vectorized mask to implement the latin-1 check. For example aarch64 seem to filter out the low bytes and then check if there's any bits set in the high bytes. Clever, but very different to the 0xFF80 2-byte mask that an ASCII test wants. Very nice. The changes look good to me, just added some minor comments. Should we remove the "iso" part from the method/class names? src/hotspot/cpu/x86/x86_32.ad line 12218: > 12216: instruct encode_ascii_array(eSIRegP src, eDIRegP dst, eDXRegI len, > 12217: regD tmp1, regD tmp2, regD tmp3, regD tmp4, > 12218: eCXRegI tmp5, eAXRegI result, eFlagsReg cr) %{ Indentation is wrong. src/hotspot/cpu/x86/x86_32.ad line 12223: > 12221: effect(TEMP tmp1, TEMP tmp2, TEMP tmp3, TEMP tmp4, USE_KILL src, USE_KILL dst, USE_KILL len, KILL tmp5, KILL cr); > 12222: > 12223: format %{ "Encode array $src,$dst,$len -> $result // KILL ECX, EDX, $tmp1, $tmp2, $tmp3, $tmp4, ESI, EDI " %} You might want to change the opto assembly comment to "Encode ascii array" (and to "Encode iso array" above). Same on 64-bit. src/hotspot/share/opto/intrinsicnode.hpp line 171: > 169: > 170: //------------------------------EncodeISOArray-------------------------------- > 171: // encode char[] to byte[] in ISO_8859_1 Comment should be adjusted to `... in ISO_8859_1 or ASCII`. ------------- Marked as reviewed by thartmann (Reviewer). 
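The difference between the two checks discussed in this thread can be sketched in plain Java. This is a scalar illustration under stated assumptions, not the intrinsic itself: the class name, method name and the `ascii` parameter are hypothetical, and only the two mask values (0xFF00 for the latin-1 check, 0xFF80 for the ASCII check) come from the discussion. The loop copies chars to bytes until it meets a char with any bit set under the selected mask and returns the number of chars encoded.

```java
// Hypothetical scalar stand-in for a mask-based encode loop; not HotSpot code.
class EncodeMaskSketch {
    static final int NON_LATIN1_MASK = 0xFF00; // any high byte bit set: not ISO-8859-1
    static final int NON_ASCII_MASK  = 0xFF80; // any bit above the low 7 set: not ASCII

    // Encodes up to len chars from src into dst and returns how many were encoded.
    static int encode(char[] src, byte[] dst, int len, boolean ascii) {
        int mask = ascii ? NON_ASCII_MASK : NON_LATIN1_MASK;
        int i = 0;
        for (; i < len; i++) {
            char c = src[i];
            if ((c & mask) != 0) {
                break; // first char that does not fit the selected charset
            }
            dst[i] = (byte) c;
        }
        return i;
    }

    public static void main(String[] args) {
        // 'a', 'b', 'c' are ASCII; U+00E9 is latin-1 only; U+20AC is neither.
        char[] src = "abc\u00e9\u20ac".toCharArray();
        byte[] dst = new byte[src.length];
        System.out.println("ASCII prefix: " + encode(src, dst, src.length, true));    // prints "ASCII prefix: 3"
        System.out.println("Latin-1 prefix: " + encode(src, dst, src.length, false)); // prints "Latin-1 prefix: 4"
    }
}
```

Selecting the mask with a single flag mirrors the idea of one shared routine with a predicate that chooses between the ISO-8859-1 and ASCII behavior.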
PR: https://git.openjdk.java.net/jdk/pull/5621 From thartmann at openjdk.java.net Mon Sep 27 07:21:00 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 27 Sep 2021 07:21:00 GMT Subject: RFR: 8274074: SIGFPE with C2 compiled code with -XX:+StressGCM [v2] In-Reply-To: References: Message-ID: On Fri, 24 Sep 2021 09:46:21 GMT, Christian Hagedorn wrote: >> In the testcase, the divisor input node of a `DivI` node is sunk out of a loop to a div by zero UCT and is pinned with a `CastII` node to ensure it's not floating back into the loop. The divisor is optimized by taking into account that it's only executed on the uncommon path. The `CastII`, however, is removed later and the division floats back into the loop which results in a SIGFPE crash. >> >> The relevent lines in the testcase are the following two divisions: >> >> static int iFld = 1; >> static int q = 0; >> ... >> y = iFld - q; // divisor >> y = (iArrFld[2] / y); // division 1 >> y = (5 / iFld); // division 2 >> >> After sinking the `divisor` of `division 1` in the testcase to the div by zero UCT of `division 2`, the graph looks like this: >> >> ![Screenshot from 2021-09-21 14-40-37](https://user-images.githubusercontent.com/17833009/134506632-9904da7b-8210-4301-85dc-04441324fe55.png) >> >> - `201 If` is the zero check of `division 2` (will always succeed because `iFld = 1`, i.e. UCT is never taken). >> - `193 DivI` (`division 1`) is not sunk because its `get_ctrl()` is `203 IfFalse` (outside the loop already because there is no use inside the loop since the local `y` is directly overwritten again). >> - `275 SubI` (`divisor`) was sunk out of the loop and is pinned by `276 CastII` (unconditional dependency). >> >> In IGVN, `CastII::Value()` is called for `276 CastII`. It sees an `If/Cmp` (zero check of `division 2`) with the same `137 LoadI` input as for the `276 CastII`. 
Therefore, we set its type to [0,0] here: >> https://github.com/openjdk/jdk/blob/d0987513665def1b6b2981ab5932b6f1b8b310d8/src/hotspot/share/opto/castnode.cpp#L248-L252 >> >> As a result, we replace `276 CastII` with a constant zero in IGVN. But now we lost the pin to the uncommon path of the zero check of `division 2` for `275 SubI` and `193 DivI`. `193 DivI` is only used on the uncommon path but can now float around again, also inside the loop itself, which happens in the testcase. Inside the loop, we execute the division with the now optimized divisor `0 - q = 0` which is a division by zero and we crash. >> >> In summary, it's not a problem that a `Div` node floats above its zero check here but rather that we optimize an input node used as divisor by assuming that we only execute the division on the uncommon path when the zero check of `division 2` failed (which never happens). This divisor optimization would be wrong when the division is executed inside the loop. But due to losing the pin, we end up doing exactly that which results in a SIGFPE crash. >> >> The suggested fix is to extend the sinking algorithm to rewire data nodes with a control input inside a loop whose `get_ctrl()` is actually completely outside loops on uncommon paths. The control input is set to `get_ctrl()` to force the nodes out of loops. In the example above, the control input of `193 DivI` is set to `203 IfFalse`, ensuring that it is still pinned to the uncommon path after `276 CastII` is removed. This fix is also beneficial if we do not sink any nodes at all later. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Extend fix to any loops Looks good to me. 
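As a reference point for the quoted analysis, the three statements from the description can be placed in a small standalone Java program. This is only a sketch: the class name, the loop shape, the array size and the value stored in `iArrFld[2]` are assumptions, and it is not the regression test from the PR. It shows the intended semantics: the divisor `iFld - q` is 1, so neither division can throw and the method returns `5 / iFld`.

```java
// Hypothetical sketch built around the lines quoted in the PR description.
class SinkingDivisorSketch {
    static int iFld = 1;
    static int q = 0;
    static int[] iArrFld = new int[10];

    static int test() {
        int y = 0;
        for (int i = 0; i < 100; i++) {
            y = iFld - q;          // divisor, evaluates to 1
            y = (iArrFld[2] / y);  // division 1, result is overwritten immediately
            y = (5 / iFld);        // division 2, evaluates to 5
        }
        return y;
    }

    public static void main(String[] args) {
        iArrFld[2] = 42;
        int result = 0;
        // Call the method often enough that a JIT compiler may decide to compile it.
        for (int i = 0; i < 20_000; i++) {
            result = test();
        }
        System.out.println(result); // prints 5
    }
}
```

A correct compilation must preserve this result; the crash described above arises only when the optimized divisor is evaluated inside the loop.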
test/hotspot/jtreg/compiler/loopopts/TestSinkingDivisorLostPin.java line 67: > 65: // DivI node D is only used on IfFalse path of zero check Z2 into UCT (on IfTrue path, the result is not used anywhere > 66: // because we directly overwrite it again with "y = (5 / iFld)). The IfFalse path of the zero check, however, is never > 67: // taken because iFld = 1. But before applying the sinking algorithm, the DivI node D could be be executed during the "could be be" -> "could be" test/hotspot/jtreg/compiler/loopopts/TestSinkingDivisorLostPin.java line 72: > 70: // propagated into the CastII node whose type is improved to [0,0] and the node is replaced by constant zero), the > 71: // DivI node must NOT be executed inside the loop anymore. But the DivI node is executed in the loop because of losing > 72: // the CastII pin. The fix is to updated the control input of the DivI node to the get_ctrl() input outside the loop "to updated" -> "to update" ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5651 From chagedorn at openjdk.java.net Mon Sep 27 07:37:34 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 27 Sep 2021 07:37:34 GMT Subject: RFR: 8271459: C2: Missing NegativeArraySizeException when creating StringBuilder with negative capacity [v2] In-Reply-To: <-_kFk0kfF5npDXL-qyMSIFfglZwDHlV-jyMgBc7GmXI=.178d9895-ce14-414a-b07e-c2060f8ab9b2@github.com> References: <-_kFk0kfF5npDXL-qyMSIFfglZwDHlV-jyMgBc7GmXI=.178d9895-ce14-414a-b07e-c2060f8ab9b2@github.com> Message-ID: > Stringopts does not take into account that a negative `int` argument for `StringBuilder(int)` results in a `NegativeArraySizeException` when optimizing away `StringBuilder` usages into single strings. > > The suggested fix does the following: > - Bailout of Stringopts if C2 knows that an `int` argument is always negative. 
> - Apply stringopts but insert an additional runtime check with an UCT if C2 cannot tell if an `int` argument is positive or negative. > > I added some IR tests to verify the fix and also ran some standard benchmarks. > > I also updated `TestIRMatching` to test the new and updated default regexes. > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Update UCT to Action_maybe_recompile ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5652/files - new: https://git.openjdk.java.net/jdk/pull/5652/files/ab8b6f8a..4e4f5222 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5652&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5652&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5652.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5652/head:pull/5652 PR: https://git.openjdk.java.net/jdk/pull/5652 From chagedorn at openjdk.java.net Mon Sep 27 07:37:36 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 27 Sep 2021 07:37:36 GMT Subject: RFR: 8271459: C2: Missing NegativeArraySizeException when creating StringBuilder with negative capacity In-Reply-To: <-_kFk0kfF5npDXL-qyMSIFfglZwDHlV-jyMgBc7GmXI=.178d9895-ce14-414a-b07e-c2060f8ab9b2@github.com> References: <-_kFk0kfF5npDXL-qyMSIFfglZwDHlV-jyMgBc7GmXI=.178d9895-ce14-414a-b07e-c2060f8ab9b2@github.com> Message-ID: On Thu, 23 Sep 2021 13:25:51 GMT, Christian Hagedorn wrote: > Stringopts does not take into account that a negative `int` argument for `StringBuilder(int)` results in a `NegativeArraySizeException` when optimizing away `StringBuilder` usages into single strings. > > The suggested fix does the following: > - Bailout of Stringopts if C2 knows that an `int` argument is always negative. 
> - Apply stringopts but insert an additional runtime check with an UCT if C2 cannot tell if an `int` argument is positive or negative. > > I added some IR tests to verify the fix and also ran some standard benchmarks. > > I also updated `TestIRMatching` to test the new and updated default regexes. > > Thanks, > Christian I guess we could do that by changing the UCT from `Action_none` to `Action_maybe_recompile`. I pushed an update. ------------- PR: https://git.openjdk.java.net/jdk/pull/5652 From chagedorn at openjdk.java.net Mon Sep 27 07:41:59 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 27 Sep 2021 07:41:59 GMT Subject: RFR: 8273410: IR verification framework fails with "Should find method name in validIrRulesMap" [v2] In-Reply-To: References: Message-ID: > The IR framework treated a `@Check` method as `@Test` method instead of the `@Test` method itself at IR matching time resulting in an internal framework exception. > > While writing some tests for checked test I've noticed that a missing `@Arguments` annotation is not reported as `TestFormatException` but with a `RuntimeException` later when invoking the method in question. I also added the missing check for it. 
> > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Add @bug ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5678/files - new: https://git.openjdk.java.net/jdk/pull/5678/files/e2a5d83e..ae88dcfa Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5678&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5678&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5678.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5678/head:pull/5678 PR: https://git.openjdk.java.net/jdk/pull/5678 From chagedorn at openjdk.java.net Mon Sep 27 07:42:00 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 27 Sep 2021 07:42:00 GMT Subject: RFR: 8273410: IR verification framework fails with "Should find method name in validIrRulesMap" In-Reply-To: References: Message-ID: On Fri, 24 Sep 2021 10:36:34 GMT, Christian Hagedorn wrote: > The IR framework treated a `@Check` method as `@Test` method instead of the `@Test` method itself at IR matching time resulting in an internal framework exception. > > While writing some tests for checked test I've noticed that a missing `@Arguments` annotation is not reported as `TestFormatException` but with a `RuntimeException` later when invoking the method in question. I also added the missing check for it. > > Thanks, > Christian Thanks Tobias for your review! 
------------- PR: https://git.openjdk.java.net/jdk/pull/5678 From chagedorn at openjdk.java.net Mon Sep 27 07:44:13 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 27 Sep 2021 07:44:13 GMT Subject: RFR: 8274074: SIGFPE with C2 compiled code with -XX:+StressGCM [v3] In-Reply-To: References: Message-ID: > In the testcase, the divisor input node of a `DivI` node is sunk out of a loop to a div by zero UCT and is pinned with a `CastII` node to ensure it's not floating back into the loop. The divisor is optimized by taking into account that it's only executed on the uncommon path. The `CastII`, however, is removed later and the division floats back into the loop which results in a SIGFPE crash. > > The relevent lines in the testcase are the following two divisions: > > static int iFld = 1; > static int q = 0; > ... > y = iFld - q; // divisor > y = (iArrFld[2] / y); // division 1 > y = (5 / iFld); // division 2 > > After sinking the `divisor` of `division 1` in the testcase to the div by zero UCT of `division 2`, the graph looks like this: > > ![Screenshot from 2021-09-21 14-40-37](https://user-images.githubusercontent.com/17833009/134506632-9904da7b-8210-4301-85dc-04441324fe55.png) > > - `201 If` is the zero check of `division 2` (will always succeed because `iFld = 1`, i.e. UCT is never taken). > - `193 DivI` (`division 1`) is not sunk because its `get_ctrl()` is `203 IfFalse` (outside the loop already because there is no use inside the loop since the local `y` is directly overwritten again). > - `275 SubI` (`divisor`) was sunk out of the loop and is pinned by `276 CastII` (unconditional dependency). > > In IGVN, `CastII::Value()` is called for `276 CastII`. It sees an `If/Cmp` (zero check of `division 2`) with the same `137 LoadI` input as for the `276 CastII`. 
Therefore, we set its type to [0,0] here: > https://github.com/openjdk/jdk/blob/d0987513665def1b6b2981ab5932b6f1b8b310d8/src/hotspot/share/opto/castnode.cpp#L248-L252 > > As a result, we replace `276 CastII` with a constant zero in IGVN. But now we lost the pin to the uncommon path of the zero check of `division 2` for `275 SubI` and `193 DivI`. `193 DivI` is only used on the uncommon path but can now float around again, also inside the loop itself, which happens in the testcase. Inside the loop, we execute the division with the now optimized divisor `0 - q = 0` which is a division by zero and we crash. > > In summary, it's not a problem that a `Div` node floats above its zero check here but rather that we optimize an input node used as divisor by assuming that we only execute the division on the uncommon path when the zero check of `division 2` failed (which never happens). This divisor optimization would be wrong when the division is executed inside the loop. But due to losing the pin, we end up doing exactly that which results in a SIGFPE crash. > > The suggested fix is to extend the sinking algorithm to rewire data nodes with a control input inside a loop whose `get_ctrl()` is actually completely outside loops on uncommon paths. The control input is set to `get_ctrl()` to force the nodes out of loops. In the example above, the control input of `193 DivI` is set to `203 IfFalse`, ensuring that it is still pinned to the uncommon path after `276 CastII` is removed. This fix is also beneficial if we do not sink any nodes at all later. 
> > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Fix typo ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5651/files - new: https://git.openjdk.java.net/jdk/pull/5651/files/941420c1..0238b730 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5651&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5651&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/5651.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5651/head:pull/5651 PR: https://git.openjdk.java.net/jdk/pull/5651 From chagedorn at openjdk.java.net Mon Sep 27 07:44:14 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 27 Sep 2021 07:44:14 GMT Subject: RFR: 8274074: SIGFPE with C2 compiled code with -XX:+StressGCM [v2] In-Reply-To: References: Message-ID: <2qe9P78QZUPCbWxZVIz6-by63GcA77vxdYRPJy1eKJc=.dd63accd-02bf-405e-9211-e287575b1b74@github.com> On Fri, 24 Sep 2021 09:46:21 GMT, Christian Hagedorn wrote: >> In the testcase, the divisor input node of a `DivI` node is sunk out of a loop to a div by zero UCT and is pinned with a `CastII` node to ensure it's not floating back into the loop. The divisor is optimized by taking into account that it's only executed on the uncommon path. The `CastII`, however, is removed later and the division floats back into the loop which results in a SIGFPE crash. >> >> The relevent lines in the testcase are the following two divisions: >> >> static int iFld = 1; >> static int q = 0; >> ... 
>> y = iFld - q; // divisor >> y = (iArrFld[2] / y); // division 1 >> y = (5 / iFld); // division 2 >> >> After sinking the `divisor` of `division 1` in the testcase to the div by zero UCT of `division 2`, the graph looks like this: >> >> ![Screenshot from 2021-09-21 14-40-37](https://user-images.githubusercontent.com/17833009/134506632-9904da7b-8210-4301-85dc-04441324fe55.png) >> >> - `201 If` is the zero check of `division 2` (will always succeed because `iFld = 1`, i.e. UCT is never taken). >> - `193 DivI` (`division 1`) is not sunk because its `get_ctrl()` is `203 IfFalse` (outside the loop already because there is no use inside the loop since the local `y` is directly overwritten again). >> - `275 SubI` (`divisor`) was sunk out of the loop and is pinned by `276 CastII` (unconditional dependency). >> >> In IGVN, `CastII::Value()` is called for `276 CastII`. It sees an `If/Cmp` (zero check of `division 2`) with the same `137 LoadI` input as for the `276 CastII`. Therefore, we set its type to [0,0] here: >> https://github.com/openjdk/jdk/blob/d0987513665def1b6b2981ab5932b6f1b8b310d8/src/hotspot/share/opto/castnode.cpp#L248-L252 >> >> As a result, we replace `276 CastII` with a constant zero in IGVN. But now we lost the pin to the uncommon path of the zero check of `division 2` for `275 SubI` and `193 DivI`. `193 DivI` is only used on the uncommon path but can now float around again, also inside the loop itself, which happens in the testcase. Inside the loop, we execute the division with the now optimized divisor `0 - q = 0` which is a division by zero and we crash. >> >> In summary, it's not a problem that a `Div` node floats above its zero check here but rather that we optimize an input node used as divisor by assuming that we only execute the division on the uncommon path when the zero check of `division 2` failed (which never happens). This divisor optimization would be wrong when the division is executed inside the loop. 
But due to losing the pin, we end up doing exactly that which results in a SIGFPE crash. >> >> The suggested fix is to extend the sinking algorithm to rewire data nodes with a control input inside a loop whose `get_ctrl()` is actually completely outside loops on uncommon paths. The control input is set to `get_ctrl()` to force the nodes out of loops. In the example above, the control input of `193 DivI` is set to `203 IfFalse`, ensuring that it is still pinned to the uncommon path after `276 CastII` is removed. This fix is also beneficial if we do not sink any nodes at all later. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Extend fix to any loops Thanks Tobias for your review! ------------- PR: https://git.openjdk.java.net/jdk/pull/5651 From roland at openjdk.java.net Mon Sep 27 08:40:57 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 27 Sep 2021 08:40:57 GMT Subject: RFR: 8274074: SIGFPE with C2 compiled code with -XX:+StressGCM [v3] In-Reply-To: References: Message-ID: On Mon, 27 Sep 2021 07:44:13 GMT, Christian Hagedorn wrote: >> In the testcase, the divisor input node of a `DivI` node is sunk out of a loop to a div by zero UCT and is pinned with a `CastII` node to ensure it's not floating back into the loop. The divisor is optimized by taking into account that it's only executed on the uncommon path. The `CastII`, however, is removed later and the division floats back into the loop which results in a SIGFPE crash. >> >> The relevent lines in the testcase are the following two divisions: >> >> static int iFld = 1; >> static int q = 0; >> ... 
>> y = iFld - q; // divisor >> y = (iArrFld[2] / y); // division 1 >> y = (5 / iFld); // division 2 >> >> After sinking the `divisor` of `division 1` in the testcase to the div by zero UCT of `division 2`, the graph looks like this: >> >> ![Screenshot from 2021-09-21 14-40-37](https://user-images.githubusercontent.com/17833009/134506632-9904da7b-8210-4301-85dc-04441324fe55.png) >> >> - `201 If` is the zero check of `division 2` (will always succeed because `iFld = 1`, i.e. UCT is never taken). >> - `193 DivI` (`division 1`) is not sunk because its `get_ctrl()` is `203 IfFalse` (outside the loop already because there is no use inside the loop since the local `y` is directly overwritten again). >> - `275 SubI` (`divisor`) was sunk out of the loop and is pinned by `276 CastII` (unconditional dependency). >> >> In IGVN, `CastII::Value()` is called for `276 CastII`. It sees an `If/Cmp` (zero check of `division 2`) with the same `137 LoadI` input as for the `276 CastII`. Therefore, we set its type to [0,0] here: >> https://github.com/openjdk/jdk/blob/d0987513665def1b6b2981ab5932b6f1b8b310d8/src/hotspot/share/opto/castnode.cpp#L248-L252 >> >> As a result, we replace `276 CastII` with a constant zero in IGVN. But now we lost the pin to the uncommon path of the zero check of `division 2` for `275 SubI` and `193 DivI`. `193 DivI` is only used on the uncommon path but can now float around again, also inside the loop itself, which happens in the testcase. Inside the loop, we execute the division with the now optimized divisor `0 - q = 0` which is a division by zero and we crash. >> >> In summary, it's not a problem that a `Div` node floats above its zero check here but rather that we optimize an input node used as divisor by assuming that we only execute the division on the uncommon path when the zero check of `division 2` failed (which never happens). This divisor optimization would be wrong when the division is executed inside the loop. 
But due to losing the pin, we end up doing exactly that which results in a SIGFPE crash. >> >> The suggested fix is to extend the sinking algorithm to rewire data nodes with a control input inside a loop whose `get_ctrl()` is actually completely outside loops on uncommon paths. The control input is set to `get_ctrl()` to force the nodes out of loops. In the example above, the control input of `193 DivI` is set to `203 IfFalse`, ensuring that it is still pinned to the uncommon path after `276 CastII` is removed. This fix is also beneficial if we do not sink any nodes at all later. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo Looks good. ------------- Marked as reviewed by roland (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5651 From shade at openjdk.java.net Mon Sep 27 08:58:26 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 27 Sep 2021 08:58:26 GMT Subject: RFR: 8274130: C2: MulNode::Ideal chained transformations may act on wrong nodes [v2] In-Reply-To: References: Message-ID: > I was puzzled by it when fixing JDK-8274060. It looks that new optimizations added by [JDK-8273454](https://bugs.openjdk.java.net/browse/JDK-8273454) and [JDK-8263006](https://bugs.openjdk.java.net/browse/JDK-8263006) rewire `in(1)` and `in(2)` in `MulNode::Ideal`, which means the chained transformations should see them? Yet, both inputs and their `Type`-s are cached locally and not refreshed. I have not seen failures due to this yet, but it looks that the current code is subtly incorrect because of this. > > I thought about doing `return this` instead of `progress = true`, so that we leave `MulNode::Ideal` once we hit any transform and hope to return back, but I wondered if that would expose us to different graph shapes in-between successive `MulNode::Ideal` calls, which might have other unintended consequences. 
Therefore, I opted to a more conservative patch. > > Additional testing: > - [x] `compiler/` tests > - [x] `tier1` tests > - [x] 100K Fuzzer tests (one unrelated failure) Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Align the stars manually ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5631/files - new: https://git.openjdk.java.net/jdk/pull/5631/files/932f039a..c8d3cb12 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5631&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5631&range=00-01 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/5631.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5631/head:pull/5631 PR: https://git.openjdk.java.net/jdk/pull/5631 From shade at openjdk.java.net Mon Sep 27 08:58:28 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 27 Sep 2021 08:58:28 GMT Subject: RFR: 8274130: C2: MulNode::Ideal chained transformations may act on wrong nodes [v2] In-Reply-To: References: Message-ID: On Mon, 27 Sep 2021 06:18:55 GMT, Tobias Hartmann wrote: > That looks good to me (I would prefer the pointer asterisk next to the type though). Aligned! See new commit. ------------- PR: https://git.openjdk.java.net/jdk/pull/5631 From thartmann at openjdk.java.net Mon Sep 27 09:09:03 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 27 Sep 2021 09:09:03 GMT Subject: RFR: 8274130: C2: MulNode::Ideal chained transformations may act on wrong nodes [v2] In-Reply-To: References: Message-ID: On Mon, 27 Sep 2021 08:58:26 GMT, Aleksey Shipilev wrote: >> I was puzzled by it when fixing JDK-8274060. 
It looks like new optimizations added by [JDK-8273454](https://bugs.openjdk.java.net/browse/JDK-8273454) and [JDK-8263006](https://bugs.openjdk.java.net/browse/JDK-8263006) rewire `in(1)` and `in(2)` in `MulNode::Ideal`, which means the chained transformations should see them? Yet, both inputs and their `Type`-s are cached locally and not refreshed. I have not seen failures due to this yet, but it looks like the current code is subtly incorrect because of this. >> >> I thought about doing `return this` instead of `progress = true`, so that we leave `MulNode::Ideal` once we hit any transform and hope to come back, but I wondered if that would expose us to different graph shapes in-between successive `MulNode::Ideal` calls, which might have other unintended consequences. Therefore, I opted for a more conservative patch. >> >> Additional testing: >> - [x] `compiler/` tests >> - [x] `tier1` tests >> - [x] 100K Fuzzer tests (one unrelated failure) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Align the stars manually Looks good, thanks for changing. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5631 From roland at openjdk.java.net Mon Sep 27 09:27:29 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 27 Sep 2021 09:27:29 GMT Subject: RFR: 8274145: C2: Incorrect computation after JDK-8269752 Message-ID: <5B9h2DzETeX6X6cg-xH0jNgWxbFrzw4Xlfxv01pTzkA=.b19b9e3e-ef57-44f1-976a-42f0e0929b73@github.com> The bug happens because an If node that follows a CountedLoop is replaced by the CountedLoopEnd node of the main loop. Further unrolling happens after the If is replaced, which causes the condition of the CountedLoopEnd node to change. This is made possible by JDK-8269752. The fix I propose is to detect that corner case and prevent the If from being replaced in that case.
------------- Commit messages: - fix & test Changes: https://git.openjdk.java.net/jdk/pull/5712/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5712&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8274145 Stats: 149 lines in 3 files changed: 148 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5712.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5712/head:pull/5712 PR: https://git.openjdk.java.net/jdk/pull/5712 From whuang at openjdk.java.net Mon Sep 27 09:37:20 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Mon, 27 Sep 2021 09:37:20 GMT Subject: RFR: 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend [v7] In-Reply-To: References: Message-ID: > * In this issue, we plan to complete all missing implementation for aarch64 neon backend. For example, cast from Byte to Long, cast from Long to Byte, and so on. > * It may be a solver of JDK-8269866, or part of it. Wang Huang has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains eight commits: - merge master & fix conflict - code refine - fix bugs - fix codes - fix codes - fix code style - fix bugs - 8259948: Aarch64: Add cast nodes for Aarch64 Neon backend ------------- Changes: https://git.openjdk.java.net/jdk/pull/4839/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4839&range=06 Stats: 521 lines in 6 files changed: 273 ins; 54 del; 194 mod Patch: https://git.openjdk.java.net/jdk/pull/4839.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4839/head:pull/4839 PR: https://git.openjdk.java.net/jdk/pull/4839 From simonis at openjdk.java.net Mon Sep 27 09:38:01 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Mon, 27 Sep 2021 09:38:01 GMT Subject: RFR: 8274242: Implement fast-path for ASCII-compatible CharsetEncoders on x86 In-Reply-To: References: Message-ID: <_Dzhdy1vRupNXlRezru6UpkvEJV3Vlr149pbkryW2I4=.4c6e5e5b-7026-4410-9848-2a8d12aa71f7@github.com> On Tue, 21 Sep 2021 21:58:48 GMT, Claes Redestad wrote: > This patch extends the `ISO_8859_1.implEncodeISOArray` intrinsic on x86 to work also for ASCII encoding, which makes for example the `UTF_8$Encoder` perform on par with (or outperform) similarly getting charset encoded bytes from a String. The former took a small performance hit in JDK 9, and the latter improved greatly in the same release. > > Extending the `EncodeIsoArray` intrinsics on other platforms should be possible, but I'm unfamiliar with the macro assembler in general and unlike the x86 intrinsic they don't use a simple vectorized mask to implement the latin-1 check. For example aarch64 seem to filter out the low bytes and then check if there's any bits set in the high bytes. Clever, but very different to the 0xFF80 2-byte mask that an ASCII test wants. 
src/hotspot/share/opto/c2compiler.cpp line 222: > 220: #if !defined(X86) > 221: return false; // not yet implemented > 222: #endif It might be a little more work, but I think it's cleaner to move the decision whether the intrinsic is supported into the Matcher like for most other intrinsics and keep this code here platform independent. Otherwise we will get an increasing cascade of ifdefs as people start implementing this for other platforms. ------------- PR: https://git.openjdk.java.net/jdk/pull/5621 From jiefu at openjdk.java.net Mon Sep 27 09:48:48 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Mon, 27 Sep 2021 09:48:48 GMT Subject: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern Message-ID: Hi all, I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019). However, the build failed with the C4474 and C4778 warnings below: Compiling 100 properties into resource bundles for java.desktop Compiling 3038 files for java.base e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error C2220: the following warning is treated as an error e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning C4778: 'sscanf' : unterminated format string '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n' e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning C4474: 'sscanf' : too many arguments passed for format string e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: placeholders and their parameters expect 1 variadic arguments, but 3 were provided e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning C4778: 'sscanf' : unterminated format string
'%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n' e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning C4474: 'sscanf' : too many arguments passed for format string e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: placeholders and their parameters expect 0 variadic arguments, but 2 were provided The failure is caused by non-ASCII chars in the format string of sscanf [1][2], which is non-portable on our Windows platform. In fact, this non-ASCII encoding also triggers the C4819 warning, which was disabled in JDK-8216154 [3]. I also found an article showing that sscanf may fail with non-ASCII chars in the format string [4]. So it would be nice to remove these non-ASCII chars (`\x80 ~ \xef`), and I think it's safe to do so because: 1) There are actually no non-ASCII chars in package/class/method/signature names. 2) I don't think there is a use case in which people will input non-ASCII chars for `CompileCommand`. You may argue that the non-ASCII chars may be used by the parser itself, but I didn't find any such usage. (Please let me know if I missed something.) So I suggest removing this non-ASCII code to make HotSpot more portable. If we do so, we can also remove the only `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5]. Testing: - Build tests on Windows - tier1~3 on Linux/x64 Thanks.
Best regards, Jie [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269 [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319 [3] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/ [5] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246 ------------- Commit messages: - 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern Changes: https://git.openjdk.java.net/jdk/pull/5704/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5704&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8274329 Stats: 13 lines in 1 file changed: 0 ins; 12 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5704.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5704/head:pull/5704 PR: https://git.openjdk.java.net/jdk/pull/5704
From yyang at openjdk.java.net Mon Sep 27 09:49:37 2021 From: yyang at openjdk.java.net (Yi Yang) Date: Mon, 27 Sep 2021 09:49:37 GMT Subject: RFR: 8274328: C2: Redundant CFG edges fixup in block ordering Message-ID: I think Trace::fixup_blocks is redundant because PhaseCFG::fixup_flow will fix up the CFG flow anyway (i.e. flip the successor blocks of an IfNode) right after the PhaseBlockLayout pass, so we can remove this step from the PhaseBlockLayout pass. Testing: jtreg/compiler/c2, presubmit tests. ------------- Commit messages: - 8274328: C2: Redundant CFG edges fixup in block ordering Changes: https://git.openjdk.java.net/jdk/pull/5705/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5705&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8274328 Stats: 53 lines in 2 files changed: 6 ins; 43 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/5705.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5705/head:pull/5705 PR: https://git.openjdk.java.net/jdk/pull/5705 From redestad at openjdk.java.net Mon Sep 27 11:41:34 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 27 Sep 2021 11:41:34 GMT Subject: RFR: 8274242: Implement fast-path for ASCII-compatible CharsetEncoders on x86 [v2] In-Reply-To: References: Message-ID: > This patch extends the `ISO_8859_1.implEncodeISOArray` intrinsic on x86 to work also for ASCII encoding, which makes for example the `UTF_8$Encoder` perform on par with (or outperform) similarly getting charset encoded bytes from a String. The former took a small performance hit in JDK 9, and the latter improved greatly in the same release. > > Extending the `EncodeIsoArray` intrinsics on other platforms should be possible, but I'm unfamiliar with the macro assembler in general and unlike the x86 intrinsic they don't use a simple vectorized mask to implement the latin-1 check. For example aarch64 seem to filter out the low bytes and then check if there's any bits set in the high bytes. Clever, but very different to the 0xFF80 2-byte mask that an ASCII test wants.
Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Tobias review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5621/files - new: https://git.openjdk.java.net/jdk/pull/5621/files/8edc228f..12ab6ff5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5621&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5621&range=00-01 Stats: 10 lines in 3 files changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/5621.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5621/head:pull/5621 PR: https://git.openjdk.java.net/jdk/pull/5621 From redestad at openjdk.java.net Mon Sep 27 12:03:08 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 27 Sep 2021 12:03:08 GMT Subject: RFR: 8274242: Implement fast-path for ASCII-compatible CharsetEncoders on x86 [v2] In-Reply-To: <5lVTU7su0xmkfNlex-Pv_yoeuJpnoCyalI42oLSjGzg=.54d143bc-ede4-4ec8-a257-1538f43ea632@github.com> References: <5lVTU7su0xmkfNlex-Pv_yoeuJpnoCyalI42oLSjGzg=.54d143bc-ede4-4ec8-a257-1538f43ea632@github.com> Message-ID: On Mon, 27 Sep 2021 06:36:50 GMT, Tobias Hartmann wrote: > Should we remove the "iso" part from the method/class names? I'm open to suggestions, but I've not been able to think of anything better. `encodeISOOrASCII` doesn't seem helpful and since ASCII is a subset of the ISO-8859-1 encoding referred to by the "iso" moniker then the ASCII-only variant is technically encoding chars to valid ISO-8859-1. 
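The subset argument above can be checked with a minimal sketch. This is only an illustration under stated assumptions, not the intrinsic's code: the class `AsciiSubsetSketch` and both helper methods are hypothetical names, the `0xFF80` mask is the ASCII test mask mentioned in the PR description, and `0xFF00` is assumed here as the corresponding latin-1 mask on a 2-byte char.

```java
public class AsciiSubsetSketch {
    // Hypothetical helper: a char fits in ISO-8859-1 (latin-1) if none of the high 8 bits is set.
    static boolean isLatin1(char c) {
        return (c & 0xFF00) == 0;
    }

    // Hypothetical helper: a char is ASCII if none of the high 9 bits is set.
    static boolean isAscii(char c) {
        return (c & 0xFF80) == 0;
    }

    public static void main(String[] args) {
        // Exhaustively confirm that every ASCII char is also a valid latin-1 char.
        for (int i = 0; i <= Character.MAX_VALUE; i++) {
            char c = (char) i;
            if (isAscii(c) && !isLatin1(c)) {
                throw new AssertionError("ASCII char outside latin-1: " + i);
            }
        }
        // 'A' is ASCII; U+00E9 is latin-1 but not ASCII.
        System.out.println(isAscii('A') + " " + isLatin1('\u00E9') + " " + isAscii('\u00E9'));
    }
}
```

Because the ASCII mask covers every bit of the latin-1 mask plus one more, any char that passes the ASCII test passes the latin-1 test as well, which is why keeping "iso" in the name remains technically accurate.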
------------- PR: https://git.openjdk.java.net/jdk/pull/5621 From redestad at openjdk.java.net Mon Sep 27 12:12:05 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 27 Sep 2021 12:12:05 GMT Subject: RFR: 8274242: Implement fast-path for ASCII-compatible CharsetEncoders on x86 [v3] In-Reply-To: References: Message-ID: > This patch extends the `ISO_8859_1.implEncodeISOArray` intrinsic on x86 to work also for ASCII encoding, which makes for example the `UTF_8$Encoder` perform on par with (or outperform) similarly getting charset encoded bytes from a String. The former took a small performance hit in JDK 9, and the latter improved greatly in the same release. > > Extending the `EncodeIsoArray` intrinsics on other platforms should be possible, but I'm unfamiliar with the macro assembler in general and unlike the x86 intrinsic they don't use a simple vectorized mask to implement the latin-1 check. For example aarch64 seem to filter out the low bytes and then check if there's any bits set in the high bytes. Clever, but very different to the 0xFF80 2-byte mask that an ASCII test wants. 
Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Add Matcher predicate to avoid changing shared code as non-x86 platforms implements support for the _encodeAsciiArray intrinsic ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5621/files - new: https://git.openjdk.java.net/jdk/pull/5621/files/12ab6ff5..9800a99a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5621&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5621&range=01-02 Stats: 17 lines in 6 files changed: 14 ins; 1 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/5621.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5621/head:pull/5621 PR: https://git.openjdk.java.net/jdk/pull/5621 From redestad at openjdk.java.net Mon Sep 27 12:14:05 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Mon, 27 Sep 2021 12:14:05 GMT Subject: RFR: 8274242: Implement fast-path for ASCII-compatible CharsetEncoders on x86 [v3] In-Reply-To: <_Dzhdy1vRupNXlRezru6UpkvEJV3Vlr149pbkryW2I4=.4c6e5e5b-7026-4410-9848-2a8d12aa71f7@github.com> References: <_Dzhdy1vRupNXlRezru6UpkvEJV3Vlr149pbkryW2I4=.4c6e5e5b-7026-4410-9848-2a8d12aa71f7@github.com> Message-ID: On Mon, 27 Sep 2021 09:34:21 GMT, Volker Simonis wrote: >> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: >> >> Add Matcher predicate to avoid changing shared code as non-x86 platforms implements support for the _encodeAsciiArray intrinsic > > src/hotspot/share/opto/c2compiler.cpp line 222: > >> 220: #if !defined(X86) >> 221: return false; // not yet implemented >> 222: #endif > > It might be a little more work, but I think it's cleaner to move the decision whether the intrinsic is supported into the Matcher like for most other intrinsics and keep this code here platform independent.
Otherwise we will get an increasing cascade of ifdefs as people start implementing this for other platforms. Not too much work. I recently introduced platform-specific `matcher_*.hpp` files, so since then adding a boolean constant is easy (no need to muck with the .ad files). ------------- PR: https://git.openjdk.java.net/jdk/pull/5621 From roland at openjdk.java.net Mon Sep 27 12:18:03 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 27 Sep 2021 12:18:03 GMT Subject: RFR: 8272562: C2: assert(false) failed: Bad graph detected in build_loop_late Message-ID: A counted loop has an array access: (LoadI (AddP (AddP (ConvI2L (CastII (AddI (Phi ...))))))) that is array[iv - 1] The Phi is the iv Phi. The ConvI2L/CastII are from a range check and capture type: 0..maxint-1 The loop is unrolled once, the LoadI is cloned. array[iv - 1] array[iv] The first LoadI is only used out of loop and is sunk with 2 clones. One of the clones is on the IfFalse branch of a test for iv != 0. (LoadI (AddP (AddP (ConvI2L (CastII (AddI (CastII ...))))))) The second CastII pins the nodes out of the loop. The ConvI2L and CastII are pushed thru the AddI (for -1). As a result the ConvI2L has type: 1..maxint-1 The CastII, because it has the same input as the input for iv != 0, becomes 0 which is not part of 1..maxint-1. The ConvI2L and all its uses including the LoadI become top. The use of the LoadI is a Phi that is transformed into its remaining input and the graph is broken. The root cause is that the loop body initially contains: if (iv - 1 >=u array.length) { // range check trap(); } if (iv == 0) { // path where nodes are sunk later on } And obviously if iv - 1 >= 0 then iv == 0 is always false but c2 fails to prove it. I tried to implement a simple fix for this issue but while it fixes this bug, I couldn't convince myself that it was robust enough. So instead I propose following the suggestion Christian and Vladimir I. 
made in: https://github.com/openjdk/jdk/pull/5199 that is to more generally exclude cast nodes from sinking as a workaround for now. I've been looking for a more general solution to this problem and I have a prototype that fixes this failure but is a lot more complicated. I'll revisit this workaround when it's ready. ------------- Commit messages: - fix & test Changes: https://git.openjdk.java.net/jdk/pull/5716/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5716&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272562 Stats: 55 lines in 2 files changed: 54 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5716.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5716/head:pull/5716 PR: https://git.openjdk.java.net/jdk/pull/5716 From thartmann at openjdk.java.net Mon Sep 27 12:30:52 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 27 Sep 2021 12:30:52 GMT Subject: RFR: 8274242: Implement fast-path for ASCII-compatible CharsetEncoders on x86 [v3] In-Reply-To: References: <5lVTU7su0xmkfNlex-Pv_yoeuJpnoCyalI42oLSjGzg=.54d143bc-ede4-4ec8-a257-1538f43ea632@github.com> Message-ID: On Mon, 27 Sep 2021 11:59:54 GMT, Claes Redestad wrote: > > Should we remove the "iso" part from the method/class names? > > I'm open to suggestions, but I've not been able to think of anything better. `encodeISOOrASCII` doesn't seem helpful and since ASCII is a subset of the ISO-8859-1 encoding referred to by the "iso" moniker then the ASCII-only variant is technically encoding chars to valid ISO-8859-1. Okay, that's fine with me. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5621 From chagedorn at openjdk.java.net Mon Sep 27 13:10:06 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Mon, 27 Sep 2021 13:10:06 GMT Subject: RFR: 8272562: C2: assert(false) failed: Bad graph detected in build_loop_late In-Reply-To: References: Message-ID: On Mon, 27 Sep 2021 12:05:45 GMT, Roland Westrelin wrote: > A counted loop has an array access: > > (LoadI (AddP (AddP (ConvI2L (CastII (AddI (Phi ...))))))) > > that is > > array[iv - 1] > > The Phi is the iv Phi. The ConvI2L/CastII are from a range check > and capture type: 0..maxint-1 The loop is unrolled once, the > LoadI is cloned. > > array[iv - 1] > array[iv] > > The first LoadI is only used out of loop and is sunk with 2 > clones. One of the clones is on the IfFalse branch of a test > for iv != 0. > > (LoadI (AddP (AddP (ConvI2L (CastII (AddI (CastII ...))))))) > > The second CastII pins the nodes out of the loop. The ConvI2L and > CastII are pushed thru the AddI (for -1). As a result the ConvI2L > has type: > 1..maxint-1 > > The CastII, because it has the same input as the input for iv != 0, > becomes 0 which is not part of 1..maxint-1. The ConvI2L and all its > uses including the LoadI become top. The use of the LoadI is a Phi > that is transformed into its remaining input and the graph is broken. > > The root cause is that the loop body initially contains: > > if (iv - 1 >=u array.length) { // range check > trap(); > } > > if (iv == 0) { > // path where nodes are sunk later on > } > > And obviously if iv - 1 >= 0 then iv == 0 is always false but c2 fails > to prove it. I tried to implement a simple fix for this issue but > while it fixes this bug, I couldn't convince myself that it was robust > enough. > > So instead I propose following the suggestion Christian and Vladimir I. > made in: > > https://github.com/openjdk/jdk/pull/5199 > > that is to more generally exclude cast nodes from sinking as a > workaround for now. 
> > I've been looking for a more general solution to this problem and I > have a prototype that fixes this failure but is a lot more > complicated. I'll revisit this workaround when it's ready. I agree with your suggestion to follow up with a more general solution later and go with this easier fix for now. src/hotspot/share/opto/loopopts.cpp line 1449: > 1447: !is_raw_to_oop_cast && // don't extend live ranges of raw oops > 1448: n->Opcode() != Op_Opaque4 && > 1449: (n->Opcode() == Op_CastII && ((CastIINode*)n)->has_range_check())) { Shouldn't the condition be inverted? This would only allow to sink range check `CastII` nodes. And couldn't you use `!(n->is_CastII() && n->as_CastII()->has_range_check())` instead of the explicit cast? ------------- Changes requested by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5716 From roland at openjdk.java.net Mon Sep 27 13:57:56 2021 From: roland at openjdk.java.net (Roland Westrelin) Date: Mon, 27 Sep 2021 13:57:56 GMT Subject: RFR: 8272562: C2: assert(false) failed: Bad graph detected in build_loop_late [v2] In-Reply-To: References: Message-ID: > A counted loop has an array access: > > (LoadI (AddP (AddP (ConvI2L (CastII (AddI (Phi ...))))))) > > that is > > array[iv - 1] > > The Phi is the iv Phi. The ConvI2L/CastII are from a range check > and capture type: 0..maxint-1 The loop is unrolled once, the > LoadI is cloned. > > array[iv - 1] > array[iv] > > The first LoadI is only used out of loop and is sunk with 2 > clones. One of the clones is on the IfFalse branch of a test > for iv != 0. > > (LoadI (AddP (AddP (ConvI2L (CastII (AddI (CastII ...))))))) > > The second CastII pins the nodes out of the loop. The ConvI2L and > CastII are pushed thru the AddI (for -1). As a result the ConvI2L > has type: > 1..maxint-1 > > The CastII, because it has the same input as the input for iv != 0, > becomes 0 which is not part of 1..maxint-1. The ConvI2L and all its > uses including the LoadI become top. 
The use of the LoadI is a Phi
> that is transformed into its remaining input and the graph is broken.
>
> The root cause is that the loop body initially contains:
>
>     if (iv - 1 >=u array.length) { // range check
>       trap();
>     }
>
>     if (iv == 0) {
>       // path where nodes are sunk later on
>     }
>
> And obviously if iv - 1 >= 0 then iv == 0 is always false but C2 fails
> to prove it. I tried to implement a simple fix for this issue but
> while it fixes this bug, I couldn't convince myself that it was robust
> enough.
>
> So instead I propose following the suggestion Christian and Vladimir I.
> made in:
>
> https://github.com/openjdk/jdk/pull/5199
>
> that is to more generally exclude cast nodes from sinking as a
> workaround for now.
>
> I've been looking for a more general solution to this problem and I
> have a prototype that fixes this failure but is a lot more
> complicated. I'll revisit this workaround when it's ready.

Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:

  conservative fix

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5716/files
  - new: https://git.openjdk.java.net/jdk/pull/5716/files/76f60e86..7324e96a

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5716&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5716&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5716.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5716/head:pull/5716

PR: https://git.openjdk.java.net/jdk/pull/5716

From roland at openjdk.java.net Mon Sep 27 13:57:57 2021
From: roland at openjdk.java.net (Roland Westrelin)
Date: Mon, 27 Sep 2021 13:57:57 GMT
Subject: RFR: 8272562: C2: assert(false) failed: Bad graph detected in build_loop_late [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 27 Sep 2021 12:52:50 GMT, Christian Hagedorn wrote:

>> Roland Westrelin has updated the pull request incrementally with one
additional commit since the last revision:
>>
>>   conservative fix
>
> src/hotspot/share/opto/loopopts.cpp line 1449:
>
>> 1447:         !is_raw_to_oop_cast && // don't extend live ranges of raw oops
>> 1448:         n->Opcode() != Op_Opaque4 &&
>> 1449:         (n->Opcode() == Op_CastII && ((CastIINode*)n)->has_range_check())) {
>
> Shouldn't the condition be inverted? This would only allow sinking range check `CastII` nodes.
> And couldn't you use `!(n->is_CastII() && n->as_CastII()->has_range_check())` instead of the explicit cast?

Right! Thanks for catching this. Actually after thinking more about this, I changed the fix to prevent all nodes that can capture a type from being sunk to be on the safe side, for now.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5716

From chagedorn at openjdk.java.net Mon Sep 27 14:06:55 2021
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Mon, 27 Sep 2021 14:06:55 GMT
Subject: RFR: 8274074: SIGFPE with C2 compiled code with -XX:+StressGCM [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 27 Sep 2021 07:44:13 GMT, Christian Hagedorn wrote:

>> In the testcase, the divisor input node of a `DivI` node is sunk out of a loop to a div by zero UCT and is pinned with a `CastII` node to ensure it's not floating back into the loop. The divisor is optimized by taking into account that it's only executed on the uncommon path. The `CastII`, however, is removed later and the division floats back into the loop which results in a SIGFPE crash.
>>
>> The relevant lines in the testcase are the following two divisions:
>>
>>     static int iFld = 1;
>>     static int q = 0;
>>     ...
>>     y = iFld - q; // divisor
>>     y = (iArrFld[2] / y); // division 1
>>     y = (5 / iFld); // division 2
>>
>> After sinking the `divisor` of `division 1` in the testcase to the div by zero UCT of `division 2`, the graph looks like this:
>>
>> ![Screenshot from 2021-09-21 14-40-37](https://user-images.githubusercontent.com/17833009/134506632-9904da7b-8210-4301-85dc-04441324fe55.png)
>>
>> - `201 If` is the zero check of `division 2` (will always succeed because `iFld = 1`, i.e. UCT is never taken).
>> - `193 DivI` (`division 1`) is not sunk because its `get_ctrl()` is `203 IfFalse` (outside the loop already because there is no use inside the loop since the local `y` is directly overwritten again).
>> - `275 SubI` (`divisor`) was sunk out of the loop and is pinned by `276 CastII` (unconditional dependency).
>>
>> In IGVN, `CastII::Value()` is called for `276 CastII`. It sees an `If/Cmp` (zero check of `division 2`) with the same `137 LoadI` input as for the `276 CastII`. Therefore, we set its type to [0,0] here:
>> https://github.com/openjdk/jdk/blob/d0987513665def1b6b2981ab5932b6f1b8b310d8/src/hotspot/share/opto/castnode.cpp#L248-L252
>>
>> As a result, we replace `276 CastII` with a constant zero in IGVN. But now we lost the pin to the uncommon path of the zero check of `division 2` for `275 SubI` and `193 DivI`. `193 DivI` is only used on the uncommon path but can now float around again, also inside the loop itself, which happens in the testcase. Inside the loop, we execute the division with the now optimized divisor `0 - q = 0` which is a division by zero and we crash.
>>
>> In summary, it's not a problem that a `Div` node floats above its zero check here but rather that we optimize an input node used as divisor by assuming that we only execute the division on the uncommon path when the zero check of `division 2` failed (which never happens). This divisor optimization would be wrong when the division is executed inside the loop.
But due to losing the pin, we end up doing exactly that which results in a SIGFPE crash.
>>
>> The suggested fix is to extend the sinking algorithm to rewire data nodes with a control input inside a loop whose `get_ctrl()` is actually completely outside loops on uncommon paths. The control input is set to `get_ctrl()` to force the nodes out of loops. In the example above, the control input of `193 DivI` is set to `203 IfFalse`, ensuring that it is still pinned to the uncommon path after `276 CastII` is removed. This fix is also beneficial if we do not sink any nodes at all later.
>>
>> Thanks,
>> Christian
>
> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix typo

Thanks Roland for your review!

-------------

PR: https://git.openjdk.java.net/jdk/pull/5651

From chagedorn at openjdk.java.net Mon Sep 27 14:09:22 2021
From: chagedorn at openjdk.java.net (Christian Hagedorn)
Date: Mon, 27 Sep 2021 14:09:22 GMT
Subject: RFR: 8272562: C2: assert(false) failed: Bad graph detected in build_loop_late [v2]
In-Reply-To:
References:
Message-ID: <84wIJvaOk9EDcs9SFonnCp2V_rowJXPuH1HklSPwfTI=.a7901a1e-ab46-4f95-881f-18ba5b46d454@github.com>

On Mon, 27 Sep 2021 13:57:56 GMT, Roland Westrelin wrote:

>> A counted loop has an array access:
>>
>>     (LoadI (AddP (AddP (ConvI2L (CastII (AddI (Phi ...)))))))
>>
>> that is
>>
>>     array[iv - 1]
>>
>> The Phi is the iv Phi. The ConvI2L/CastII are from a range check
>> and capture type: 0..maxint-1. The loop is unrolled once, the
>> LoadI is cloned.
>>
>>     array[iv - 1]
>>     array[iv]
>>
>> The first LoadI is only used out of loop and is sunk with 2
>> clones. One of the clones is on the IfFalse branch of a test
>> for iv != 0.
>>
>>     (LoadI (AddP (AddP (ConvI2L (CastII (AddI (CastII ...)))))))
>>
>> The second CastII pins the nodes out of the loop. The ConvI2L and
>> CastII are pushed thru the AddI (for -1).
As a result the ConvI2L
>> has type:
>> 1..maxint-1
>>
>> The CastII, because it has the same input as the input for iv != 0,
>> becomes 0 which is not part of 1..maxint-1. The ConvI2L and all its
>> uses including the LoadI become top. The use of the