From kvn at openjdk.org Thu Sep 1 00:04:10 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 1 Sep 2022 00:04:10 GMT Subject: RFR: 8289943: Simplify some object allocation merges [v6] In-Reply-To: References: Message-ID: On Tue, 30 Aug 2022 23:19:21 GMT, Cesar Soares wrote: >> Hi there, can I please get some feedback on this approach to simplify object allocation merges in order to promote Scalar Replacement of the objects involved in the merge? >> >> The basic idea for this [approach was discussed in this thread](https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2022-April/055189.html) and it consists of: >> 1) Identify Phi nodes that merge object allocations and replace them with a new IR node called ReducedAllocationMergeNode (RAM node). >> 2) Scalar Replace the incoming allocations to the RAM node. >> 3) Scalar Replace the RAM node itself. >> >> There are a few conditions for doing the replacement of the Phi by a RAM node though - Although I plan to work on removing them in subsequent PRs: >> >> - The only supported users of the original Phi are AddP->Load, SafePoints/Traps, DecodeN. >> >> These are the critical parts of the implementation and I'd appreciate it very much if you could tell me if what I implemented isn't violating any C2 IR constraints: >> >> - The way I identify/use the memory edges that will be used to find the last stored values to the merged object fields. >> - The way I check if there is an incoming Allocate node to the original Phi node. >> - The way I check if there is no store to the merged objects after they are merged. >> >> Testing: >> - Windows/Linux/MAC fastdebug/release >> - hotspot_all >> - tier1 >> - Renaissance >> - dacapo >> - new IR-based tests > > Cesar Soares has updated the pull request incrementally with one additional commit since the last revision: > > fix 32 bit execution. Allocations in `testPollutedPolymorphic()` are removed because both classes have the same `Shape` class which have all fields. 
Would be interesting if `l` field is declared only in both subclasses. ------------- PR: https://git.openjdk.org/jdk/pull/9073 From duke at openjdk.org Thu Sep 1 01:09:23 2022 From: duke at openjdk.org (Dingli Zhang) Date: Thu, 1 Sep 2022 01:09:23 GMT Subject: Integrated: 8293011: riscv: Duplicated stubs to interpreter for static calls In-Reply-To: <_F5Tz-gWFedB3nbhHZyM5tetfpAbNpfLQKKeqjU1GgA=.6014ec86-527b-4b0f-9615-18b09bd12afc@github.com> References: <_F5Tz-gWFedB3nbhHZyM5tetfpAbNpfLQKKeqjU1GgA=.6014ec86-527b-4b0f-9615-18b09bd12afc@github.com> Message-ID: On Mon, 29 Aug 2022 01:32:26 GMT, Dingli Zhang wrote: > Follow up [JDK-8280481](https://bugs.openjdk.org/browse/JDK-8280481). > Calls of Java methods have stubs to the interpreter for the cases when an invoked Java method is not compiled. Calls of static Java methods and final Java methods have statically bound information about a callee during compilation. C1 and C2 always generate a new stub for each call. As the generated stubs for calls of the same method are the same, they can be shared. > > ## Testing: > > - hotspot/jdk tier1 on unmatched board > - hotspot/jtreg/compiler/sharedstubs/SharedStubToInterpTest.java on qemu > > > ## Results > #### Results from [Renaissance 0.14.0](https://github.com/renaissance-benchmarks/renaissance/releases/tag/v0.14.0) > Note: 'Nmethods with shared stubs' is the total number of nmethods counted during benchmark's run. 'Final # of nmethods' is a number of nmethods in CodeCache when JVM exited. 
>
> - riscv64
>
> +------------------+-------------+----------------------------+---------------------+
> | Benchmark        | Saved bytes | Nmethods with shared stubs | Final # of nmethods |
> +------------------+-------------+----------------------------+---------------------+
> | dotty            |     1099488 |                       4483 |               12447 |
> | dec-tree         |      511296 |                       2310 |               18583 |
> | naive-bayes      |      358128 |                       1714 |                8677 |
> | log-regression   |      365136 |                       1662 |               14626 |
> | als              |      444576 |                       2107 |                8464 |
> | finagle-chirper  |      265584 |                       1558 |               11003 |
> | movie-lens       |      397776 |                       2106 |                6842 |
> | finagle-http     |      160656 |                        974 |                7243 |
> | page-rank        |      246672 |                       1261 |               10293 |
> | chi-square       |      196080 |                        992 |                8841 |
> | akka-uct         |      138672 |                        595 |                4564 |
> | reactors         |       57552 |                        328 |                2338 |
> | scala-stm-bench7 |       42122 |                        254 |                2261 |
> | philosophers     |       45744 |                        241 |                1945 |
> | scala-doku       |       48624 |                        213 |                 794 |
> | rx-scrabble      |       46128 |                        271 |                1945 |
> | future-genetic   |       37248 |                        234 |                1818 |
> | scrabble         |       30384 |                        170 |                1628 |
> | par-mnemonics    |       28176 |                        137 |                1317 |
> +------------------+-------------+----------------------------+---------------------+

This pull request has now been integrated.

Changeset: 17283cfe
Author: Dingli Zhang
Committer: Fei Yang
URL: https://git.openjdk.org/jdk/commit/17283cfe4c697e2118f19992a6e87dbee268061e
Stats: 51 lines in 4 files changed: 38 ins; 5 del; 8 mod

8293011: riscv: Duplicated stubs to interpreter for static calls

Reviewed-by: fyang

-------------

PR: https://git.openjdk.org/jdk/pull/10057

From dlong at openjdk.org Thu Sep 1 02:20:08 2022
From: dlong at openjdk.org (Dean Long)
Date: Thu, 1 Sep 2022 02:20:08 GMT
Subject: RFR: 8292584: assert(cb != __null) failed: must be with -XX:-Inline [v5]
In-Reply-To:
References:
Message-ID:

On Fri, 26 Aug 2022 07:01:57 GMT, Dean Long wrote:

>> generate_Continuation_doYield_entry() creates an interpreter entry point, but jumps to a compiled stub, which needs to be walkable.
The interpreter entry does not create an interpreter frame, so frame walking expects a compiled frame. Normally everything is OK, but if C1 does not inline the intrinsic and we get to the interpreter entry through the c2i adapter, then things can break if the c2i adapter padded the stack because of alignment. The easiest fix is to undo what the c2i adapter might have done. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > fix build failures Thanks Ron. Loom testing looks good so far. ------------- PR: https://git.openjdk.org/jdk/pull/9974 From dlong at openjdk.org Thu Sep 1 03:02:50 2022 From: dlong at openjdk.org (Dean Long) Date: Thu, 1 Sep 2022 03:02:50 GMT Subject: RFR: 8292584: assert(cb != __null) failed: must be with -XX:-Inline [v6] In-Reply-To: References: Message-ID: > generate_Continuation_doYield_entry() creates an interpreter entry point, but jumps to a compiled stub, which needs to be walkable. The interpreter entry does not create an interpreter frame, so frame walking expects a compiled frame. Normally everything is OK, but if C1 does not inline the intrinsic and we get to the interpreter entry through the c2i adapter, then things can break if the c2i adapter padded the stack because of alignment. The easiest fix is to undo what the c2i adapter might have done. Dean Long has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains eight commits: - merge fix - Merge master - fix build failures - fix zero build - cleanup - fix failed assert - version 2, make doYield a native intrinsic like enterSpecial - fix generate_Continuation_doYield_entry ------------- Changes: https://git.openjdk.org/jdk/pull/9974/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9974&range=05 Stats: 586 lines in 48 files changed: 194 ins; 340 del; 52 mod Patch: https://git.openjdk.org/jdk/pull/9974.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9974/head:pull/9974 PR: https://git.openjdk.org/jdk/pull/9974 From roland at openjdk.org Thu Sep 1 13:29:12 2022 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 1 Sep 2022 13:29:12 GMT Subject: RFR: 8292301: [REDO v2] C2 crash when allocating array of size too large In-Reply-To: References: Message-ID: On Wed, 31 Aug 2022 08:07:06 GMT, Xin Liu wrote: > the two bugfixes work for me. LGTM. I am not a reviewer. need other reviewers to approve it. Thanks for reviewing this. > src/hotspot/share/opto/phaseX.cpp line 1855: > >> 1853: // Same if true if the type of a ValidLengthTest input to an AllocateArrayNode changes >> 1854: void PhaseCCP::push_catch(Unique_Node_List& worklist, const Node* use) { >> 1855: if (use->is_Call() || use->is_AllocateArray()) { > > hi, Roland, > I understand your intention here, but isn't AllocateArrayNode also a CallNode? > My understanding is that use->is_Call() is true if use is an AllocationNode. Good catch. There's no issue with CCP then. I'll update the change. ------------- PR: https://git.openjdk.org/jdk/pull/10038 From tholenstein at openjdk.org Thu Sep 1 15:05:38 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 1 Sep 2022 15:05:38 GMT Subject: RFR: JDK-8291805: IGV: Improve Zooming [v6] In-Reply-To: References: Message-ID: > # Overview > > The zooming is improved in the following ways: > > 1) Added a minimum (10%) and maximum (400%) zoom level. 
If you have a sensitive mouse wheel, it can be annoying to zoom in or out too much (until the graph is invisibly small or the nodes are larger than the window)
>
> 2) Zooming with a trackpad was not very smooth because IGV did panning and zooming at the same time. Now panning is disabled while the CMD/Ctrl key is pressed for zooming
>
> 3) When only a few nodes were selected, zooming was no longer mouse centred. Instead, the centre of the zooming was in the upper left corner. Now the zooming is centred on the middle of the scene when all selected nodes fit in the screen.
>
> 4) Added a shortcut (Ctrl - 0) to reset the zoom level to 100%.
>
> 5) Updated the Zoom icons to be vector graphics (.svg)
>
> # Implementation
>
> 1) New functions `getZoomMinFactor()` and `getZoomMaxFactor()` assure that we do not zoom in or out infinitely. `getZoomMinFactor()` assures that we do not zoom out further if the zoom level is <100% and all visible nodes already fit on the screen.
>
> 2) We introduced a new `MouseCenteredZoomAction.java` for zooming with the mouse/trackpad. `MouseCenteredZoomAction` performs zooming when the modifier key is pressed (Ctrl/CMD) and panning otherwise. The functions `zoomIn` and `zoomOut` now do animated zooming using `CustomZoomAnimator`. `CustomZoomAnimator` uses the mouse location as the centre of the zoom animation.
>
> 3) The `JScrollPane` now has a `JPanel centeringPanel` with `GridBagLayout()` that contains the `viewComponent`. This assures that the `viewComponent` is always centred when no scrollbars are visible. This makes the `Widget topLeft, bottomRight` obsolete as we can now add a white border of `BORDER_SIZE` to the `DiagramScene` instead.
>
> 4) `ZoomResetAction.java` resets the zoom level to 100%. The shortcut is `Ctrl - 0` and the action is available in the menu: `View` -> `Reset Zoom`. It was not added to the icon menu bar in the `EditorTopComponent` because of space constraints.
> > 5) new self created icons with vector graphics: `zoomIn.svg`, `zoomOut.svg` and `zoomReset.svg` Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: reduced jumping glitch in zooming smoothing 2 animateZoomFactor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10026/files - new: https://git.openjdk.org/jdk/pull/10026/files/ef3348ff..cb4542d4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10026&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10026&range=04-05 Stats: 44 lines in 2 files changed: 13 ins; 8 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/10026.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10026/head:pull/10026 PR: https://git.openjdk.org/jdk/pull/10026 From kvn at openjdk.org Thu Sep 1 19:26:07 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 1 Sep 2022 19:26:07 GMT Subject: RFR: 8290529: C2: assert(BoolTest(btest).is_canonical()) failure [v4] In-Reply-To: References: Message-ID: On Wed, 31 Aug 2022 14:37:09 GMT, Roland Westrelin wrote: >> For the test case: >> >> 1) In Parse::do_if(), tst0 is: >> >> (Bool#lt (CmpU 0 Parm0)) >> >> 2) transformed by gvn in tst: >> >> (Bool#gt (CmpU Parm0 0)) >> >> 3) That test is not canonical and is negated and retransformed which >> results in: >> >> (Bool#eq (CmpI Parm0 0)) >> >> The assert fires because that test is not canonical either. >> >> The root cause I think is that the (CmpU .. 0) -> (CmpI .. 0) only >> triggers if the condition of the CmpU is canonical (and results in a >> non canonical test). Tweaking it so it applies even if the condition >> is not leads to the following change in the steps above: >> >> 2) (Bool#ne (CmpI Parm0 0)) >> >> which is a canonical test. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > comment Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/9553 From kvn at openjdk.org Thu Sep 1 20:21:29 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 1 Sep 2022 20:21:29 GMT Subject: RFR: 8292584: assert(cb != __null) failed: must be with -XX:-Inline [v6] In-Reply-To: References: Message-ID: On Thu, 1 Sep 2022 03:02:50 GMT, Dean Long wrote: >> generate_Continuation_doYield_entry() creates an interpreter entry point, but jumps to a compiled stub, which needs to be walkable. The interpreter entry does not create an interpreter frame, so frame walking expects a compiled frame. Normally everything is OK, but if C1 does not inline the intrinsic and we get to the interpreter entry through the c2i adapter, then things can break if the c2i adapter padded the stack because of alignment. The easiest fix is to undo what the c2i adapter might have done. > > Dean Long has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - merge fix > - Merge master > - fix build failures > - fix zero build > - cleanup > - fix failed assert > - version 2, make doYield a native intrinsic like enterSpecial > - fix generate_Continuation_doYield_entry Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.org/jdk/pull/9974 From dlong at openjdk.org Thu Sep 1 20:21:29 2022 From: dlong at openjdk.org (Dean Long) Date: Thu, 1 Sep 2022 20:21:29 GMT Subject: RFR: 8292584: assert(cb != __null) failed: must be with -XX:-Inline [v6] In-Reply-To: References: Message-ID: On Thu, 1 Sep 2022 03:02:50 GMT, Dean Long wrote: >> generate_Continuation_doYield_entry() creates an interpreter entry point, but jumps to a compiled stub, which needs to be walkable. The interpreter entry does not create an interpreter frame, so frame walking expects a compiled frame. 
Normally everything is OK, but if C1 does not inline the intrinsic and we get to the interpreter entry through the c2i adapter, then things can break if the c2i adapter padded the stack because of alignment. The easiest fix is to undo what the c2i adapter might have done. > > Dean Long has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - merge fix > - Merge master > - fix build failures > - fix zero build > - cleanup > - fix failed assert > - version 2, make doYield a native intrinsic like enterSpecial > - fix generate_Continuation_doYield_entry Thanks Vladimir. ------------- PR: https://git.openjdk.org/jdk/pull/9974 From dlong at openjdk.org Thu Sep 1 20:22:48 2022 From: dlong at openjdk.org (Dean Long) Date: Thu, 1 Sep 2022 20:22:48 GMT Subject: Integrated: 8292584: assert(cb != __null) failed: must be with -XX:-Inline In-Reply-To: References: Message-ID: On Tue, 23 Aug 2022 06:53:36 GMT, Dean Long wrote: > generate_Continuation_doYield_entry() creates an interpreter entry point, but jumps to a compiled stub, which needs to be walkable. The interpreter entry does not create an interpreter frame, so frame walking expects a compiled frame. Normally everything is OK, but if C1 does not inline the intrinsic and we get to the interpreter entry through the c2i adapter, then things can break if the c2i adapter padded the stack because of alignment. The easiest fix is to undo what the c2i adapter might have done. This pull request has now been integrated. 
Changeset: fa68371b
Author: Dean Long
URL: https://git.openjdk.org/jdk/commit/fa68371bb816d797da02e51187955044f835d402
Stats: 586 lines in 48 files changed: 194 ins; 340 del; 52 mod

8292584: assert(cb != __null) failed: must be with -XX:-Inline

Reviewed-by: kvn, rpressler

-------------

PR: https://git.openjdk.org/jdk/pull/9974

From duke at openjdk.org Fri Sep 2 00:36:54 2022
From: duke at openjdk.org (Cesar Soares)
Date: Fri, 2 Sep 2022 00:36:54 GMT
Subject: RFR: 8289943: Simplify some object allocation merges [v6]
In-Reply-To:
References:
Message-ID: <5FwD40cG1de1mCNghB1LJNO3C0DVjo178qR5KiCjxaM=.13123849-3a8e-419f-b10e-406aa9fe2b40@github.com>

On Thu, 1 Sep 2022 00:01:59 GMT, Vladimir Kozlov wrote:

>> Cesar Soares has updated the pull request incrementally with one additional commit since the last revision:
>>
>> fix 32 bit execution.
>
> Allocations in `testPollutedPolymorphic()` are removed because both classes have the same `Shape` class which have all fields. Would be interesting if `l` field is declared only in both subclasses.

@vnkozlov - Thank you for clarifying that. I've been playing with lifting the restriction and I actually found a corner case:

    public static Class test(boolean c1, boolean c2, boolean c3, int x, int y, int w, int z) {
        Animal s = new Dog(x, y, z);

        if (c1) {
            s = new Cat("Fisker");
        }

        Unloaded u = new Unloaded(); // assumes this is converted to a uncommon_trap(unloaded, reinterpret)

        return s.getClass();
    }

It seems that when merging allocations of different subtypes I'll need to add a special `Phi` node merging the `Klass` of the input allocations. If I don't do that, the method above will return Animal.class instead of `Dog.class` or `Cat.class`. I'm wondering if I'll actually have to do the same for the Header/Mark word of the input allocations.
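The behaviour the corner case must preserve can be checked with a reduced, runnable sketch. This is for illustration only: `Animal`, `Dog` and `Cat` below are hypothetical stand-ins (their definitions are not part of this message), and the `Unloaded` allocation that triggers the uncommon trap is left out, so the sketch shows only the expected result of `getClass()` on the merged object, not the compiler failure itself.

```java
// Hypothetical stand-ins for the classes used in the corner case above.
class MergeKlassDemo {
    static class Animal {}

    static class Dog extends Animal {
        final int x, y, z;
        Dog(int x, int y, int z) { this.x = x; this.y = y; this.z = z; }
    }

    static class Cat extends Animal {
        final String name;
        Cat(String name) { this.name = name; }
    }

    // Two allocations of different subtypes are merged at the join point after the `if`.
    static Class<?> test(boolean c1, int x, int y, int z) {
        Animal s = new Dog(x, y, z);
        if (c1) {
            s = new Cat("Fisker");
        }
        // getClass() must report the dynamic type of whichever allocation
        // reached this point, never the static type Animal.
        return s.getClass();
    }

    public static void main(String[] args) {
        System.out.println(test(false, 1, 2, 3).getSimpleName()); // Dog
        System.out.println(test(true, 1, 2, 3).getSimpleName());  // Cat
    }
}
```

Any scalar replacement of the merged object has to keep both results intact, which is why the klass of each input allocation needs to be tracked through the merge.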
------------- PR: https://git.openjdk.org/jdk/pull/9073 From dlong at openjdk.org Fri Sep 2 01:35:19 2022 From: dlong at openjdk.org (Dean Long) Date: Fri, 2 Sep 2022 01:35:19 GMT Subject: RFR: 8292385: assert(ctrl == kit.control()) failed: Control flow was added although the intrinsic bailed out Message-ID: The problem is caused by missing bailout logic in inline_string_char_access(). This PR adds the needed logic to match other intrinsics. I tried to come up with a stand-alone test case, but was not successful. ------------- Commit messages: - add standard bailout logic to inline_string_char_access Changes: https://git.openjdk.org/jdk/pull/10136/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10136&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8292385 Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10136.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10136/head:pull/10136 PR: https://git.openjdk.org/jdk/pull/10136 From jiefu at openjdk.org Fri Sep 2 01:46:40 2022 From: jiefu at openjdk.org (Jie Fu) Date: Fri, 2 Sep 2022 01:46:40 GMT Subject: RFR: 8292385: assert(ctrl == kit.control()) failed: Control flow was added although the intrinsic bailed out In-Reply-To: References: Message-ID: On Fri, 2 Sep 2022 01:27:57 GMT, Dean Long wrote: > The problem is caused by missing bailout logic in inline_string_char_access(). This PR adds the needed logic to match other intrinsics. > > I tried to come up with a stand-alone test case, but was not successful. Is it possible to create a jtreg test for this fix? Thanks. 
------------- PR: https://git.openjdk.org/jdk/pull/10136 From dean.long at oracle.com Fri Sep 2 02:02:00 2022 From: dean.long at oracle.com (dean.long at oracle.com) Date: Thu, 1 Sep 2022 19:02:00 -0700 Subject: RFR: 8292385: assert(ctrl == kit.control()) failed: Control flow was added although the intrinsic bailed out In-Reply-To: References: Message-ID: On 9/1/22 6:46 PM, Jie Fu wrote: > On Fri, 2 Sep 2022 01:27:57 GMT, Dean Long wrote: > >> The problem is caused by missing bailout logic in inline_string_char_access(). This PR adds the needed logic to match other intrinsics. >> >> I tried to come up with a stand-alone test case, but was not successful. > > Is it possible to create a jtreg test for this fix? It is probably possible, but when I tried I couldn't get it to reproduce. It depends on profiling information, so I had to use the replay file to reproduce it. dl > Thanks. > > ------------- > > PR: https://git.openjdk.org/jdk/pull/10136 From dlong at openjdk.org Fri Sep 2 02:07:41 2022 From: dlong at openjdk.org (Dean Long) Date: Fri, 2 Sep 2022 02:07:41 GMT Subject: RFR: 8292385: assert(ctrl == kit.control()) failed: Control flow was added although the intrinsic bailed out In-Reply-To: References: Message-ID: On Fri, 2 Sep 2022 01:42:58 GMT, Jie Fu wrote: > Is it possible to create a jtreg test for this fix? Thanks. It is probably possible, but when I tried I couldn't get it to reproduce. It depends on profiling information, so I had to use the replay file to reproduce it. 
-------------

PR: https://git.openjdk.org/jdk/pull/10136

From njian at openjdk.org Fri Sep 2 02:18:55 2022
From: njian at openjdk.org (Ningsheng Jian)
Date: Fri, 2 Sep 2022 02:18:55 GMT
Subject: RFR: 8288012: AArch64: unnecessary macro expansion in stubGenerator_aarch64
In-Reply-To:
References:
Message-ID:

On Mon, 29 Aug 2022 01:46:56 GMT, Hao Sun wrote:

> We use utility routines to replace macros in generate_md5_implCompress()
> and generate_sha512_implCompress() functions, as these macro expansions
> would bloat the generator.
>
> Minor update: "Label keys" is removed since it's dead code.
>
> In my local "release + server" build on AArch64 machine, the size of
> stubGenerator_aarch64.o is reduced about 9% (from 6.73 MB to 6.13 MB).
>
> Testing:
> Tier1~3 passed on sha512 feature supporting machine.
> We evaluated JMH test case MessageDigests.java and didn't see visible
> performance change with and without this patch.

Marked as reviewed by njian (Committer).

-------------

PR: https://git.openjdk.org/jdk/pull/10058

From haosun at openjdk.org Fri Sep 2 02:27:38 2022
From: haosun at openjdk.org (Hao Sun)
Date: Fri, 2 Sep 2022 02:27:38 GMT
Subject: RFR: 8288012: AArch64: unnecessary macro expansion in stubGenerator_aarch64
In-Reply-To:
References:
Message-ID:

On Mon, 29 Aug 2022 01:46:56 GMT, Hao Sun wrote:

> We use utility routines to replace macros in generate_md5_implCompress()
> and generate_sha512_implCompress() functions, as these macro expansions
> would bloat the generator.
>
> Minor update: "Label keys" is removed since it's dead code.
>
> In my local "release + server" build on AArch64 machine, the size of
> stubGenerator_aarch64.o is reduced about 9% (from 6.73 MB to 6.13 MB).
>
> Testing:
> Tier1~3 passed on sha512 feature supporting machine.
> We evaluated JMH test case MessageDigests.java and didn't see visible
> performance change with and without this patch.

Thanks for your review. I think the GHA test failure is not related to this patch.
-------------

PR: https://git.openjdk.org/jdk/pull/10058

From jiefu at openjdk.org Fri Sep 2 02:30:41 2022
From: jiefu at openjdk.org (Jie Fu)
Date: Fri, 2 Sep 2022 02:30:41 GMT
Subject: RFR: 8292385: assert(ctrl == kit.control()) failed: Control flow was added although the intrinsic bailed out
In-Reply-To:
References:
Message-ID:

On Fri, 2 Sep 2022 02:03:38 GMT, Dean Long wrote:

> > Is it possible to create a jtreg test for this fix? Thanks.
>
> It is probably possible, but when I tried I couldn't get it to reproduce. It depends on profiling information, so I had to use the replay file to reproduce it.

So can you share the replay file with us to verify the fix?

-------------

PR: https://git.openjdk.org/jdk/pull/10136

From haosun at openjdk.org Fri Sep 2 02:46:50 2022
From: haosun at openjdk.org (Hao Sun)
Date: Fri, 2 Sep 2022 02:46:50 GMT
Subject: Integrated: 8288012: AArch64: unnecessary macro expansion in stubGenerator_aarch64
In-Reply-To:
References:
Message-ID: <8xjCRkmWyWh7FTbpws3kehlnpwwiD59TJSa5WVntnGA=.aaa1b5c0-6bd9-4ac7-b1ce-121b5aa4d1d9@github.com>

On Mon, 29 Aug 2022 01:46:56 GMT, Hao Sun wrote:

> We use utility routines to replace macros in generate_md5_implCompress()
> and generate_sha512_implCompress() functions, as these macro expansions
> would bloat the generator.
>
> Minor update: "Label keys" is removed since it's dead code.
>
> In my local "release + server" build on AArch64 machine, the size of
> stubGenerator_aarch64.o is reduced about 9% (from 6.73 MB to 6.13 MB).
>
> Testing:
> Tier1~3 passed on sha512 feature supporting machine.
> We evaluated JMH test case MessageDigests.java and didn't see visible
> performance change with and without this patch.

This pull request has now been integrated.
Changeset: e0168a0e Author: Hao Sun Committer: Ningsheng Jian URL: https://git.openjdk.org/jdk/commit/e0168a0eb0ce23fda77e65cea9dff7eae0512309 Stats: 272 lines in 1 file changed: 96 ins; 72 del; 104 mod 8288012: AArch64: unnecessary macro expansion in stubGenerator_aarch64 Reviewed-by: aph, njian ------------- PR: https://git.openjdk.org/jdk/pull/10058 From xgong at openjdk.org Fri Sep 2 03:07:51 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 2 Sep 2022 03:07:51 GMT Subject: RFR: 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast [v6] In-Reply-To: References: <1mDQWN8f2Gpb-tuVZo_Jj6TVU1yNtU-4jY00D4gfW5s=.97d48dbc-0ffd-432d-83fe-3544295036e2@github.com> Message-ID: On Wed, 31 Aug 2022 06:10:07 GMT, Xiaohong Gong wrote: >> Recently we found the performance of "`FIRST_NONZERO`" for double type is largely worse than the other types on x86 when `UseAVX=2`. The main reason is the "`VectorCastL2X`" op is not supported by the backend when the dst element type is `T_DOUBLE`. This makes the check of `VectorCast` op fail before intrinsifying "`VectorMask.cast()`" which is used in the >> "`FIRST_NONZERO`" java implementation (see [1]). However, the compiler will not generate the `VectorCast `op for `VectorMask.cast()` if: >> >> 1) the current platform supports the predicated feature >> 2) the element size (in bytes) of the src and dst type is the same >> >> So the check of "`VectorCast`" op is needless for such cases. 
To fix it, this patch:
>>
>> 1) limits the specified vector cast op check to vectors
>> 2) adds the relative mask cast op check for VectorMask.cast()
>> 3) cleans up unnecessary code
>>
>> Here is the performance of "`FIRST_NONZERO`" benchmark [2] on an x86 machine with `UseAVX=2`:
>>
>> Benchmark                            (size)  Mode   Cnt  Before     After     Units
>> DoubleMaxVector.FIRST_NONZERO          1024  thrpt   15  49.266  2460.886    ops/ms
>> DoubleMaxVector.FIRST_NONZEROMasked    1024  thrpt   15  49.554  1892.223    ops/ms
>>
>> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/DoubleVector.java#L770
>> [2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/DoubleMaxVector.java#L246
>
> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:
>
> Address review comments

Hi @jatin-bhateja, could you please help to take a look at this PR? Thanks so much for your time!

-------------

PR: https://git.openjdk.org/jdk/pull/9737

From thartmann at openjdk.org Fri Sep 2 05:57:38 2022
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Fri, 2 Sep 2022 05:57:38 GMT
Subject: RFR: 8292385: assert(ctrl == kit.control()) failed: Control flow was added although the intrinsic bailed out
In-Reply-To:
References:
Message-ID:

On Fri, 2 Sep 2022 01:27:57 GMT, Dean Long wrote:

> The problem is caused by missing bailout logic in inline_string_char_access(). This PR adds the needed logic to match other intrinsics.
>
> I tried to come up with a stand-alone test case, but was not successful.

Looks good.

-------------

Marked as reviewed by thartmann (Reviewer).
PR: https://git.openjdk.org/jdk/pull/10136 From thartmann at openjdk.org Fri Sep 2 06:02:56 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 2 Sep 2022 06:02:56 GMT Subject: RFR: 8290529: C2: assert(BoolTest(btest).is_canonical()) failure [v4] In-Reply-To: References: Message-ID: <0JW6PwBbaRxHAWle5Nfiv1H10u61IwDAYHt6hOR8PoU=.b2f157e9-965f-41f5-8f63-0e7ea03ad303@github.com> On Wed, 31 Aug 2022 14:37:09 GMT, Roland Westrelin wrote: >> For the test case: >> >> 1) In Parse::do_if(), tst0 is: >> >> (Bool#lt (CmpU 0 Parm0)) >> >> 2) transformed by gvn in tst: >> >> (Bool#gt (CmpU Parm0 0)) >> >> 3) That test is not canonical and is negated and retransformed which >> results in: >> >> (Bool#eq (CmpI Parm0 0)) >> >> The assert fires because that test is not canonical either. >> >> The root cause I think is that the (CmpU .. 0) -> (CmpI .. 0) only >> triggers if the condition of the CmpU is canonical (and results in a >> non canonical test). Tweaking it so it applies even if the condition >> is not leads to the following change in the steps above: >> >> 2) (Bool#ne (CmpI Parm0 0)) >> >> which is a canonical test. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > comment Looks good to me. So the root cause is JDK-8276162, correct? ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.org/jdk/pull/9553 From pli at openjdk.org Fri Sep 2 06:11:58 2022 From: pli at openjdk.org (Pengfei Li) Date: Fri, 2 Sep 2022 06:11:58 GMT Subject: RFR: 8291669: [REDO] Fix array range check hoisting for some scaled loop iv [v3] In-Reply-To: <5XU7GsP99-GVCxCJi7bVvvKbW_YG3XQGuIVm-LclQOw=.9b48b73d-4172-4f8e-a82d-03bad545c2fc@github.com> References: <5XU7GsP99-GVCxCJi7bVvvKbW_YG3XQGuIVm-LclQOw=.9b48b73d-4172-4f8e-a82d-03bad545c2fc@github.com> Message-ID: > This is a REDO of JDK-8289996. 
In previous patch, we defer some strength > reductions in Ideal functions of `Mul[I|L]Node` to post loop igvn phase > to fix a range check hoisting issue. More about previous patch can be > found in PR #9508, where we have described some details of the issue > we would like to fix. > > Previous patch was backed out due to some jtreg failures found. We have > analyzed those failures one by one and found one of them exposes a real > performance regression. We see that deferring some strength reductions > to post loop igvn phase has too much impact. Some vector multiplication > will not be optimized to vector addition with vector shift after that > change. So in this REDO we propose the range check hoisting fix with a > different approach. > > In this new patch, we add some recursive pattern matches for scaled loop > iv in function `PhaseIdealLoop::is_scaled_iv()`. These include matching > a sum or a difference of two scaled iv expressions. With this, all kinds > of Ideal-transformed scaled iv expressions can still be recognized. This > new approach only touches loop transformation code and hence has much > smaller impact. We have verified that this new approach applies to both > int range checks and long range checks. > > Previously attached jtreg case fails on ppc64 because VectorAPI has no > vector intrinsics on ppc64 so there's no long range check to hoist. In > this patch, we limit the test architecture to x64 and AArch64. > > Tested hotspot::hotspot_all_no_apps, jdk::tier1~3 and langtools::tier1. 
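The scaled iv shapes described above (a sum or a difference of two scaled iv expressions) can be illustrated with a small Java sketch. The class and method names are hypothetical, and the shift-based forms only mimic what strength reduction of a constant multiply might leave behind; this is not the matching code in `PhaseIdealLoop::is_scaled_iv()`. `Objects.checkIndex(long, long)` requires JDK 16 or later.

```java
import java.util.Objects;

// Hypothetical illustration of range-check index expressions in "sum" and
// "difference" form; not taken from the patch or its tests.
class ScaledIvDemo {
    // 6 * i written as a sum of two scaled iv terms: (i << 2) + (i << 1)
    static long scaledBySum(long i) {
        return (i << 2) + (i << 1);
    }

    // 7 * i written as a difference of two scaled iv terms: (i << 3) - i
    static long scaledByDifference(long i) {
        return (i << 3) - i;
    }

    // An int loop with a long range check; the index is still scale * i + offset,
    // just expressed as a sum of shifts. Returns the number of checks performed.
    static int loop(int start, int stop, long length, long offset) {
        int checked = 0;
        for (int i = start; i < stop; i++) {
            Objects.checkIndex(scaledBySum(i) + offset, length);
            checked++;
        }
        return checked;
    }

    public static void main(String[] args) {
        System.out.println(loop(0, 10, 100, 3)); // 10
    }
}
```

Both helper methods compute a plain constant multiple of `i`, so a matcher that recognizes each operand as a scaled iv and combines the scales can still treat the whole expression as `scale * i`.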
Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: Update p_short_scale compuation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9851/files - new: https://git.openjdk.org/jdk/pull/9851/files/a2aaed72..02402795 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9851&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9851&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/9851.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9851/head:pull/9851 PR: https://git.openjdk.org/jdk/pull/9851 From pli at openjdk.org Fri Sep 2 06:11:58 2022 From: pli at openjdk.org (Pengfei Li) Date: Fri, 2 Sep 2022 06:11:58 GMT Subject: RFR: 8291669: [REDO] Fix array range check hoisting for some scaled loop iv [v2] In-Reply-To: References: <5XU7GsP99-GVCxCJi7bVvvKbW_YG3XQGuIVm-LclQOw=.9b48b73d-4172-4f8e-a82d-03bad545c2fc@github.com> <-Mby8uDHQSAUHhZBKhd_LBQXuOC1qkr3nd4-YcViTjo=.5ecc66b2-45f3-4614-a82f-2030f0842d77@github.com> Message-ID: On Wed, 31 Aug 2022 07:43:49 GMT, Roland Westrelin wrote: >>> I think it should be short_scale_l && short_scale_r here >> >> Hi @rwestrel , may I ask you a question about this? From your comments, I see `short_scale` reports if a `ConvI2L` node is present since it's used to protect against overflow. Does this mean that `ConvI2L` at this point only appears in long counted loops? I ask this because in my knowledge array address computing in int loops also generates `ConvI2L` on 64-bit platforms. 
> > It's for loops like this one: > public static void testStridePosScalePosInIntLoop1(int start, int stop, long length, long offset) { > final long scale = 2; > final int stride = 1; > > // Same but with int loop > for (int i = start; i < stop; i += stride) { > Objects.checkIndex(scale * i + offset, length); > } > } > > It's an int loop but because length is a long, there's an implicit cast of scale * i + offset to long (which is where the ConvI2L comes from). In the case of your change an expression for the range check that would need to be optimized would be: > ((long)i) * scale > with scale 6 for instance so expressed by the compiler as (((long)i) << 2) + (((long)i) << 1) > and both calls to is_scaled_iv would return true for short_scale which is why I think it should be short_scale_l && short_scale_r > You're right that address computation includes a ConvI2L on 64 bits but the range check doesn't in: > array[i] = val; Thanks for your detailed explanation! I have updated these. ------------- PR: https://git.openjdk.org/jdk/pull/9851 From rcastanedalo at openjdk.org Fri Sep 2 06:36:50 2022 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 2 Sep 2022 06:36:50 GMT Subject: RFR: 8292660: C2: blocks made unreachable by NeverBranch-to-Goto conversion are removed incorrectly [v3] In-Reply-To: <1rtbcl8YVPhYz8cJ3RmnhUrv4Tl8tFfQ-R1ZdkbhiaA=.aa9cf2fb-1e13-4d0e-bb9f-c48d9f5ed6db@github.com> References: <1rtbcl8YVPhYz8cJ3RmnhUrv4Tl8tFfQ-R1ZdkbhiaA=.aa9cf2fb-1e13-4d0e-bb9f-c48d9f5ed6db@github.com> Message-ID: > This changeset addresses three issues in the current removal of unreachable blocks after NeverBranch-to-goto conversion (introduced recently by [JDK-8292285](https://bugs.openjdk.org/browse/JDK-8292285)): > > 1.
The [unreachable block removal and pre-order index update loop](https://github.com/openjdk/jdk/blob/7b5f9edb59ef763acca80724ca37f3624d720d06/src/hotspot/share/opto/block.cpp#L613-L621) skips the block next to the removed one, and iterates beyond the end of the block list (`PhaseCFG::_blocks`). Skipping blocks can lead to duplicate pre-order indices (`Block::_pre_order`) and/or pre-order indices greater than the size of the block list, causing problems in later transformations. > > 2. The [outer block traversal loop](https://github.com/openjdk/jdk/blob/7b5f9edb59ef763acca80724ca37f3624d720d06/src/hotspot/share/opto/block.cpp#L698-L729) iterates beyond the end of the block list whenever one or more unreachable blocks are removed. > > 3. Transitively unreachable blocks (such as B10 in the following example), arising in methods with multiple infinite loops, are not removed: > > ![transitive](https://user-images.githubusercontent.com/8792647/186109043-416213b7-8735-41de-9910-acf0997db095.png) > > This changeset addresses issue 1 by decrementing the block count (`_number_of_blocks`) and block index (`i`) right after a block is removed from the block list, and issues 2 and 3 by decoupling NeverBranch-to-goto conversion from removal of unreachable code. Instead of removing the blocks eagerly, the removal is postponed to a later phase that works in an iterative worklist fashion, making it possible to remove transitively unreachable blocks such as B10 in the above example. > > #### Testing > > - tier1-3 (windows-x64, linux-x64, linux-aarch64, and macosx-x64; release and debug mode). > - tier4-7 (linux-x64; debug mode). > - fuzzing (~1 h. on each platform). 
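The worklist-based removal described above can be sketched in Java as follows. This is only an illustration under stated assumptions: `UnreachableBlockSketch`, `Block`, `numPreds`, `number` and `removeUnreachable` are hypothetical names, predecessor bookkeeping is reduced to a counter, and the caller is assumed to seed the worklist with the blocks that the NeverBranch-to-goto conversion made directly unreachable. The actual change lives in `src/hotspot/share/opto/block.cpp` and differs in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class UnreachableBlockSketch {
    // Hypothetical stand-in for a CFG block.
    static final class Block {
        final String name;
        int preOrder;             // invariant assumed: blocks.get(i).preOrder == i
        int numPreds;             // number of incoming edges still alive
        boolean removed;
        final List<Block> succs = new ArrayList<>();

        Block(String name) { this.name = name; }
    }

    static void addEdge(Block from, Block to) {
        from.succs.add(to);
        to.numPreds++;
    }

    // Builds the block list and assigns pre-order indices equal to list positions.
    static List<Block> number(Block... blocks) {
        List<Block> list = new ArrayList<>(Arrays.asList(blocks));
        for (int i = 0; i < list.size(); i++) {
            list.get(i).preOrder = i;
        }
        return list;
    }

    // Removes every block on the worklist plus blocks that become unreachable
    // as a consequence (transitively), keeping pre-order indices dense.
    static void removeUnreachable(List<Block> blocks, Deque<Block> unreachable) {
        while (!unreachable.isEmpty()) {
            Block dead = unreachable.pop();
            if (dead.removed) {
                continue;         // already handled, e.g. pushed twice
            }
            dead.removed = true;
            int index = dead.preOrder;
            blocks.remove(index);
            // Only blocks that followed 'dead' need their index decremented.
            for (int i = index; i < blocks.size(); i++) {
                blocks.get(i).preOrder--;
            }
            // Dropping the outgoing edges may leave successors without predecessors.
            for (Block succ : dead.succs) {
                succ.numPreds--;
                if (succ.numPreds == 0 && !succ.removed) {
                    unreachable.push(succ);
                }
            }
        }
    }
}
```

For a list `B0, B1, B2, B3, B4` with edges `B0->B1`, `B1->B4`, `B2->B3`, `B3->B4` and `B2` on the worklist, the sketch removes `B2` and then `B3`, and leaves `B4` with pre-order index 2. Cycles of mutually unreachable blocks are not detected by this simple counter scheme.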
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Iterate only over blocks with larger _pre_order than 'dead' ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9976/files - new: https://git.openjdk.org/jdk/pull/9976/files/7a95e6fc..93e600cc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9976&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9976&range=01-02 Stats: 20 lines in 1 file changed: 3 ins; 9 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/9976.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9976/head:pull/9976 PR: https://git.openjdk.org/jdk/pull/9976 From rcastanedalo at openjdk.org Fri Sep 2 06:36:52 2022 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 2 Sep 2022 06:36:52 GMT Subject: RFR: 8292660: C2: blocks made unreachable by NeverBranch-to-Goto conversion are removed incorrectly [v2] In-Reply-To: References: <1rtbcl8YVPhYz8cJ3RmnhUrv4Tl8tFfQ-R1ZdkbhiaA=.aa9cf2fb-1e13-4d0e-bb9f-c48d9f5ed6db@github.com> Message-ID: <6blG_0ORUyRMYh-LWTX3_J3I1gndcIZzuLCHjyUZb7I=.8d0465ed-1472-472f-8733-f695df10ed0d@github.com> On Wed, 31 Aug 2022 07:12:39 GMT, Roland Westrelin wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Postpone unreachable block removal to after PhaseCFG::fixup_flow() > > src/hotspot/share/opto/block.cpp line 961: > >> 959: while (unreachable.size() > 0) { >> 960: Block* dead = unreachable.pop(); >> 961: for (uint i = 0; i < _number_of_blocks; i++) { > > Wouldn't we want to iterate backward here and adjust block->_pre_order until we hit dead? Thanks for the suggestion, Roland! Yes, since `get_block(i)->_pre_order == i` holds at this stage, it is enough to iterate over the elements that succeed `dead` in the block list. I have updated the changeset accordingly.
------------- PR: https://git.openjdk.org/jdk/pull/9976 From roland at openjdk.org Fri Sep 2 06:43:50 2022 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 2 Sep 2022 06:43:50 GMT Subject: RFR: 8291669: [REDO] Fix array range check hoisting for some scaled loop iv [v3] In-Reply-To: References: <5XU7GsP99-GVCxCJi7bVvvKbW_YG3XQGuIVm-LclQOw=.9b48b73d-4172-4f8e-a82d-03bad545c2fc@github.com> Message-ID: On Fri, 2 Sep 2022 06:11:58 GMT, Pengfei Li wrote: >> This is a REDO of JDK-8289996. In previous patch, we defer some strength >> reductions in Ideal functions of `Mul[I|L]Node` to post loop igvn phase >> to fix a range check hoisting issue. More about previous patch can be >> found in PR #9508, where we have described some details of the issue >> we would like to fix. >> >> Previous patch was backed out due to some jtreg failures found. We have >> analyzed those failures one by one and found one of them exposes a real >> performance regression. We see that deferring some strength reductions >> to post loop igvn phase has too much impact. Some vector multiplication >> will not be optimized to vector addition with vector shift after that >> change. So in this REDO we propose the range check hoisting fix with a >> different approach. >> >> In this new patch, we add some recursive pattern matches for scaled loop >> iv in function `PhaseIdealLoop::is_scaled_iv()`. These include matching >> a sum or a difference of two scaled iv expressions. With this, all kinds >> of Ideal-transformed scaled iv expressions can still be recognized. This >> new approach only touches loop transformation code and hence has much >> smaller impact. We have verified that this new approach applies to both >> int range checks and long range checks. >> >> Previously attached jtreg case fails on ppc64 because VectorAPI has no >> vector intrinsics on ppc64 so there's no long range check to hoist. In >> this patch, we limit the test architecture to x64 and AArch64. 
>> >> Tested hotspot::hotspot_all_no_apps, jdk::tier1~3 and langtools::tier1. > > Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: > > Update p_short_scale compuation Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR: https://git.openjdk.org/jdk/pull/9851 From roland at openjdk.org Fri Sep 2 06:44:06 2022 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 2 Sep 2022 06:44:06 GMT Subject: RFR: 8292660: C2: blocks made unreachable by NeverBranch-to-Goto conversion are removed incorrectly [v3] In-Reply-To: References: <1rtbcl8YVPhYz8cJ3RmnhUrv4Tl8tFfQ-R1ZdkbhiaA=.aa9cf2fb-1e13-4d0e-bb9f-c48d9f5ed6db@github.com> Message-ID: On Fri, 2 Sep 2022 06:36:50 GMT, Roberto Castañeda Lozano wrote: >> This changeset addresses three issues in the current removal of unreachable blocks after NeverBranch-to-goto conversion (introduced recently by [JDK-8292285](https://bugs.openjdk.org/browse/JDK-8292285)): >> >> 1. The [unreachable block removal and pre-order index update loop](https://github.com/openjdk/jdk/blob/7b5f9edb59ef763acca80724ca37f3624d720d06/src/hotspot/share/opto/block.cpp#L613-L621) skips the block next to the removed one, and iterates beyond the end of the block list (`PhaseCFG::_blocks`). Skipping blocks can lead to duplicate pre-order indices (`Block::_pre_order`) and/or pre-order indices greater than the size of the block list, causing problems in later transformations. >> >> 2. The [outer block traversal loop](https://github.com/openjdk/jdk/blob/7b5f9edb59ef763acca80724ca37f3624d720d06/src/hotspot/share/opto/block.cpp#L698-L729) iterates beyond the end of the block list whenever one or more unreachable blocks are removed. >> >> 3.
Transitively unreachable blocks (such as B10 in the following example), arising in methods with multiple infinite loops, are not removed: >> >> ![transitive](https://user-images.githubusercontent.com/8792647/186109043-416213b7-8735-41de-9910-acf0997db095.png) >> >> This changeset addresses issues 2 and 3 by decoupling NeverBranch-to-goto conversion from removal of unreachable code. Instead of removing the blocks eagerly, the removal is postponed to a later phase that works in an iterative worklist fashion, making it possible to remove transitively unreachable blocks such as B10 in the above example. Postponing removal to a later phase (where `get_block(i)->_pre_order == i` holds) also simplifies addressing issue 1: in the changeset, it is sufficient to iterate over the blocks that follow the removed block in the block list to decrement their `_pre_order` index. >> >> #### Testing >> >> - tier1-3 (windows-x64, linux-x64, linux-aarch64, and macosx-x64; release and debug mode). >> - tier4-7 (linux-x64; debug mode). >> - fuzzing (~1 h. on each platform). > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Iterate only over blocks with larger _pre_order than 'dead' Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR: https://git.openjdk.org/jdk/pull/9976 From rcastanedalo at openjdk.org Fri Sep 2 06:44:06 2022 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 2 Sep 2022 06:44:06 GMT Subject: RFR: 8292660: C2: blocks made unreachable by NeverBranch-to-Goto conversion are removed incorrectly [v3] In-Reply-To: References: <1rtbcl8YVPhYz8cJ3RmnhUrv4Tl8tFfQ-R1ZdkbhiaA=.aa9cf2fb-1e13-4d0e-bb9f-c48d9f5ed6db@github.com> Message-ID: On Fri, 2 Sep 2022 06:39:16 GMT, Roland Westrelin wrote: > Looks good to me. Thanks for reviewing, Roland!
------------- PR: https://git.openjdk.org/jdk/pull/9976 From roland at openjdk.org Fri Sep 2 07:01:43 2022 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 2 Sep 2022 07:01:43 GMT Subject: RFR: 8290529: C2: assert(BoolTest(btest).is_canonical()) failure [v4] In-Reply-To: <0JW6PwBbaRxHAWle5Nfiv1H10u61IwDAYHt6hOR8PoU=.b2f157e9-965f-41f5-8f63-0e7ea03ad303@github.com> References: <0JW6PwBbaRxHAWle5Nfiv1H10u61IwDAYHt6hOR8PoU=.b2f157e9-965f-41f5-8f63-0e7ea03ad303@github.com> Message-ID: On Fri, 2 Sep 2022 06:00:16 GMT, Tobias Hartmann wrote: > So the root cause is JDK-8276162, correct? It's the one that is causing it to show up now but JDK-8209544 is the one that introduced the bug. ------------- PR: https://git.openjdk.org/jdk/pull/9553 From roland at openjdk.org Fri Sep 2 07:01:44 2022 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 2 Sep 2022 07:01:44 GMT Subject: RFR: 8290529: C2: assert(BoolTest(btest).is_canonical()) failure [v4] In-Reply-To: References: <4gLGY-GPwaDaWIpV3sVHbdg7H2nIPpwG19toBs57WKM=.94c08a85-76de-4651-80ae-6aece48bb5f2@github.com> <0rkyLlpTnpp4PnTZCSWc1B_kb0NYQgvSKbU6mu-bnx4=.008b6a51-111a-43d8-9638-4c77b54139f3@github.com> Message-ID: On Mon, 29 Aug 2022 18:47:33 GMT, Vladimir Kozlov wrote: >>> Changes are good. If possible add IR framework test. >> >> Thanks for looking at this. Are you asking that the test that I added to catch this specific problem be turned into an IR framework test or are you asking for a separate test to verify the transformation in general? > >> > Changes are good. If possible add IR framework test. >> >> Thanks for looking at this. Are you asking that the test that I added to catch this specific problem be turned into an IR framework test or are you asking for a separate test to verify the transformation in general? > > Separate IR test to verify the transformation in general.
@vnkozlov @TobiHartmann thanks for the reviews ------------- PR: https://git.openjdk.org/jdk/pull/9553 From dlong at openjdk.org Fri Sep 2 07:47:57 2022 From: dlong at openjdk.org (Dean Long) Date: Fri, 2 Sep 2022 07:47:57 GMT Subject: RFR: 8292385: assert(ctrl == kit.control()) failed: Control flow was added although the intrinsic bailed out In-Reply-To: References: Message-ID: On Fri, 2 Sep 2022 02:27:17 GMT, Jie Fu wrote: > > > Is it possible to create a jtreg test for this fix? Thanks. > > > > > > It is probably possible, but when I tried I couldn't get it to reproduce. It depends on profiling information, so I had to use the replay file to reproduce it. > > So can you share us the replay file to verify the fix? You can find replay_pid37690.log in the attachment for the bug. I started with that and reduced it using JDK-8293287. I will attach that replay file to the bug as well. ------------- PR: https://git.openjdk.org/jdk/pull/10136 From dlong at openjdk.org Fri Sep 2 07:47:57 2022 From: dlong at openjdk.org (Dean Long) Date: Fri, 2 Sep 2022 07:47:57 GMT Subject: RFR: 8292385: assert(ctrl == kit.control()) failed: Control flow was added although the intrinsic bailed out In-Reply-To: References: Message-ID: On Fri, 2 Sep 2022 01:27:57 GMT, Dean Long wrote: > The problem is caused by missing bailout logic in inline_string_char_access(). This PR adds the needed logic to match other intrinsics. > > I tried to come up with a stand-alone test case, but was not successful. Thanks Tobias. 
------------- PR: https://git.openjdk.org/jdk/pull/10136 From roland at openjdk.org Fri Sep 2 08:34:05 2022 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 2 Sep 2022 08:34:05 GMT Subject: RFR: 8290529: C2: assert(BoolTest(btest).is_canonical()) failure [v5] In-Reply-To: References: Message-ID: > For the test case: > > 1) In Parse::do_if(), tst0 is: > > (Bool#lt (CmpU 0 Parm0)) > > 2) transformed by gvn in tst: > > (Bool#gt (CmpU Parm0 0)) > > 3) That test is not canonical and is negated and retransformed which > results in: > > (Bool#eq (CmpI Parm0 0)) > > The assert fires because that test is not canonical either. > > The root cause I think is that the (CmpU .. 0) -> (CmpI .. 0) only > triggers if the condition of the CmpU is canonical (and results in a > non canonical test). Tweaking it so it applies even if the condition > is not leads to the following change in the steps above: > > 2) (Bool#ne (CmpI Parm0 0)) > > which is a canonical test. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: make CmpUWithZero test x64 only ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9553/files - new: https://git.openjdk.org/jdk/pull/9553/files/8850b36c..a3c5557f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9553&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9553&range=03-04 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/9553.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9553/head:pull/9553 PR: https://git.openjdk.org/jdk/pull/9553 From roland at openjdk.org Fri Sep 2 08:34:05 2022 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 2 Sep 2022 08:34:05 GMT Subject: RFR: 8290529: C2: assert(BoolTest(btest).is_canonical()) failure [v4] In-Reply-To: References: Message-ID: On Wed, 31 Aug 2022 14:37:09 GMT, Roland Westrelin wrote: >> For the test case: >> >> 1) In Parse::do_if(), tst0 is: >> >> (Bool#lt (CmpU 0 
Parm0)) >> >> 2) transformed by gvn in tst: >> >> (Bool#gt (CmpU Parm0 0)) >> >> 3) That test is not canonical and is negated and retransformed which >> results in: >> >> (Bool#eq (CmpI Parm0 0)) >> >> The assert fires because that test is not canonical either. >> >> The root cause I think is that the (CmpU .. 0) -> (CmpI .. 0) only >> triggers if the condition of the CmpU is canonical (and results in a >> non canonical test). Tweaking it so it applies even if the condition >> is not leads to the following change in the steps above: >> >> 2) (Bool#ne (CmpI Parm0 0)) >> >> which is a canonical test. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > comment For the record: Test CmpUWithZero.java requires the compareUnsigned intrinsic which in turn requires the CmpU3 only implemented on x86_64 so I made that test x86_64 only. ------------- PR: https://git.openjdk.org/jdk/pull/9553 From dlong at openjdk.org Fri Sep 2 08:37:37 2022 From: dlong at openjdk.org (Dean Long) Date: Fri, 2 Sep 2022 08:37:37 GMT Subject: RFR: 8293287 add ReplayReduce flag In-Reply-To: References: Message-ID: <2onk8LrCxLP0AuzqZulPi_XeQC-QXgXhd9iVtZWKzUw=.82614ce1-5d62-42d8-9e53-42056509c7ca@github.com> On Fri, 2 Sep 2022 01:20:26 GMT, Dean Long wrote: > Add an experimental flag to help developers "reduce" a replay file. > > As a first step, I plan to simulate reduced inlining. This will output multiple "compile" lines as if the first level of inlining never happened: > A --> B --> C > A --> D --> E > becomes > B --> C > D --> E > Developers can repeat iteratively until the replay crash no longer reproduces. One test is failing -- moving back to DRAFT. 
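The "reduced inlining" step from the ReplayReduce description above can be pictured with a small Java model. This is a sketch under loud assumptions: `ReplayReduceSketch` and `dropFirstInliningLevel` are hypothetical names, and the list-of-method-names representation is invented here; it does not reflect the format of a real replay file or the code in the PR.

```java
import java.util.ArrayList;
import java.util.List;

final class ReplayReduceSketch {
    // Each chain lists methods from the compiled root down one inlining path,
    // e.g. [A, B, C] stands for A --> B --> C. Hypothetical model only.
    static List<List<String>> dropFirstInliningLevel(List<List<String>> chains) {
        List<List<String>> reduced = new ArrayList<>();
        for (List<String> chain : chains) {
            if (chain.size() < 2) {
                continue;         // a root without inlinees leaves nothing to compile
            }
            // Pretend the first level of inlining never happened.
            List<String> tail = new ArrayList<>(chain.subList(1, chain.size()));
            if (!reduced.contains(tail)) {
                reduced.add(tail); // emit each new root only once
            }
        }
        return reduced;
    }
}
```

Applied to `[A, B, C]` and `[A, D, E]`, the sketch yields `[B, C]` and `[D, E]`, matching the example in the description. Repeating the call models the iterative reduction.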
------------- PR: https://git.openjdk.org/jdk/pull/10134 From roland at openjdk.org Fri Sep 2 09:02:54 2022 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 2 Sep 2022 09:02:54 GMT Subject: RFR: 8292301: [REDO v2] C2 crash when allocating array of size too large [v2] In-Reply-To: References: Message-ID: > On top of the redo, this fixed 2 bugs: > > 8288184: the problem here is that the ValidLengthTest input of an > AllocateArrayNode becomes a constant. The CatchNode would then change > types if it was reprocessed but it's not. Custom logic is needed to > enqueue the CatchNode when the ValidLengthTest input of an > AllocateArrayNode changes. The CastII out of the AllocateArrayNode > becomes top but the fallthrough path doesn't die. This happens with > igvn in the case of the bug but could also happen with ccp. I fixed > both in this patch. > > 8291665: the code pattern for this is 2 AllocateArrayNodes out of loop > with a shared ValidLengthTest input in a loop. When the loop is cloned > that causes Phis to be added between the AllocateArrayNodes and the > BoolNode of the ValidLengthTest inputs. Split if runs next and it > doesn't expect the Phi at the ValidLengthTest inputs. The fix here is > to clone the Bool/Cmp subgraph down on loop cloning. There's logic for > that when the use of the bool is an If for instance so I simply added > a special case to run that logic for an AllocateArrayNode use as > well. Note that the test case I added fails reliably on 11 but not > with the current jdk development branch. AFAICT, the bug is there but > something unrelated changed and a slightly different graph is built > for the test case that prevents split if.
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: undo needless change ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10038/files - new: https://git.openjdk.org/jdk/pull/10038/files/6c069426..bbf9851d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10038&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10038&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10038.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10038/head:pull/10038 PR: https://git.openjdk.org/jdk/pull/10038 From thartmann at openjdk.org Fri Sep 2 09:49:49 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 2 Sep 2022 09:49:49 GMT Subject: RFR: 8290529: C2: assert(BoolTest(btest).is_canonical()) failure [v5] In-Reply-To: References: Message-ID: On Fri, 2 Sep 2022 08:34:05 GMT, Roland Westrelin wrote: >> For the test case: >> >> 1) In Parse::do_if(), tst0 is: >> >> (Bool#lt (CmpU 0 Parm0)) >> >> 2) transformed by gvn in tst: >> >> (Bool#gt (CmpU Parm0 0)) >> >> 3) That test is not canonical and is negated and retransformed which >> results in: >> >> (Bool#eq (CmpI Parm0 0)) >> >> The assert fires because that test is not canonical either. >> >> The root cause I think is that the (CmpU .. 0) -> (CmpI .. 0) only >> triggers if the condition of the CmpU is canonical (and results in a >> non canonical test). Tweaking it so it applies even if the condition >> is not leads to the following change in the steps above: >> >> 2) (Bool#ne (CmpI Parm0 0)) >> >> which is a canonical test. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > make CmpUWithZero test x64 only Thanks for the clarifications. 
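The arithmetic fact behind the `(CmpU .. 0) -> (CmpI .. 0)` rewrite discussed in this thread can be checked in plain Java. The sketch below only verifies the value-level equivalences; it says nothing about how C2 canonicalizes `BoolTest` conditions. `CmpUWithZeroSketch` and its method names are made up for illustration, while `Integer.compareUnsigned` is the standard library method.

```java
final class CmpUWithZeroSketch {
    // Models (Bool#lt (CmpU 0 x)): 0 <u x.
    static boolean zeroUnsignedLess(int x) {
        return Integer.compareUnsigned(0, x) < 0;
    }

    // Models (Bool#gt (CmpU x 0)): x >u 0, the commuted form of the test above.
    static boolean unsignedGreaterThanZero(int x) {
        return Integer.compareUnsigned(x, 0) > 0;
    }

    // Models (Bool#ne (CmpI x 0)): signed x != 0.
    static boolean signedNotZero(int x) {
        return x != 0;
    }

    // Models (Bool#le (CmpU x 0)): x <=u 0, the negation of x >u 0.
    static boolean unsignedLessOrEqualZero(int x) {
        return Integer.compareUnsigned(x, 0) <= 0;
    }

    // True if all forms agree for x: the three "non-zero" tests match,
    // and the negated unsigned test matches signed x == 0.
    static boolean formsAgree(int x) {
        boolean nonZero = signedNotZero(x);
        return zeroUnsignedLess(x) == nonZero
                && unsignedGreaterThanZero(x) == nonZero
                && unsignedLessOrEqualZero(x) == (x == 0);
    }
}
```

Because no unsigned value is below zero, `x >u 0` holds exactly when `x != 0`, and its negation `x <=u 0` holds exactly when `x == 0`. That is why an unsigned comparison against zero can be expressed as a signed equality test.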
------------- PR: https://git.openjdk.org/jdk/pull/9553 From tholenstein at openjdk.org Fri Sep 2 10:10:02 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 2 Sep 2022 10:10:02 GMT Subject: RFR: JDK-8291805: IGV: Improve Zooming [v7] In-Reply-To: References: Message-ID: <1UrkczC0q8Re9rGCEILVr6s75HJNe1EdVGl_pL8BMlc=.d49befd9-6c27-4e9b-bd33-32d338e991ab@github.com> > # Overview > > The zooming is improved in the following ways: > > 1) Added a minimum (10%) and maximum (400%) zoom level. If you have a sensitive mouse wheel, it can be annoying to zoom in or out too much (until the graph is invisibly small or the nodes are larger than the window) > > 2) Zooming with a trackpad was not very smooth because IGV did panning and zooming at the same time - Now panning is disabled when CMD/Ctrl key is pressed for zooming > > 3) When only a few nodes were selected, zooming was no longer mouse centred. Instead, the center of the zooming was in the upper left corner. Now the zooming is centred to the middle of the scene when all selected nodes fit in the screen. > > 4) Added a shortcut (Ctrl - 0) to reset the zoom level to 100%. > > 5) Updated the Zoom icons to be vector graphics (.svg) > > # Implementation > > 1) New functions `getZoomMinFactor()` and `getZoomMaxFactor()` assure that we do not zoom in or out infinitely. `getZoomMinFactor()` assures that we do not zoom out further if zoom level is <100% and all visible nodes already fit on the screen. > > 2) We introduced a new `MouseCenteredZoomAction.java` for zooming with the mouse/trackpad. `MouseCenteredZoomAction` performs zooming when the modifier key is pressed (Ctrl/CMD) and panning otherwise. The functions `zoomIn` and `zoomOut` now do animated zooming using `CustomZoomAnimator`. `CustomZoomAnimator` uses the mouse location as the centre of the zoom animation. > > 3) The `JScrollPane` now has a `JPanel centeringPanel` with `GridBagLayout()` that contains the `viewComponent`.
This assures that the `viewComponent` is always centred when no scrollbars are visible. This makes the `Widget topLeft, bottomRight` obsolete as we can now add a white border of `BORDER_SIZE` to the `DiagramScene` instead. > > 4) `ZoomResetAction.java` resets the zoom level to 100%. The shortcut is `Ctrl - 0` and the action is available in the menu: `View` -> `Reset Zoom`. It was not added to the icon menu bar in the `EditorTopComponent` because of space issue. > > 5) new self created icons with vector graphics: `zoomIn.svg`, `zoomOut.svg` and `zoomReset.svg` Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: use animated zoom for selection of nodes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10026/files - new: https://git.openjdk.org/jdk/pull/10026/files/cb4542d4..dce54bc5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10026&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10026&range=05-06 Stats: 7 lines in 1 file changed: 4 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/10026.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10026/head:pull/10026 PR: https://git.openjdk.org/jdk/pull/10026 From jiefu at openjdk.org Fri Sep 2 13:27:46 2022 From: jiefu at openjdk.org (Jie Fu) Date: Fri, 2 Sep 2022 13:27:46 GMT Subject: RFR: 8292385: assert(ctrl == kit.control()) failed: Control flow was added although the intrinsic bailed out In-Reply-To: References: Message-ID: On Fri, 2 Sep 2022 07:43:00 GMT, Dean Long wrote: > You can find replay_pid37690.log in the attachment for the bug. I started with that and reduced it using JDK-8293287. I will attach that replay file to the bug as well. May I ask how can I reproduce this bug with replay_pid37690.log? 
I tried this on Linux/x86_64 ------------- PR: https://git.openjdk.org/jdk/pull/10136 From jiefu at openjdk.org Fri Sep 2 13:33:44 2022 From: jiefu at openjdk.org (Jie Fu) Date: Fri, 2 Sep 2022 13:33:44 GMT Subject: RFR: 8292385: assert(ctrl == kit.control()) failed: Control flow was added although the intrinsic bailed out In-Reply-To: References: Message-ID: On Fri, 2 Sep 2022 01:27:57 GMT, Dean Long wrote: > The problem is caused by missing bailout logic in inline_string_char_access(). This PR adds the needed logic to match other intrinsics. > > I tried to come up with a stand-alone test case, but was not successful. I can reproduce it with `-XX:+ReplayIgnoreInitErrors`. Did you also use that flag? ------------- PR: https://git.openjdk.org/jdk/pull/10136 From roland at openjdk.org Fri Sep 2 13:37:53 2022 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 2 Sep 2022 13:37:53 GMT Subject: Integrated: 8290529: C2: assert(BoolTest(btest).is_canonical()) failure In-Reply-To: References: Message-ID: On Tue, 19 Jul 2022 12:28:10 GMT, Roland Westrelin wrote: > For the test case: > > 1) In Parse::do_if(), tst0 is: > > (Bool#lt (CmpU 0 Parm0)) > > 2) transformed by gvn in tst: > > (Bool#gt (CmpU Parm0 0)) > > 3) That test is not canonical and is negated and retransformed which > results in: > > (Bool#eq (CmpI Parm0 0)) > > The assert fires because that test is not canonical either. > > The root cause I think is that the (CmpU .. 0) -> (CmpI .. 0) only > triggers if the condition of the CmpU is canonical (and results in a > non canonical test). Tweaking it so it applies even if the condition > is not leads to the following change in the steps above: > > 2) (Bool#ne (CmpI Parm0 0)) > > which is a canonical test. This pull request has now been integrated. 
Changeset: 77e21c57 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/77e21c57ce00463db4cc3d87f93729cbfe2c96b4 Stats: 113 lines in 4 files changed: 110 ins; 0 del; 3 mod 8290529: C2: assert(BoolTest(btest).is_canonical()) failure Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/9553 From jiefu at openjdk.org Fri Sep 2 13:39:50 2022 From: jiefu at openjdk.org (Jie Fu) Date: Fri, 2 Sep 2022 13:39:50 GMT Subject: RFR: 8292385: assert(ctrl == kit.control()) failed: Control flow was added although the intrinsic bailed out In-Reply-To: References: Message-ID: On Fri, 2 Sep 2022 01:27:57 GMT, Dean Long wrote: > The problem is caused by missing bailout logic in inline_string_char_access(). This PR adds the needed logic to match other intrinsics. > > I tried to come up with a stand-alone test case, but was not successful. LGTM ------------- Marked as reviewed by jiefu (Reviewer). PR: https://git.openjdk.org/jdk/pull/10136 From thartmann at openjdk.org Fri Sep 2 13:50:48 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 2 Sep 2022 13:50:48 GMT Subject: RFR: 8292385: assert(ctrl == kit.control()) failed: Control flow was added although the intrinsic bailed out In-Reply-To: References: Message-ID: On Fri, 2 Sep 2022 13:30:15 GMT, Jie Fu wrote: > I can reproduce it with -XX:+ReplayIgnoreInitErrors. I would recommend to always use that flag when trying replay compilation. There's almost always an issue with some (often unrelated) class dependencies not being resolvable. 
------------- PR: https://git.openjdk.org/jdk/pull/10136 From jiefu at openjdk.org Fri Sep 2 13:50:49 2022 From: jiefu at openjdk.org (Jie Fu) Date: Fri, 2 Sep 2022 13:50:49 GMT Subject: RFR: 8292385: assert(ctrl == kit.control()) failed: Control flow was added although the intrinsic bailed out In-Reply-To: References: Message-ID: On Fri, 2 Sep 2022 13:44:24 GMT, Tobias Hartmann wrote: > I would recommend to always use that flag when trying replay compilation. There's almost always an issue with some (often unrelated) class dependencies not being resolvable. Okay, got it. Thanks @TobiHartmann . ------------- PR: https://git.openjdk.org/jdk/pull/10136 From jiefu at openjdk.org Fri Sep 2 14:47:19 2022 From: jiefu at openjdk.org (Jie Fu) Date: Fri, 2 Sep 2022 14:47:19 GMT Subject: RFR: 8293319: [C2 cleanup] Remove unused other_path arg in Parse::adjust_map_after_if Message-ID: <5BklUdYlfZoVLtF9rUyz7KVPeg8OLUypgcwREykkr7I=.3b093ff9-dd96-417c-b4a8-a19468763627@github.com> Hi all, The `other_path` arg in `Parse::adjust_map_after_if` is unused. To simplify the use of `Parse::adjust_map_after_if`, it would be better to remove `other_path`. Thanks. 
Best regards, Jie ------------- Commit messages: - 8293319: [C2 cleanup] Remove unused other_path arg in Parse::adjust_map_after_if Changes: https://git.openjdk.org/jdk/pull/10146/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10146&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8293319 Stats: 11 lines in 2 files changed: 0 ins; 4 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/10146.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10146/head:pull/10146 PR: https://git.openjdk.org/jdk/pull/10146 From kvn at openjdk.org Fri Sep 2 16:04:40 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 2 Sep 2022 16:04:40 GMT Subject: RFR: 8289943: Simplify some object allocation merges [v6] In-Reply-To: <5FwD40cG1de1mCNghB1LJNO3C0DVjo178qR5KiCjxaM=.13123849-3a8e-419f-b10e-406aa9fe2b40@github.com> References: <5FwD40cG1de1mCNghB1LJNO3C0DVjo178qR5KiCjxaM=.13123849-3a8e-419f-b10e-406aa9fe2b40@github.com> Message-ID: On Fri, 2 Sep 2022 00:34:49 GMT, Cesar Soares wrote: >> Allocations in `testPollutedPolymorphic()` are removed because both classes have the same `Shape` class which have all fields. Would be interesting if `l` field is declared only in both subclasses. > > @vnkozlov - Thank you for clarifying that. I've been playing with lifting the restriction and I actually found a corner case: > > > public static Class test(boolean c1, boolean c2, boolean c3, int x, int y, int w, int z) { > Animal s = new Dog(x, y, z); > > if (c1) { > s = new Cat("Fisker"); > } > > Unloaded u = new Unloaded(); // assumes this is converted to a uncommon_trap(unloaded, reinterpret) > > return s.getClass(); > } > > > It seems that when merging allocations of different subtypes I'll need to add a special `Phi` node merging the `Klass` of the input allocations and assign the output of that `Phi` to `SafepointScalarReplacedNode`. If I don't do that, the method above will return Animal.class instead of `Dog.class` or `Cat.class`. 
I'm wondering if I'll actually have to do the same for the Header/Mark word of the input allocations. @JohnTortugo Yes, you would need to construct Phi when you replace RAM with `SafePointScalarObjectNode`. Hmm, may be you would need to construct Phi in other cases too (getClass intrinsic). Add cases when class is loaded from argument for allocation: `Unsafe.allocateInstance()` and `Object.clone()` to test class Load instead of simple constant on some paths of such Phi. Why you need Phi for mark word? For identity check (hashCode/identityHashCode intrinsics)? ------------- PR: https://git.openjdk.org/jdk/pull/9073 From kvn at openjdk.org Fri Sep 2 16:11:56 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 2 Sep 2022 16:11:56 GMT Subject: RFR: 8293287 add ReplayReduce flag In-Reply-To: References: Message-ID: <79t3IdX7_aa4bpwrD9kHGiijB4x77reAW-rhW7jZX_0=.a9dfccb0-8dbd-4d7b-b66f-8a727746338e@github.com> On Fri, 2 Sep 2022 01:20:26 GMT, Dean Long wrote: > Add an experimental flag to help developers "reduce" a replay file. > > As a first step, I plan to simulate reduced inlining. This will output multiple "compile" lines as if the first level of inlining never happened: > A --> B --> C > A --> D --> E > becomes > B --> C > D --> E > Developers can repeat iteratively until the replay crash no longer reproduces. src/hotspot/share/ci/ciEnv.cpp line 1651: > 1649: if (task) { > 1650: #ifdef COMPILER2 > 1651: if (ReplayReduce && compiler_data() != NULL) { Is this feature when you are replaying compilation? If yes, add `ReplayCompiles` check too. src/hotspot/share/opto/compile.cpp line 4576: > 4574: } > 4575: > 4576: void Compile::dump_inline_data_reduced(outputStream* out) { Add `assert(ReplayReduce`. 
------------- PR: https://git.openjdk.org/jdk/pull/10134 From kvn at openjdk.org Fri Sep 2 16:12:47 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 2 Sep 2022 16:12:47 GMT Subject: RFR: 8292385: assert(ctrl == kit.control()) failed: Control flow was added although the intrinsic bailed out In-Reply-To: References: Message-ID: On Fri, 2 Sep 2022 01:27:57 GMT, Dean Long wrote: > The problem is caused by missing bailout logic in inline_string_char_access(). This PR adds the needed logic to match other intrinsics. > > I tried to come up with a stand-alone test case, but was not successful. Marked as reviewed by kvn (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10136 From kvn at openjdk.org Fri Sep 2 16:34:58 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 2 Sep 2022 16:34:58 GMT Subject: RFR: 8293319: [C2 cleanup] Remove unused other_path arg in Parse::adjust_map_after_if In-Reply-To: <5BklUdYlfZoVLtF9rUyz7KVPeg8OLUypgcwREykkr7I=.3b093ff9-dd96-417c-b4a8-a19468763627@github.com> References: <5BklUdYlfZoVLtF9rUyz7KVPeg8OLUypgcwREykkr7I=.3b093ff9-dd96-417c-b4a8-a19468763627@github.com> Message-ID: On Fri, 2 Sep 2022 14:35:28 GMT, Jie Fu wrote: > Hi all, > > The `other_path` arg in `Parse::adjust_map_after_if` is unused. > To simplify the use of `Parse::adjust_map_after_if`, it would be better to remove `other_path`. > > Thanks. > Best regards, > Jie Trivial. I looked on history and it never was used since this code was added. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/10146 From xliu at openjdk.org Fri Sep 2 17:03:42 2022 From: xliu at openjdk.org (Xin Liu) Date: Fri, 2 Sep 2022 17:03:42 GMT Subject: RFR: 8292301: [REDO v2] C2 crash when allocating array of size too large [v2] In-Reply-To: References: Message-ID: On Fri, 2 Sep 2022 09:02:54 GMT, Roland Westrelin wrote: >> On top of the redo, this fixed 2 bugs: >> >> 8288184: the problem here is that the ValidLengthTest input of an >> AllocateArrayNode becomes a constant. The CatchNode would then change >> types if it was reprocessed but it's not. Custom logic is needed to >> enqueue the CatchNode when the ValidLengthTest input of an >> AllocateArrayNode changes. The CastII out of the AllocateArrayNode >> becomes top but the fallthrough path doesn't die. This happens with >> igvn in the case of the bug but could also happen with ccp. I fixed >> both in this patch. >> >> 8291665: the code pattern for this is 2 AllocateArrayNodes out of loop >> with a shared ValidLengthTest input in a loop. When the loop is cloned >> that causes Phis to be added between the AllocateArrayNodes and the >> BoolNode of the ValidLengthTest inputs. Split if runs next and it >> doesn't expect the Phi at the ValidLengthTest inputs. The fix here is >> to clone the Bool/Cmp subgraph down on loop cloning. There's logic for >> that when the use of the bool is an If for instance so I simply added >> a special case to run that logic for an AllocateArrayNode use as >> well. Note that the test case I added fails reliably on 11 but not >> with the current jdk developement branch. AFAICT, the bug is there but >> something unrelated changed and a slightly different graph is built >> for the test case that prevents split if. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > undo needless change LGTM. ------------- Marked as reviewed by xliu (Committer). 
PR: https://git.openjdk.org/jdk/pull/10038 From dlong at openjdk.org Fri Sep 2 19:26:38 2022 From: dlong at openjdk.org (Dean Long) Date: Fri, 2 Sep 2022 19:26:38 GMT Subject: RFR: 8292385: assert(ctrl == kit.control()) failed: Control flow was added although the intrinsic bailed out In-Reply-To: References: Message-ID: On Fri, 2 Sep 2022 13:46:26 GMT, Jie Fu wrote: >>> I can reproduce it with -XX:+ReplayIgnoreInitErrors. >> >> I would recommend to always use that flag when trying replay compilation. There's almost always an issue with some (often unrelated) class dependencies not being resolvable. > >> I would recommend to always use that flag when trying replay compilation. There's almost always an issue with some (often unrelated) class dependencies not being resolvable. > > Okay, got it. > Thanks @TobiHartmann . Thanks Vladimir. Thanks @DamonFool. ------------- PR: https://git.openjdk.org/jdk/pull/10136 From dlong at openjdk.org Fri Sep 2 19:28:01 2022 From: dlong at openjdk.org (Dean Long) Date: Fri, 2 Sep 2022 19:28:01 GMT Subject: Integrated: 8292385: assert(ctrl == kit.control()) failed: Control flow was added although the intrinsic bailed out In-Reply-To: References: Message-ID: <2Xmc2SPDMeDGHoxDzbNVW-qSLjQkEWIHFdWy_Wk-rO0=.573aa6c4-93e3-4622-9423-4ff568e4d137@github.com> On Fri, 2 Sep 2022 01:27:57 GMT, Dean Long wrote: > The problem is caused by missing bailout logic in inline_string_char_access(). This PR adds the needed logic to match other intrinsics. > > I tried to come up with a stand-alone test case, but was not successful. This pull request has now been integrated. 
Changeset: 5757e212 Author: Dean Long URL: https://git.openjdk.org/jdk/commit/5757e2129ef23f6aa11a9a29d77ae86971b401c0 Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod 8292385: assert(ctrl == kit.control()) failed: Control flow was added although the intrinsic bailed out Reviewed-by: thartmann, jiefu, kvn ------------- PR: https://git.openjdk.org/jdk/pull/10136 From duke at openjdk.org Fri Sep 2 20:22:49 2022 From: duke at openjdk.org (Cesar Soares) Date: Fri, 2 Sep 2022 20:22:49 GMT Subject: RFR: 8289943: Simplify some object allocation merges [v6] In-Reply-To: <5FwD40cG1de1mCNghB1LJNO3C0DVjo178qR5KiCjxaM=.13123849-3a8e-419f-b10e-406aa9fe2b40@github.com> References: <5FwD40cG1de1mCNghB1LJNO3C0DVjo178qR5KiCjxaM=.13123849-3a8e-419f-b10e-406aa9fe2b40@github.com> Message-ID: On Fri, 2 Sep 2022 00:34:49 GMT, Cesar Soares wrote:
>> Allocations in `testPollutedPolymorphic()` are removed because both classes have the same `Shape` class which have all fields. Would be interesting if `l` field is declared only in both subclasses.
>
> @vnkozlov - Thank you for clarifying that. I've been playing with lifting the restriction and I actually found a corner case:
>
>
> public static Class test(boolean c1, boolean c2, boolean c3, int x, int y, int w, int z) {
>     Animal s = new Dog(x, y, z);
>
>     if (c1) {
>         s = new Cat("Fisker");
>     }
>
>     Unloaded u = new Unloaded(); // assumes this is converted to an uncommon_trap(unloaded, reinterpret)
>
>     return s.getClass();
> }
>
>
> It seems that when merging allocations of different subtypes I'll need to add a special `Phi` node merging the `Klass` of the input allocations and assign the output of that `Phi` to `SafepointScalarReplacedNode`. If I don't do that, the method above will return Animal.class instead of `Dog.class` or `Cat.class`. I'm wondering if I'll actually have to do the same for the Header/Mark word of the input allocations.
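To make the corner case quoted above concrete, here is a minimal, self-contained sketch of the Java-level behaviour that has to survive scalar replacement and deoptimization. `Animal`, `Dog`, and `Cat` are hypothetical stand-ins (the real test classes are not shown in this thread), and the `Unloaded` allocation that triggers the uncommon trap is left out because it cannot be reproduced in a standalone snippet. It says nothing about how C2 implements the merge; it only shows what `getClass()` must keep returning.

```java
public class MergedAllocationKlass {
    // Hypothetical stand-ins for the classes used in the quoted test.
    static class Animal { }

    static class Dog extends Animal {
        final int x, y, z;
        Dog(int x, int y, int z) { this.x = x; this.y = y; this.z = z; }
    }

    static class Cat extends Animal {
        final String name;
        Cat(String name) { this.name = name; }
    }

    // Two allocations of different subtypes are merged after the 'if'.
    static Class<?> test(boolean c1, int x, int y, int z) {
        Animal s = new Dog(x, y, z);
        if (c1) {
            s = new Cat("Fisker");
        }
        // Must be the exact runtime class, never the static type Animal.
        return s.getClass();
    }

    public static void main(String[] args) {
        System.out.println(test(false, 1, 2, 3).getSimpleName());
        System.out.println(test(true, 1, 2, 3).getSimpleName());
    }
}
```

Running `main` prints `Dog` and then `Cat`. A rematerialized object typed as `Animal` would break exactly this property.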
> @JohnTortugo Yes, you would need to construct Phi when you replace RAM with SafePointScalarObjectNode. Hmm, may be you would need to construct Phi in other cases too (getClass intrinsic). Yes. I'll take a look into `getClass` intrinsic. I thought that just adding input to `SafePointScalarObjectNode`+Safepoint with a Phi of the input allocations Klass fields would be enough for the code to correctly access fields/methods of the input objects. However, it looks like the Klass of the object rematerialized from SafePointScalarObjectNode is [the type of `SafePointScalarObjectNode`](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/output.cpp#L859), not the Klass field input edge set in the Safepoint. > Add cases when class is loaded from argument for allocation: Unsafe.allocateInstance() and Object.clone() to test class Load instead of simple constant on some paths of such Phi. I'll do that. Thanks for the tip. > Why you need Phi for mark word? For identity check (hashCode/identityHashCode intrinsics)? Yes, I was wondering about a case where I'd need any of the information present in the Mark word. The code in the PR is able to solve merges where only some of the merged allocations are removed. I believe I'll need, at least, to preserve the Mark word of the allocations not removed. ------------- PR: https://git.openjdk.org/jdk/pull/9073 From dlong at openjdk.org Fri Sep 2 20:39:46 2022 From: dlong at openjdk.org (Dean Long) Date: Fri, 2 Sep 2022 20:39:46 GMT Subject: RFR: 8293287 add ReplayReduce flag In-Reply-To: <79t3IdX7_aa4bpwrD9kHGiijB4x77reAW-rhW7jZX_0=.a9dfccb0-8dbd-4d7b-b66f-8a727746338e@github.com> References: <79t3IdX7_aa4bpwrD9kHGiijB4x77reAW-rhW7jZX_0=.a9dfccb0-8dbd-4d7b-b66f-8a727746338e@github.com> Message-ID: On Fri, 2 Sep 2022 16:07:47 GMT, Vladimir Kozlov wrote: >> Add an experimental flag to help developers "reduce" a replay file. >> >> As a first step, I plan to simulate reduced inlining. 
This will output multiple "compile" lines as if the first level of inlining never happened: >> A --> B --> C >> A --> D --> E >> becomes >> B --> C >> D --> E >> Developers can repeat iteratively until the replay crash no longer reproduces. > > src/hotspot/share/ci/ciEnv.cpp line 1651: > >> 1649: if (task) { >> 1650: #ifdef COMPILER2 >> 1651: if (ReplayReduce && compiler_data() != NULL) { > > Is this feature when you are replaying compilation? If yes, add `ReplayCompiles` check too. It seems most useful with ReplayCompiles, but I don't want to require it. A developer might want to turn on ReplayReduce without ReplayCompiles, to save a replay step, for example when running generated tests from creduce or javafuzzer without an existing replay file. ------------- PR: https://git.openjdk.org/jdk/pull/10134 From dlong at openjdk.org Fri Sep 2 20:41:58 2022 From: dlong at openjdk.org (Dean Long) Date: Fri, 2 Sep 2022 20:41:58 GMT Subject: RFR: 8293287 add ReplayReduce flag [v2] In-Reply-To: References: Message-ID: > Add an experimental flag to help developers "reduce" a replay file. > > As a first step, I plan to simulate reduced inlining. This will output multiple "compile" lines as if the first level of inlining never happened: > A --> B --> C > A --> D --> E > becomes > B --> C > D --> E > Developers can repeat iteratively until the replay crash no longer reproduces. 
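The reduction step described above can be sketched roughly as follows. This is only an illustration of the idea, not the HotSpot implementation: the class name, the `reduceOnce` helper, and the representation of an inline tree as lists of method names are all assumptions, and real replay files use a different textual format. Dropping chains that have no inlined callee is likewise an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ReduceInlineSketch {
    // Each chain lists a compiled root followed by its inlined callees,
    // e.g. [A, B, C] stands for A --> B --> C.
    // One reduction step pretends the first level of inlining never
    // happened: the root is dropped and its callee becomes a new root.
    static List<List<String>> reduceOnce(List<List<String>> chains) {
        // LinkedHashSet keeps the original order and removes duplicates.
        Set<List<String>> reduced = new LinkedHashSet<>();
        for (List<String> chain : chains) {
            // A chain with no inlined callee has nothing left after the step.
            if (chain.size() > 1) {
                reduced.add(new ArrayList<>(chain.subList(1, chain.size())));
            }
        }
        return new ArrayList<>(reduced);
    }

    public static void main(String[] args) {
        List<List<String>> chains = List.of(
            List.of("A", "B", "C"),
            List.of("A", "D", "E"));
        System.out.println(reduceOnce(chains));
    }
}
```

For the example in the description, `main` prints `[[B, C], [D, E]]`. Applying `reduceOnce` again would give `[[C], [E]]`, which corresponds to repeating the step until the crash no longer reproduces.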
Dean Long has updated the pull request incrementally with two additional commits since the last revision: - add assert - fix for TestVMNoCompLevel.java test failure ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10134/files - new: https://git.openjdk.org/jdk/pull/10134/files/b5959eb6..73d6e812 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10134&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10134&range=00-01 Stats: 3 lines in 2 files changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10134.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10134/head:pull/10134 PR: https://git.openjdk.org/jdk/pull/10134 From kvn at openjdk.org Fri Sep 2 22:23:40 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 2 Sep 2022 22:23:40 GMT Subject: RFR: 8293287 add ReplayReduce flag [v2] In-Reply-To: References: Message-ID: <0THbzp1y6RrdE9KH_8xbzLx2YNmYJidQ0u8P7NuyIzg=.0e6cf4cd-97fb-45b3-917a-40ee3c151990@github.com> On Fri, 2 Sep 2022 20:41:58 GMT, Dean Long wrote: >> Add an experimental flag to help developers "reduce" a replay file. >> >> As a first step, I plan to simulate reduced inlining. This will output multiple "compile" lines as if the first level of inlining never happened: >> A --> B --> C >> A --> D --> E >> becomes >> B --> C >> D --> E >> Developers can repeat iteratively until the replay crash no longer reproduces. > > Dean Long has updated the pull request incrementally with two additional commits since the last revision: > > - add assert > - fix for TestVMNoCompLevel.java test failure Okay. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/10134 From jiefu at openjdk.org Fri Sep 2 22:44:07 2022 From: jiefu at openjdk.org (Jie Fu) Date: Fri, 2 Sep 2022 22:44:07 GMT Subject: RFR: 8293319: [C2 cleanup] Remove unused other_path arg in Parse::adjust_map_after_if In-Reply-To: References: <5BklUdYlfZoVLtF9rUyz7KVPeg8OLUypgcwREykkr7I=.3b093ff9-dd96-417c-b4a8-a19468763627@github.com> Message-ID: On Fri, 2 Sep 2022 16:31:57 GMT, Vladimir Kozlov wrote: >> Hi all, >> >> The `other_path` arg in `Parse::adjust_map_after_if` is unused. >> To simplify the use of `Parse::adjust_map_after_if`, it would be better to remove `other_path`. >> >> Thanks. >> Best regards, >> Jie > > Trivial. > I looked on history and it never was used since this code was added. Thanks @vnkozlov . ------------- PR: https://git.openjdk.org/jdk/pull/10146 From jiefu at openjdk.org Fri Sep 2 22:44:07 2022 From: jiefu at openjdk.org (Jie Fu) Date: Fri, 2 Sep 2022 22:44:07 GMT Subject: Integrated: 8293319: [C2 cleanup] Remove unused other_path arg in Parse::adjust_map_after_if In-Reply-To: <5BklUdYlfZoVLtF9rUyz7KVPeg8OLUypgcwREykkr7I=.3b093ff9-dd96-417c-b4a8-a19468763627@github.com> References: <5BklUdYlfZoVLtF9rUyz7KVPeg8OLUypgcwREykkr7I=.3b093ff9-dd96-417c-b4a8-a19468763627@github.com> Message-ID: <_Bf87d5wIv20qBw0PdUEvctQks6E-BIkZX5L8TVs_MI=.28c35235-a77f-46aa-a86e-d55b246edf57@github.com> On Fri, 2 Sep 2022 14:35:28 GMT, Jie Fu wrote: > Hi all, > > The `other_path` arg in `Parse::adjust_map_after_if` is unused. > To simplify the use of `Parse::adjust_map_after_if`, it would be better to remove `other_path`. > > Thanks. > Best regards, > Jie This pull request has now been integrated. 
Changeset: e1e67324 Author: Jie Fu URL: https://git.openjdk.org/jdk/commit/e1e67324c0c3d8b26af8ae5382073d8f477dbe3c Stats: 11 lines in 2 files changed: 0 ins; 4 del; 7 mod 8293319: [C2 cleanup] Remove unused other_path arg in Parse::adjust_map_after_if Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/10146 From dlong at openjdk.org Fri Sep 2 23:38:39 2022 From: dlong at openjdk.org (Dean Long) Date: Fri, 2 Sep 2022 23:38:39 GMT Subject: RFR: 8293287 add ReplayReduce flag [v2] In-Reply-To: References: Message-ID: On Fri, 2 Sep 2022 20:41:58 GMT, Dean Long wrote: >> Add an experimental flag to help developers "reduce" a replay file. >> >> As a first step, I plan to simulate reduced inlining. This will output multiple "compile" lines as if the first level of inlining never happened: >> A --> B --> C >> A --> D --> E >> becomes >> B --> C >> D --> E >> Developers can repeat iteratively until the replay crash no longer reproduces. > > Dean Long has updated the pull request incrementally with two additional commits since the last revision: > > - add assert > - fix for TestVMNoCompLevel.java test failure Unfortunately, the TestVMNoCompLevel.java test is still failing on some platforms. I'm tempted to remove that test, as it doesn't seem to add value anymore. ------------- PR: https://git.openjdk.org/jdk/pull/10134 From kvn at openjdk.org Sat Sep 3 00:55:40 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 3 Sep 2022 00:55:40 GMT Subject: RFR: 8293287 add ReplayReduce flag [v2] In-Reply-To: References: Message-ID: On Fri, 2 Sep 2022 23:36:38 GMT, Dean Long wrote: > Unfortunately, the TestVMNoCompLevel.java test is still failing on some platforms. I'm tempted to remove that test, as it doesn't seem to add value anymore. Can you explain the issue with this test and your changes? New code should be off by default.
------------- PR: https://git.openjdk.org/jdk/pull/10134 From dlong at openjdk.org Sat Sep 3 02:58:38 2022 From: dlong at openjdk.org (Dean Long) Date: Sat, 3 Sep 2022 02:58:38 GMT Subject: RFR: 8293287 add ReplayReduce flag [v2] In-Reply-To: References: Message-ID: On Sat, 3 Sep 2022 00:51:45 GMT, Vladimir Kozlov wrote: > Can you explain the issue with this test and your changes? New code should be off by default. I had to move the call to reset() (which clears parsing errors) to allow multiple compile commands. This revealed an existing problem with compiler/ciReplay/TestVMNoCompLevel.java. This test removes the last token from the "compile" line, trying to simulate a pre-2013 replay file. Removing the last token is wrong if there are inlining tokens. I could fix the test, but I don't see the value. I don't think a 2022 JVM needs to read pre-2013 replay files. If necessary, an old replay file can be edited to allow it to be parsed by a recent JVM. I propose deleting the test and the related workarounds in the code. ------------- PR: https://git.openjdk.org/jdk/pull/10134 From kvn at openjdk.org Sat Sep 3 04:54:28 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 3 Sep 2022 04:54:28 GMT Subject: RFR: 8293287 add ReplayReduce flag [v2] In-Reply-To: References: Message-ID: On Fri, 2 Sep 2022 20:41:58 GMT, Dean Long wrote: >> Add an experimental flag to help developers "reduce" a replay file. >> >> As a first step, I plan to simulate reduced inlining. This will output multiple "compile" lines as if the first level of inlining never happened: >> A --> B --> C >> A --> D --> E >> becomes >> B --> C >> D --> E >> Developers can repeat iteratively until the replay crash no longer reproduces. > > Dean Long has updated the pull request incrementally with two additional commits since the last revision: > > - add assert > - fix for TestVMNoCompLevel.java test failure Okay ------------- Marked as reviewed by kvn (Reviewer).
PR: https://git.openjdk.org/jdk/pull/10134 From dlong at openjdk.org Sat Sep 3 21:56:50 2022 From: dlong at openjdk.org (Dean Long) Date: Sat, 3 Sep 2022 21:56:50 GMT Subject: RFR: 8293287 add ReplayReduce flag [v3] In-Reply-To: References: Message-ID: > Add an experimental flag to help developers "reduce" a replay file. > > As a first step, I plan to simulate reduced inlining. This will output multiple "compile" lines as if the first level of inlining never happened: > A --> B --> C > A --> D --> E > becomes > B --> C > D --> E > Developers can repeat iteratively until the replay crash no longer reproduces. Dean Long has updated the pull request incrementally with one additional commit since the last revision: remove support for pre-2013 replay files ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10134/files - new: https://git.openjdk.org/jdk/pull/10134/files/73d6e812..4ba53b45 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10134&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10134&range=01-02 Stats: 88 lines in 2 files changed: 0 ins; 87 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10134.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10134/head:pull/10134 PR: https://git.openjdk.org/jdk/pull/10134 From haosun at openjdk.org Mon Sep 5 00:50:34 2022 From: haosun at openjdk.org (Hao Sun) Date: Mon, 5 Sep 2022 00:50:34 GMT Subject: RFR: 8292587: AArch64: Support SVE fabd instruction In-Reply-To: References: Message-ID: <2FtnPZNGEwNeHIdZpikZaqae9BbFn6LCsjlPg-8Q3xw=.2595dae0-0467-4a3f-ba75-31edfdf2406d@github.com> On Thu, 25 Aug 2022 01:52:41 GMT, Hao Sun wrote: > Scalar and NEON fabd instructions were initially supported in > JDK-8256318. In this patch, we support SVE fabd instruction [1] and add > one Jtreg test case as well. > > With this patch, two instructions `fsub + fabs` would be combined into > one single `fabd` instruction. 
>
>
> fsub z16.s, z16.s, z17.s
> fabs z16.s, p7/m, z16.s
>
> -->
>
> fabd z16.s, p7/m, z16.s, z17.s
>
>
> In the initial evaluation of JMH case, i.e.
> FloatingScalarVectorAbsDiff.java, we found the performance uplift done
> by this optimization was easily hidden by the heavy memory load/store
> instructions. To avoid that, we updated the JMH case a bit, adding one
> more group of subtraction and Math.abs operations in the loop body.
>
> Here is the data with the new JMH case on one 256-bit SVE machine. We
> can observe about 39% and 35% improvements for the two functions
> respectively.
>
>
> Benchmark                                             Before    After     Units
> FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble   260.468   160.965   ns/op
> FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat    133.963    87.292   ns/op
>
>
> Jtreg testing: tier1~3 passed on one NEON-only machine and one 256-bit SVE machine.
>
> [1] https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/FABD--Floating-point-absolute-difference--predicated--

Ping? Can anyone help to review this patch? Thanks.
------------- PR: https://git.openjdk.org/jdk/pull/10011 From eliu at openjdk.org Mon Sep 5 01:53:42 2022 From: eliu at openjdk.org (Eric Liu) Date: Mon, 5 Sep 2022 01:53:42 GMT Subject: RFR: 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast [v6] In-Reply-To: References: <1mDQWN8f2Gpb-tuVZo_Jj6TVU1yNtU-4jY00D4gfW5s=.97d48dbc-0ffd-432d-83fe-3544295036e2@github.com> Message-ID: On Wed, 31 Aug 2022 06:10:07 GMT, Xiaohong Gong wrote: >> Recently we found the performance of "`FIRST_NONZERO`" for double type is largely worse than the other types on x86 when `UseAVX=2`. The main reason is the "`VectorCastL2X`" op is not supported by the backend when the dst element type is `T_DOUBLE`. This makes the check of `VectorCast` op fail before intrinsifying "`VectorMask.cast()`" which is used in the >> "`FIRST_NONZERO`" java implementation (see [1]).
However, the compiler will not generate the `VectorCast `op for `VectorMask.cast()` if: >> >> 1) the current platform supports the predicated feature >> 2) the element size (in bytes) of the src and dst type is the same >> >> So the check of "`VectorCast`" op is needless for such cases. To fix it, this patch: >> >> 1) limits the specified vector cast op check to vectors >> 2) adds the relative mask cast op check for VectorMask.cast() >> 3) cleans up the unnecessary codes >> >> Here is the performance of "`FIRST_NONZERO`" benchmark [2] on a x86 machine with `UseAVX=2`: >> >> Benchmark (size) Mode Cnt Before After Units >> DoubleMaxVector.FIRST_NONZERO 1024 thrpt 15 49.266 2460.886 ops/ms >> DoubleMaxVector.FIRST_NONZEROMasked 1024 thrpt 15 49.554 1892.223 ops/ms >> >> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/DoubleVector.java#L770 >> [2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/DoubleMaxVector.java#L246 > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments LGTM. ------------- Marked as reviewed by eliu (Author). PR: https://git.openjdk.org/jdk/pull/9737 From njian at openjdk.org Mon Sep 5 04:00:42 2022 From: njian at openjdk.org (Ningsheng Jian) Date: Mon, 5 Sep 2022 04:00:42 GMT Subject: RFR: 8292587: AArch64: Support SVE fabd instruction In-Reply-To: References: Message-ID: On Thu, 25 Aug 2022 01:52:41 GMT, Hao Sun wrote: > Scalar and NEON fabd instructions were initially supported in > JDK-8256318. In this patch, we support SVE fabd instruction [1] and add > one Jtreg test case as well. > > With this patch, two instructions `fsub + fabs` would be combined into > one single `fabd` instruction. 
>
>
> fsub z16.s, z16.s, z17.s
> fabs z16.s, p7/m, z16.s
>
> -->
>
> fabd z16.s, p7/m, z16.s, z17.s
>
>
> In the initial evaluation of JMH case, i.e.
> FloatingScalarVectorAbsDiff.java, we found the performance uplift done
> by this optimization was easily hidden by the heavy memory load/store
> instructions. To avoid that, we updated the JMH case a bit, adding one
> more group of subtraction and Math.abs operations in the loop body.
>
> Here is the data with the new JMH case on one 256-bit SVE machine. We
> can observe about 39% and 35% improvements for the two functions
> respectively.
>
>
> Benchmark                                             Before    After     Units
> FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble   260.468   160.965   ns/op
> FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat    133.963    87.292   ns/op
>
>
> Jtreg testing: tier1~3 passed on one NEON-only machine and one 256-bit SVE machine.
>
> [1] https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/FABD--Floating-point-absolute-difference--predicated--

Looks good.
------------- Marked as reviewed by njian (Committer). PR: https://git.openjdk.org/jdk/pull/10011 From rcastanedalo at openjdk.org Mon Sep 5 07:20:58 2022 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 5 Sep 2022 07:20:58 GMT Subject: Integrated: 8292660: C2: blocks made unreachable by NeverBranch-to-Goto conversion are removed incorrectly In-Reply-To: <1rtbcl8YVPhYz8cJ3RmnhUrv4Tl8tFfQ-R1ZdkbhiaA=.aa9cf2fb-1e13-4d0e-bb9f-c48d9f5ed6db@github.com> References: <1rtbcl8YVPhYz8cJ3RmnhUrv4Tl8tFfQ-R1ZdkbhiaA=.aa9cf2fb-1e13-4d0e-bb9f-c48d9f5ed6db@github.com> Message-ID: On Tue, 23 Aug 2022 08:26:57 GMT, Roberto Castañeda Lozano wrote: > This changeset addresses three issues in the current removal of unreachable blocks after NeverBranch-to-goto conversion (introduced recently by [JDK-8292285](https://bugs.openjdk.org/browse/JDK-8292285)): > > 1.
The [unreachable block removal and pre-order index update loop](https://github.com/openjdk/jdk/blob/7b5f9edb59ef763acca80724ca37f3624d720d06/src/hotspot/share/opto/block.cpp#L613-L621) skips the block next to the removed one, and iterates beyond the end of the block list (`PhaseCFG::_blocks`). Skipping blocks can lead to duplicate pre-order indices (`Block::_pre_order`) and/or pre-order indices greater than the size of the block list, causing problems in later transformations. > > 2. The [outer block traversal loop](https://github.com/openjdk/jdk/blob/7b5f9edb59ef763acca80724ca37f3624d720d06/src/hotspot/share/opto/block.cpp#L698-L729) iterates beyond the end of the block list whenever one or more unreachable blocks are removed. > > 3. Transitively unreachable blocks (such as B10 in the following example), arising in methods with multiple infinite loops, are not removed: > > ![transitive](https://user-images.githubusercontent.com/8792647/186109043-416213b7-8735-41de-9910-acf0997db095.png) > > This changeset addresses issues 2 and 3 by decoupling NeverBranch-to-goto conversion from removal of unreachable code. Instead of removing the blocks eagerly, the removal is postponed to a later phase that works in an iterative worklist fashion, making it possible to remove transitively unreachable blocks such as B10 in the above example. Postponing removal to a later phase (where `get_block(i)->_pre_order == i` holds) also simplifies addressing issue 1: in the changeset, it is sufficient to iterate over the blocks that follow the removed block in the block list to decrement their `_pre_order` index. > > #### Testing > > - tier1-3 (windows-x64, linux-x64, linux-aarch64, and macosx-x64; release and debug mode). > - tier4-7 (linux-x64; debug mode). > - fuzzing (~1 h. on each platform). This pull request has now been integrated. 
Changeset: 730ced9a Author: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/730ced9a109953ca1c3b7bfd6a3eeac5b85892c5 Stats: 123 lines in 4 files changed: 108 ins; 13 del; 2 mod 8292660: C2: blocks made unreachable by NeverBranch-to-Goto conversion are removed incorrectly Reviewed-by: kvn, roland ------------- PR: https://git.openjdk.org/jdk/pull/9976 From bkilambi at openjdk.org Mon Sep 5 10:31:19 2022 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 5 Sep 2022 10:31:19 GMT Subject: RFR: 8292675: Add identity transformation for removing redundant AndV/OrV nodes Message-ID: Recently we found that the rotate left/right benchmarks with vectorapi emit a redundant "and" instruction on both aarch64 and x86_64 machines, which can be removed. For example, and(and(a, b), b) generates two "and" instructions, which can be reduced to a single "and" operation, and(a, b), since "and" (and "or") operations are associative, commutative and idempotent. This can help improve performance for all those workloads which have multiple "and"/"or" operations with the same value by reducing them to fewer "and"/"or" operations accordingly.
This patch adds the following transformations for the vector logical operations AndV and OrV:

(OpV (OpV a b) b) => (OpV a b)
(OpV (OpV a b) a) => (OpV a b)
(OpV (OpV a b m1) b m1) => (OpV a b m1)
(OpV (OpV a b m1) a m1) => (OpV a b m1)
(OpV a (OpV a b)) => (OpV a b)
(OpV b (OpV a b)) => (OpV a b)
(OpV a (OpV a b m) m) => (OpV a b m)

where Op = "And", "Or"

Links for the benchmarks tested are given below:
https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/IntMaxVector.java#L728
https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/IntMaxVector.java#L764
https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/LongMaxVector.java#L728
https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/LongMaxVector.java#L764

Before this patch, the disassembly for one of these testcases (IntMaxVector.ROR) for Neon is shown below:

```
ldr q16, [x12, #16]
and v16.16b, v16.16b, v20.16b
and v16.16b, v16.16b, v20.16b
add x12, x16, x11
sub v17.4s, v21.4s, v16.4s
...
...
```

After this patch, the disassembly for the same testcase is shown below:

```
ldr q16, [x12, #16]
and v16.16b, v16.16b, v20.16b
add x12, x16, x11
sub v17.4s, v21.4s, v16.4s
...
...
```

The other tests also emit an extra "and" instruction as shown above for the vector ROR/ROL operations.
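As a quick sanity check of the identities listed above, the sketch below evaluates them on a single scalar `int` lane instead of on real vector nodes. The class and helper names are made up for illustration, and the masked variants assume that a lane whose mask bit is unset keeps the value of the first operand (which is how this sketch models a masked lanewise operation). It is not the C2 transformation itself.

```java
public class RedundantAndOrSketch {
    // One lane of a masked AND/OR. Assumption: an unset mask bit keeps
    // the first operand unchanged. With m == true this is the plain op.
    static int and(int a, int b, boolean m) { return m ? (a & b) : a; }
    static int or(int a, int b, boolean m)  { return m ? (a | b) : a; }

    // Returns true if every rewrite from the list preserves the lane value.
    static boolean identitiesHold(int a, int b, boolean m) {
        int andAB = and(a, b, m);
        int orAB = or(a, b, m);
        boolean sameMask =
               and(andAB, b, m) == andAB   // (OpV (OpV a b m) b m)
            && and(andAB, a, m) == andAB   // (OpV (OpV a b m) a m)
            && and(a, andAB, m) == andAB   // (OpV a (OpV a b m) m)
            && or(orAB, b, m) == orAB
            && or(orAB, a, m) == orAB
            && or(a, orAB, m) == orAB;
        // (OpV b (OpV a b)) is listed only in its unmasked form.
        boolean unmasked =
               (b & (a & b)) == (a & b)
            && (b | (a | b)) == (a | b);
        return sameMask && unmasked;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0x0F0F, 0x1234ABCD};
        boolean all = true;
        for (int a : samples) {
            for (int b : samples) {
                all &= identitiesHold(a, b, true) && identitiesHold(a, b, false);
            }
        }
        System.out.println("all identities hold: " + all);
    }
}
```

The unmasked cases follow from associativity, commutativity and idempotence of bitwise AND/OR; the masked cases additionally rely on both operations using the same mask, as in the rules above.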
Below are the performance results for the vectorapi rotate tests (tests given in the links above) with this patch on aarch64 and x86_64 machines (for int and long types):

Benchmark            aarch64   x86_64
IntMaxVector.ROL     25.57%    26.09%
IntMaxVector.ROR     23.75%    24.15%
LongMaxVector.ROL    28.91%    28.51%
LongMaxVector.ROR    16.51%    29.11%

The percentage indicates the percent gain/improvement in performance (ops/ms) with this patch over the master build without this patch. The machine descriptions are given below:
aarch64 - 128-bit aarch64 machine
x86_64 - 256-bit x86 machine

------------- Commit messages: - 8292675: Add identity transformation for removing redundant AndV/OrV nodes Changes: https://git.openjdk.org/jdk/pull/10163/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10163&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8292675 Stats: 287 lines in 2 files changed: 285 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10163.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10163/head:pull/10163 PR: https://git.openjdk.org/jdk/pull/10163 From pli at openjdk.org Mon Sep 5 13:15:41 2022 From: pli at openjdk.org (Pengfei Li) Date: Mon, 5 Sep 2022 13:15:41 GMT Subject: RFR: 8291669: [REDO] Fix array range check hoisting for some scaled loop iv [v3] In-Reply-To: References: <5XU7GsP99-GVCxCJi7bVvvKbW_YG3XQGuIVm-LclQOw=.9b48b73d-4172-4f8e-a82d-03bad545c2fc@github.com> Message-ID: On Fri, 2 Sep 2022 06:11:58 GMT, Pengfei Li wrote: >> This is a REDO of JDK-8289996. In previous patch, we defer some strength >> reductions in Ideal functions of `Mul[I|L]Node` to post loop igvn phase >> to fix a range check hoisting issue. More about previous patch can be >> found in PR #9508, where we have described some details of the issue >> we would like to fix. >> >> Previous patch was backed out due to some jtreg failures found. We have >> analyzed those failures one by one and found one of them exposes a real >> performance regression.
We see that deferring some strength reductions >> to post loop igvn phase has too much impact. Some vector multiplication >> will not be optimized to vector addition with vector shift after that >> change. So in this REDO we propose the range check hoisting fix with a >> different approach. >> >> In this new patch, we add some recursive pattern matches for scaled loop >> iv in function `PhaseIdealLoop::is_scaled_iv()`. These include matching >> a sum or a difference of two scaled iv expressions. With this, all kinds >> of Ideal-transformed scaled iv expressions can still be recognized. This >> new approach only touches loop transformation code and hence has much >> smaller impact. We have verified that this new approach applies to both >> int range checks and long range checks. >> >> Previously attached jtreg case fails on ppc64 because VectorAPI has no >> vector intrinsics on ppc64 so there's no long range check to hoist. In >> this patch, we limit the test architecture to x64 and AArch64. >> >> Tested hotspot::hotspot_all_no_apps, jdk::tier1~3 and langtools::tier1. > > Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: > > Update p_short_scale compuation May I have another review for this REDO? Perhaps @vnkozlov @TobiHartmann ------------- PR: https://git.openjdk.org/jdk/pull/9851 From epeter at openjdk.org Mon Sep 5 14:27:55 2022 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 5 Sep 2022 14:27:55 GMT Subject: RFR: 8288897: Clean up node dump code [v5] In-Reply-To: References: Message-ID: On Fri, 12 Aug 2022 08:14:19 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains ten additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8288897 >> - review suggestions from @navyxliu >> - implementing Christians review suggestions >> - Merge branch 'master' into JDK-8288897 >> - Apply suggestions from code review >> >> 2 style fixes by Christian >> >> Co-authored-by: Christian Hagedorn >> - cleanup, move debug functions to cpp to prevent inlining, add comment for debugger functions >> - make dump_bfs const, change datastructures, change some signatures to const >> - refactor dump to use dump_bfs, redefine categories through output types >> - 8288897: Clean up dump code for nodes > > I was out on vacation and only had the chance to have a look it again now. Thanks for doing the updates and verifying that the order is still the same! Thanks @navyxliu for double checking it as well! The colorful dump looks really nice :) > > Thanks, > Christian Thanks @chhagedorn @navyxliu for the review and comments! ------------- PR: https://git.openjdk.org/jdk/pull/9234 From tholenstein at openjdk.org Mon Sep 5 14:30:17 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 5 Sep 2022 14:30:17 GMT Subject: RFR: JDK-8293364: IGV: Refactor Action in EditorTopComponent and fix minor bugs Message-ID: Refactor the Actions in EditorTopComponent (com.sun.hotspot.igv.view.actions). Move Action specific code from EditorTopComponent to the corresponding Action. # Refactoring of com.sun.hotspot.igv.view.actions and EditorTopComponent - Created a new `ExportGraph` Action and moved corresponding functions `exportToSVG(..)` and `exportToPDF(..)` to new `ExportGraph.java` - Moved key bindings for satellite-view (pressing S) from `EditorTopComponent` to `OverviewAction.java` - Moved Action specific code from `EditorTopComponent` to the corresponding `XXXAction.java` # Fixing minor Bugs - "Show empty blocks in control-flow graph view" is selected by default but only enabled in CFG view. 
This is distracting for the eye when we are not in CFG view: [image: cfg_before]
Now "Show empty blocks in control-flow graph view" is no longer selected when it is disabled (greyed out): [image: cfg_node_disable]
It still gets selected by default when it is enabled: [image: cfg_now]
- "Extract current set of selected nodes", "Hide selected nodes" and "Show all nodes" were always enabled, even when they did not affect anything: [image: selection_before]
Now "Extract current set of selected nodes" and "Hide selected nodes" are disabled (greyed out) when no nodes are selected, and "Show all nodes" is disabled (greyed out) when all nodes are already visible: [image: selection_now]
- "Reduce the difference selection" got stuck at the last graphs in the group because it got greyed out: [image: reduce_stuck]
Now "Reduce the difference selection" works as expected: [image: reduce_now]

-------------

Commit messages:
- Fix ReduceDiffAction stuck
- Fix enabling of ReduceDiffAction
- refactor OverviewAction
- refactor HideAction, ShowAllAction and ExtractAction
- Refactor LayoutActions
- deselect ShowEmptyBlocksAction when disabled
- ExportGraph.java
- Fix satellite view
- update SelectionModeAction
- refactor Toolbar Actions
- ...
and 2 more: https://git.openjdk.org/jdk/compare/512fee1d...7bcf79df

Changes: https://git.openjdk.org/jdk/pull/10170/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10170&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8293364
Stats: 1137 lines in 28 files changed: 543 ins; 443 del; 151 mod
Patch: https://git.openjdk.org/jdk/pull/10170.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/10170/head:pull/10170

PR: https://git.openjdk.org/jdk/pull/10170

From epeter at openjdk.org Mon Sep 5 14:30:21 2022
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 5 Sep 2022 14:30:21 GMT
Subject: Integrated: 8288897: Clean up node dump code
In-Reply-To: References: Message-ID:

On Wed, 22 Jun 2022 11:38:15 GMT, Emanuel Peter wrote:

> I recently did some work in the area of `Node::dump` and `Node::find`, see [JDK-8287647](https://bugs.openjdk.org/browse/JDK-8287647) and [JDK-8283775](https://bugs.openjdk.org/browse/JDK-8283775).
>
> This change set cleans up the surrounding code and tries to reduce code duplication.
>
> Things I did:
> - remove Node::related. It was added 7 years ago, with [JDK-8004073](https://bugs.openjdk.org/browse/JDK-8004073). However, it was not extended to many nodes, and hence it is incomplete, and nobody I know seems to use it.
> - refactor `dump(int)` to use `dump_bfs` (reduce code duplication).
> - redefine categories in `dump_bfs`, focusing on output types. Mixed type is now also control if it has control output, and memory if it has memory output, etc. Plus, a node is also in the control category if it `is_CFG`. This makes `dump_bfs` much more usable, to traverse control and memory flow.
> - Other small cleanups, like replacing rarely used dump functions with dump, removing dead code, making some functions private
> - Adding `call from debugger` comment to VM functions that are useful in debugger
> - rename `find_node_by_name` to `find_nodes_by_name` and `find_node_by_dump` to `find_nodes_by_dump`.
> - remove now unused dump indent compiler flag `PrintIdealIndentThreshold` (notproduct) This pull request has now been integrated. Changeset: dbb2c4b6 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/dbb2c4b6ac01d2a3367a2354213d3b4230dfbb96 Stats: 683 lines in 18 files changed: 53 ins; 554 del; 76 mod 8288897: Clean up node dump code Reviewed-by: chagedorn, xliu ------------- PR: https://git.openjdk.org/jdk/pull/9234 From jbhateja at openjdk.org Mon Sep 5 19:27:01 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 5 Sep 2022 19:27:01 GMT Subject: RFR: 8288043: Optimize FP to word/sub-word integral type conversion on X86 AVX2 platforms [v2] In-Reply-To: References: Message-ID: > Hi All, > > This patch extends conversion optimizations added with [JDK-8287835](https://bugs.openjdk.org/browse/JDK-8287835) to optimize following floating point to integral conversions for X86 AVX2 targets:- > * D2I , D2S, D2B, F2I , F2S, F2B > > In addition, it also optimizes following wide vector (64 bytes) double to integer and sub-type conversions for AVX512 targets which do not support AVX512DQ feature. > * D2I, D2S, D2B > > Following are the JMH micro performance results with and without patch. 
>
> System configuration: 40C 2S Icelake server (Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz)
>
> BENCHMARK | SIZE | BASELINE (ops/ms) | WITHOPT (ops/ms) | PERF GAIN FACTOR
> -- | -- | -- | -- | --
> VectorFPtoIntCastOperations.microDouble128ToByte128 | 1024 | 90.603 | 92.797 | 1.024215534
> VectorFPtoIntCastOperations.microDouble128ToByte256 | 1024 | 81.909 | 82.3 | 1.00477359
> VectorFPtoIntCastOperations.microDouble128ToByte512 | 1024 | 26.181 | 26.244 | 1.002406325
> VectorFPtoIntCastOperations.microDouble128ToInteger128 | 1024 | 90.74 | 2537.958 | 27.96956138
> VectorFPtoIntCastOperations.microDouble128ToInteger256 | 1024 | 81.586 | 2429.599 | 29.7796068
> VectorFPtoIntCastOperations.microDouble128ToInteger512 | 1024 | 19.406 | 19.61 | 1.010512213
> VectorFPtoIntCastOperations.microDouble128ToLong128 | 1024 | 91.723 | 90.754 | 0.989435583
> VectorFPtoIntCastOperations.microDouble128ToShort128 | 1024 | 91.766 | 1984.577 | 21.62649565
> VectorFPtoIntCastOperations.microDouble128ToShort256 | 1024 | 81.949 | 1940.599 | 23.68056962
> VectorFPtoIntCastOperations.microDouble128ToShort512 | 1024 | 16.468 | 16.56 | 1.005586592
> VectorFPtoIntCastOperations.microDouble256ToByte128 | 1024 | 163.331 | 3018.351 | 18.479964
> VectorFPtoIntCastOperations.microDouble256ToByte256 | 1024 | 148.878 | 3082.034 | 20.70174237
> VectorFPtoIntCastOperations.microDouble256ToByte512 | 1024 | 50.108 | 51.629 | 1.030354434
> VectorFPtoIntCastOperations.microDouble256ToInteger128 | 1024 | 159.805 | 4619.421 | 28.90661118
> VectorFPtoIntCastOperations.microDouble256ToInteger256 | 1024 | 143.876 | 4649.642 | 32.31700909
> VectorFPtoIntCastOperations.microDouble256ToInteger512 | 1024 | 38.127 | 38.188 | 1.001599916
> VectorFPtoIntCastOperations.microDouble256ToLong128 | 1024 | 160.322 | 162.442 | 1.013223388
> VectorFPtoIntCastOperations.microDouble256ToLong256 | 1024 | 141.252 | 143.01 | 1.012445841
> VectorFPtoIntCastOperations.microDouble256ToShort128 | 1024 | 157.717 | 3757.471 | 23.82413437
> VectorFPtoIntCastOperations.microDouble256ToShort256 | 1024 | 143.876 | 3830.971 | 26.62689399
> VectorFPtoIntCastOperations.microDouble256ToShort512 | 1024 | 32.061 | 32.911 | 1.026511962
> VectorFPtoIntCastOperations.microFloat128ToByte128 | 1024 | 146.599 | 4002.967 | 27.30555461
> VectorFPtoIntCastOperations.microFloat128ToByte256 | 1024 | 136.99 | 3938.799 | 28.75245638
> VectorFPtoIntCastOperations.microFloat128ToByte512 | 1024 | 51.561 | 50.284 | 0.975233219
> VectorFPtoIntCastOperations.microFloat128ToInteger128 | 1024 | 5933.565 | 5361.472 | 0.903583596
> VectorFPtoIntCastOperations.microFloat128ToInteger256 | 1024 | 5079.564 | 5062.046 | 0.996551279
> VectorFPtoIntCastOperations.microFloat128ToInteger512 | 1024 | 37.101 | 38.419 | 1.035524649
> VectorFPtoIntCastOperations.microFloat128ToLong128 | 1024 | 145.863 | 145.362 | 0.99656527
> VectorFPtoIntCastOperations.microFloat128ToLong256 | 1024 | 131.159 | 133.154 | 1.015210546
> VectorFPtoIntCastOperations.microFloat128ToShort128 | 1024 | 145.966 | 4150.039 | 28.4315457
> VectorFPtoIntCastOperations.microFloat128ToShort256 | 1024 | 134.703 | 4566.589 | 33.90116775
> VectorFPtoIntCastOperations.microFloat128ToShort512 | 1024 | 31.878 | 30.867 | 0.968285338
> VectorFPtoIntCastOperations.microFloat256ToByte128 | 1024 | 237.841 | 6292.051 | 26.4548627
> VectorFPtoIntCastOperations.microFloat256ToByte256 | 1024 | 222.041 | 6292.748 | 28.34047766
> VectorFPtoIntCastOperations.microFloat256ToByte512 | 1024 | 92.073 | 88.981 | 0.966417951
> VectorFPtoIntCastOperations.microFloat256ToInteger128 | 1024 | 11471.121 | 10269.636 | 0.895260019
> VectorFPtoIntCastOperations.microFloat256ToInteger256 | 1024 | 10729.816 | 10105.92 | 0.941853989
> VectorFPtoIntCastOperations.microFloat256ToInteger512 | 1024 | 68.328 | 70.005 | 1.024543379
> VectorFPtoIntCastOperations.microFloat256ToLong128 | 1024 | 247.101 | 248.571 | 1.005948984
> VectorFPtoIntCastOperations.microFloat256ToLong256 | 1024 | 225.74 | 223.987 | 0.992234429
> VectorFPtoIntCastOperations.microFloat256ToLong512 | 1024 | 76.39 | 76.187 | 0.997342584
> VectorFPtoIntCastOperations.microFloat256ToShort128 | 1024 | 233.196 | 8202.179 | 35.17289748
> VectorFPtoIntCastOperations.microFloat256ToShort256 | 1024 | 220.75 | 7781.073 | 35.24834881
> VectorFPtoIntCastOperations.microFloat256ToShort512 | 1024 | 58.143 | 55.633 | 0.956830573
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin

Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:

- 8288043: Adding a descriptive comment for removing explicit scratch registers needed to load stub constants.
- Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8288043
- 8288043: Adding a descriptive comment.
- Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8288043
- 8288043: Changing file permission.
- 8288043: Optimize FP to word/sub-word integral type conversion on X86 AVX2 platforms

-------------

Changes: https://git.openjdk.org/jdk/pull/9748/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9748&range=01
Stats: 1028 lines in 8 files changed: 751 ins; 65 del; 212 mod
Patch: https://git.openjdk.org/jdk/pull/9748.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/9748/head:pull/9748

PR: https://git.openjdk.org/jdk/pull/9748

From fgao at openjdk.org Tue Sep 6 02:03:06 2022
From: fgao at openjdk.org (Fei Gao)
Date: Tue, 6 Sep 2022 02:03:06 GMT
Subject: RFR: 8290910: Wrong memory state is picked in SuperWord::co_locate_pack() [v3]
In-Reply-To: References: Message-ID:

> After [JDK-8283091](https://bugs.openjdk.org/browse/JDK-8283091), the loop below can be vectorized partially.
> Statement 1 can be vectorized but statement 2 can't.
> > // int[] iArr; long[] lArrFld; int i1,i2; > for (i1 = 6; i1 < 227; i1++) { > iArr[i1] += lArrFld[i1]++; // statement 1 > iArr[i1 + 1] -= (i2++); // statement 2 > } > > > But we got incorrect results because the vector packs of iArr are > scheduled incorrectly like: > > ... > load_vector XMM1,[R8 + #16 + R11 << #2] > movl RDI, [R8 + #20 + R11 << #2] # int > load_vector XMM2,[R9 + #8 + R11 << #3] > subl RDI, R11 # int > vpaddq XMM3,XMM2,XMM0 ! add packedL > store_vector [R9 + #8 + R11 << #3],XMM3 > vector_cast_l2x XMM2,XMM2 ! > vpaddd XMM1,XMM2,XMM1 ! add packedI > addl RDI, #228 # int > movl [R8 + #20 + R11 << #2], RDI # int > movl RBX, [R8 + #24 + R11 << #2] # int > subl RBX, R11 # int > addl RBX, #227 # int > movl [R8 + #24 + R11 << #2], RBX # int > ... > movl RBX, [R8 + #40 + R11 << #2] # int > subl RBX, R11 # int > addl RBX, #223 # int > movl [R8 + #40 + R11 << #2], RBX # int > movl RDI, [R8 + #44 + R11 << #2] # int > subl RDI, R11 # int > addl RDI, #222 # int > movl [R8 + #44 + R11 << #2], RDI # int > store_vector [R8 + #16 + R11 << #2],XMM1 > ... > > simplified as: > > load_vector iArr in statement 1 > unvectorized loads/stores in statement 2 > store_vector iArr in statement 1 > > We cannot pick the memory state from the first load for LoadI pack > here, as the LoadI vector operation must load the new values in memory > after iArr writes `iArr[i1 + 1] - (i2++)` to `iArr[i1 + 1]`(statement 2). > We must take the memory state of the last load where we have assigned > new values `iArr[i1 + 1] - (i2++)` to the iArr array. > > In [JDK-8240281](https://bugs.openjdk.org/browse/JDK-8240281), we picked the memory state of the first load[1]. Different > from the scenario in [JDK-8240281](https://bugs.openjdk.org/browse/JDK-8240281), the store, which is dependent on an > earlier load here, is in a pack to be scheduled and the LoadI pack > depends on the last_mem. 
As designed[2], to schedule the StoreI pack, > all memory operations in another single pack should be moved in the same > direction. We know that the store in the pack depends on one of loads in > the LoadI pack, so the LoadI pack should be scheduled before the StoreI > pack. And the LoadI pack depends on the last_mem, so the last_mem must > be scheduled before the LoadI pack and also before the store pack. > Therefore, we need to take the memory state of the last load for the > LoadI pack here. > > To fix it, the pack adds additional checks while picking the memory state > of the first load. When the store locates in a pack and the load pack > relies on the last_mem, we shouldn't choose the memory state of the > first load but choose the memory state of the last load. > > [1]https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2380 > [2]https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2232 Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Fix the interleaving cases using as index offset and add new reduced case from JDK-8293216 Change-Id: Ia20009e262e49ef0d6096133f00acad614b4a1dc - Merge branch 'master' into fg8290910 Change-Id: I2393a1f4f744b2ed258803c82f3198c2f2e5a8ac - Code style change: add one space Change-Id: I2794060ac0f9dbe006e32f202111ee08f09d96a1 - 8290910: Wrong memory state is picked in SuperWord::co_locate_pack() After JDK-8283091, the loop below can be vectorized partially. Statement 1 can be vectorized but statement 2 can't. 
``` // int[] iArr; long[] lArrFld; int i1,i2; for (i1 = 6; i1 < 227; i1++) { iArr[i1] += lArrFld[i1]++; // statement 1 iArr[i1 + 1] -= (i2++); // statement 2 } ``` But we got incorrect results because the vector packs of iArr are scheduled incorrectly like: ``` ... load_vector XMM1,[R8 + #16 + R11 << #2] movl RDI, [R8 + #20 + R11 << #2] # int load_vector XMM2,[R9 + #8 + R11 << #3] subl RDI, R11 # int vpaddq XMM3,XMM2,XMM0 ! add packedL store_vector [R9 + #8 + R11 << #3],XMM3 vector_cast_l2x XMM2,XMM2 ! vpaddd XMM1,XMM2,XMM1 ! add packedI addl RDI, #228 # int movl [R8 + #20 + R11 << #2], RDI # int movl RBX, [R8 + #24 + R11 << #2] # int subl RBX, R11 # int addl RBX, #227 # int movl [R8 + #24 + R11 << #2], RBX # int ... movl RBX, [R8 + #40 + R11 << #2] # int subl RBX, R11 # int addl RBX, #223 # int movl [R8 + #40 + R11 << #2], RBX # int movl RDI, [R8 + #44 + R11 << #2] # int subl RDI, R11 # int addl RDI, #222 # int movl [R8 + #44 + R11 << #2], RDI # int store_vector [R8 + #16 + R11 << #2],XMM1 ... ``` simplified as: ``` load_vector iArr in statement 1 unvectorized loads/stores in statement 2 store_vector iArr in statement 1 ``` We cannot pick the memory state from the first load for LoadI pack here, as the LoadI vector operation must load the new values in memory after iArr writes 'iArr[i1 + 1] - (i2++)' to 'iArr[i1 + 1]'(statement 2). We must take the memory state of the last load where we have assigned new values ('iArr[i1 + 1] - (i2++)') to the iArr array. In JDK-8240281, we picked the memory state of the first load. Different from the scenario in JDK-8240281, the store, which is dependent on an earlier load here, is in a pack to be scheduled and the LoadI pack depends on the last_mem. As designed[2], to schedule the StoreI pack, all memory operations in another single pack should be moved in the same direction. We know that the store in the pack depends on one of loads in the LoadI pack, so the LoadI pack should be scheduled before the StoreI pack. 
And the LoadI pack depends on the last_mem, so the last_mem must be scheduled before the LoadI pack and also before the store pack. Therefore, we need to take the memory state of the last load for the LoadI pack here. To fix it, the pack adds additional checks while picking the memory state of the first load. When the store locates in a pack and the load pack relies on the last_mem, we shouldn't choose the memory state of the first load but choose the memory state of the last load. [1]https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2380 [2]https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2232 Jira: ENTLLT-5482 Change-Id: I341d10b91957b60a1b4aff8116723e54083a5fb8 CustomizedGitHooks: yes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9898/files - new: https://git.openjdk.org/jdk/pull/9898/files/01d64113..c733f039 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9898&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9898&range=01-02 Stats: 89280 lines in 1321 files changed: 36710 ins; 41329 del; 11241 mod Patch: https://git.openjdk.org/jdk/pull/9898.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9898/head:pull/9898 PR: https://git.openjdk.org/jdk/pull/9898 From fgao at openjdk.org Tue Sep 6 02:05:55 2022 From: fgao at openjdk.org (Fei Gao) Date: Tue, 6 Sep 2022 02:05:55 GMT Subject: RFR: 8290910: Wrong memory state is picked in SuperWord::co_locate_pack() [v2] In-Reply-To: References: <5VdJz-Y2_RAqlUjtke3COI2hv3f0ClDB9nA1F__dE1c=.e373a3db-0c64-4ee8-85a7-0b6692ce1d4e@github.com> Message-ID: On Tue, 30 Aug 2022 03:32:38 GMT, Vladimir Kozlov wrote: > Can you show assembler after this fix? 
> > Would be interest to see results for other interleaving cases: > > ``` > a[i-1] += ; // similar to your case > a[i] += ; > ``` > > ``` > a[i+1] += ; > a[i] += ; > ``` > > ``` > a[i] += ; > a[i-1] += ; > ``` > > Also when `+2` is used instead of `+1`. Or `+`. @vnkozlov Thanks for your review. I updated my patch and will illustrate the motivation for the new change in Part two. ============================================================= Part One: The assembly code of my case after this fix is like: LOOP: movl RDX, RBP movl RDI, [R8 + #20 + RDX << #2] # int load_vector XMM1,[RCX + #8 + RDX << #3] subl RDI, RDX # int vpaddq XMM2,XMM1,XMM0 ! add packedL store_vector [RCX + #8 + RDX << #3],XMM2 vector_cast_l2x XMM1,XMM1 ! addl RDI, #228 # int movl [R8 + #20 + RDX << #2], RDI # int movl RBX, [R8 + #24 + RDX << #2] # int subl RBX, RDX # int addl RBX, #227 # int movl [R8 + #24 + RDX << #2], RBX # int movl RDI, [R8 + #28 + RDX << #2] # int subl RDI, RDX # int addl RDI, #226 # int movl [R8 + #28 + RDX << #2], RDI # int movl RBX, [R8 + #32 + RDX << #2] # int subl RBX, RDX # int addl RBX, #225 # int movl [R8 + #32 + RDX << #2], RBX # int movl RDI, [R8 + #36 + RDX << #2] # int subl RDI, RDX # int addl RDI, #224 # int movl [R8 + #36 + RDX << #2], RDI # int movl RBX, [R8 + #40 + RDX << #2] # int subl RBX, RDX # int addl RBX, #223 # int movl [R8 + #40 + RDX << #2], RBX # int movl RDI, [R8 + #44 + RDX << #2] # int subl RDI, RDX # int addl RDI, #222 # int movl [R8 + #44 + RDX << #2], RDI # int vpaddd XMM1,XMM1,[R8 + #16 + RDX << #2] ! add packedI store_vector [R8 + #16 + RDX << #2],XMM1 movl RBX, [R8 + #48 + RDX << #2] # int subl RBX, RDX # int addl RBX, #221 # int movl [R8 + #48 + RDX << #2], RBX # int movl RBP, RDX # spill addl RBP, #8 # int cmpl RBP, #220 jl LOOP Now the LoadI pack is scheduled after `iArr[i1 + 3] -= (i2++);` where we have assigned necessary new values to the iArr array, and the `load_vector` is combined with `AddVI` into `vpaddd` here. 
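A result check for the loop discussed in Part One can be sketched in Java without reading assembly. This is only a minimal sketch, not the jtreg test attached to the PR: the class name `InterleavedLoopCheck`, the array length `N`, the initial array contents and the run count are all assumptions made for illustration. The first call to `runLoop()` is expected to run in the interpreter and is used as the reference, while later calls may be JIT-compiled; on a JVM with a scheduling bug like the one described, the later results could differ from the reference.

```java
import java.util.Arrays;

public class InterleavedLoopCheck {
    // Hypothetical array length, chosen so that i1 + 1 stays in bounds.
    static final int N = 240;

    // Runs the loop from the report on freshly initialized arrays.
    static int[] runLoop() {
        int[] iArr = new int[N];
        long[] lArrFld = new long[N];
        for (int k = 0; k < N; k++) {
            iArr[k] = k;          // assumed initial values, not from the report
            lArrFld[k] = 3L * k;  // assumed initial values, not from the report
        }
        int i2 = 1;
        for (int i1 = 6; i1 < 227; i1++) {
            iArr[i1] += lArrFld[i1]++; // statement 1
            iArr[i1 + 1] -= (i2++);    // statement 2
        }
        return iArr;
    }

    // Compares repeated runs against the result of the first run.
    static int countMismatches(int runs) {
        int[] expected = runLoop();
        int mismatches = 0;
        for (int r = 0; r < runs; r++) {
            if (!Arrays.equals(expected, runLoop())) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(20_000));
    }
}
```

Whether the later runs are actually compiled by C2 depends on the JVM and its flags, so a clean run of this sketch does not by itself prove that the vectorized code path was exercised.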
For the case similar to my case above:

a[i-1] += ;
a[i] += ;

The problem is the same and can be fixed by my patch. The fixed assembly code is like the case above.

=============================================================
Part Two:
When using `+2` or other index offsets instead of `1`, which can't be divided exactly by `pack_size`, then the case is like

a[i] += ; // statement 1
a[i+2] += ;

or like:

a[i-2] += ;
a[i] += ;

There is still the same problem with vector pack scheduling mentioned above, but the old patch can't fix it. That's because, in the old patch, we assumed that we should avoid taking the memory state of the first load only when the `LoadI` pack depends on `last_mem`. But we can see here that, after unrolling 4 times, the case with index offset 2 will be:

a[i] += ;// statement 1
a[i+2] += ;
a[i+1] += ;// statement 1
a[i+3] += ;
a[i+2] += ;// statement 1
a[i+4] += ;
a[i+3] += ;// statement 1
a[i+5] += ;

If `pack_size` is `4`, then the last_mem of the `LoadI` pack of statement 1 will be the `StoreI` in `a[i+4] += ;`, and there is no dependency between the `LoadI` pack and `a[i+4] += ;`. The value of `is_dependent` would be `false` and we would still wrongly take the memory state of the first load. The reason why we shouldn't take the memory state of the first load is that there is a strong dependency between `a[i+2] += ` in the `LoadI` pack and `a[i+2] += ;`, which is located after first_mem and before last_mem.

Therefore, I suppose that when we try to determine whether we should take the memory state of the first load, we should consider whether the `LoadI` pack depends on any memory operation located after `first_mem`. In the newest patch, I discard the new function `dependent_on_last_mem()` and make use of the function `find_last_mem_state()` to find out whether any load in the `LoadI` pack depends on any memory operation located after `first_mem`. If yes, the `LoadI` pack should be scheduled after the memory operation it relies on.
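The unrolled-by-4 reasoning above can be illustrated with a small toy model. This is a sketch under stated assumptions and not HotSpot code: the class `InterleavedDependencyModel` and both method names are hypothetical, it only models index arithmetic for the pattern `a[i] += ; a[i+offset] += ;`, and it treats the store immediately preceding the pack's last load as the modeled `last_mem`.

```java
import java.util.ArrayList;
import java.util.List;

public class InterleavedDependencyModel {
    /**
     * Model: unrolled copy k (0 <= k < packSize) executes, in program order,
     *   statement 1: load/store a[i + k]           (these loads form the LoadI pack)
     *   statement 2: load/store a[i + k + offset]  (scalar)
     * Returns pairs {k, j} where the scalar store of copy k writes the element
     * that the pack load of a later copy j reads.
     */
    static List<int[]> storesFeedingPack(int offset, int packSize) {
        List<int[]> result = new ArrayList<>();
        for (int k = 0; k < packSize; k++) {
            int j = k + offset;          // copy whose statement-1 load reads the stored element
            if (j > k && j < packSize) { // later in program order and still inside the pack
                result.add(new int[] {k, j});
            }
        }
        return result;
    }

    /** True if the scalar store just before the pack's last load (modeled last_mem) feeds the pack. */
    static boolean lastMemFeedsPack(int offset, int packSize) {
        int k = packSize - 2;            // scalar store preceding the last pack load
        int j = k + offset;
        return j > k && j < packSize;
    }

    public static void main(String[] args) {
        for (int offset : new int[] {1, 2, 4}) {
            System.out.println("offset " + offset
                + ": stores feeding pack = " + storesFeedingPack(offset, 4).size()
                + ", last_mem feeds pack = " + lastMemFeedsPack(offset, 4));
        }
    }
}
```

In this model, with `pack_size` 4 and offset 2, two earlier scalar stores feed loads of the pack while the modeled `last_mem` does not, which corresponds to the situation where checking only `last_mem` is not sufficient. With offset 1 the modeled `last_mem` itself feeds the pack, and with offset 4 no store between the first and last load feeds it.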
In this series of interleaving cases, the store in the pack depends on one of the loads in the `LoadI` pack, so the `LoadI` pack should be scheduled before the `StoreI` pack. And some loads in the `LoadI` pack depend on some earlier memory operations located after `first_mem`, so those memory operations should be scheduled before the `LoadI` pack and also before the store pack. For this reason, we shouldn't take the memory state of the first load.

To conclude, when the load pack depends on some memory operations located after `first_mem` and the store which depends on the load pack is in a vectorized pack, we still take the memory state of the last load.

=============================================================
Part Three:
For the case

a[i+1] += ;
a[i] += ;

and the case

a[i] += ;
a[i-1] += ;

both can't be vectorized because of an alignment issue and will be scheduled as scalar instructions. `+2` or other index offsets which can't be divided exactly by `pack_size` have the same alignment issue and can't be vectorized.

=============================================================
Part Four:
We won't have the problem when using `pack_size`, like `4`, as the index offset (set `MaxVectorSize=16`). But in the case

iArr[i1-4] += (lArrFld[i1]++);
iArr[i1] -= (i2++);

we use a long array and an int array at the same time, and superword can't vectorize them because of the different pack sizes. So I changed `lArrFld` to an int array. The cases are changed to

// int[] a;
iArr[i1-4] += (a[i1]++);
iArr[i1] -= (i2++);

// int[] a;
iArr[i1] += (a[i1]++);
iArr[i1+4] -= (i2++);

// int[] a;
iArr[i1+4] += (a[i1]++);
iArr[i1] -= (i2++);

// int[] a;
iArr[i1] += (a[i1]++);
iArr[i1-4] -= (i2++);

All of them can be vectorized but, as in my case, only the first statement is vectorized. Currently, the memory dependency among vector packs and scalar nodes is much simpler than in my case because there is no dependency among sandwiched stores and `LoadI` packs.
Vector operations are scheduled before all scalar memory operations as the order of statements in the java code loop. Their assembly code is similar and won't be influenced by my patch, showed as below: ... load_vector XMM1,[RSI + #16 + R9 << #2] load_vector XMM2,[RCX + #16 + R9 << #2] vpaddd XMM3,XMM2,XMM0 ! add packedI store_vector [RCX + #16 + R9 << #2],XMM3 vpaddd XMM1,XMM1,XMM2 ! add packedI store_vector [RSI + #16 + R9 << #2],XMM1 movl R11, [RSI + #32 + R9 << #2] # int subl R11, R9 # int addl R11, #228 # int movl [RSI + #32 + R9 << #2], R11 # int movl R10, [RSI + #36 + R9 << #2] # int subl R10, R9 # int addl R10, #227 # int movl [RSI + #36 + R9 << #2], R10 # int movl R10, [RSI + #40 + R9 << #2] # int subl R10, R9 # int addl R10, #226 # int movl [RSI + #40 + R9 << #2], R10 # int movl R11, [RSI + #44 + R9 << #2] # int subl R11, R9 # int addl R11, #225 # int movl [RSI + #44 + R9 << #2], R11 # int ... ------------- PR: https://git.openjdk.org/jdk/pull/9898 From fgao at openjdk.org Tue Sep 6 02:10:49 2022 From: fgao at openjdk.org (Fei Gao) Date: Tue, 6 Sep 2022 02:10:49 GMT Subject: RFR: 8290910: Wrong memory state is picked in SuperWord::co_locate_pack() [v3] In-Reply-To: References: Message-ID: On Tue, 6 Sep 2022 02:03:06 GMT, Fei Gao wrote: >> After [JDK-8283091](https://bugs.openjdk.org/browse/JDK-8283091), the loop below can be vectorized partially. >> Statement 1 can be vectorized but statement 2 can't. >> >> // int[] iArr; long[] lArrFld; int i1,i2; >> for (i1 = 6; i1 < 227; i1++) { >> iArr[i1] += lArrFld[i1]++; // statement 1 >> iArr[i1 + 1] -= (i2++); // statement 2 >> } >> >> >> But we got incorrect results because the vector packs of iArr are >> scheduled incorrectly like: >> >> ... >> load_vector XMM1,[R8 + #16 + R11 << #2] >> movl RDI, [R8 + #20 + R11 << #2] # int >> load_vector XMM2,[R9 + #8 + R11 << #3] >> subl RDI, R11 # int >> vpaddq XMM3,XMM2,XMM0 ! add packedL >> store_vector [R9 + #8 + R11 << #3],XMM3 >> vector_cast_l2x XMM2,XMM2 ! 
>> vpaddd XMM1,XMM2,XMM1 ! add packedI >> addl RDI, #228 # int >> movl [R8 + #20 + R11 << #2], RDI # int >> movl RBX, [R8 + #24 + R11 << #2] # int >> subl RBX, R11 # int >> addl RBX, #227 # int >> movl [R8 + #24 + R11 << #2], RBX # int >> ... >> movl RBX, [R8 + #40 + R11 << #2] # int >> subl RBX, R11 # int >> addl RBX, #223 # int >> movl [R8 + #40 + R11 << #2], RBX # int >> movl RDI, [R8 + #44 + R11 << #2] # int >> subl RDI, R11 # int >> addl RDI, #222 # int >> movl [R8 + #44 + R11 << #2], RDI # int >> store_vector [R8 + #16 + R11 << #2],XMM1 >> ... >> >> simplified as: >> >> load_vector iArr in statement 1 >> unvectorized loads/stores in statement 2 >> store_vector iArr in statement 1 >> >> We cannot pick the memory state from the first load for LoadI pack >> here, as the LoadI vector operation must load the new values in memory >> after iArr writes `iArr[i1 + 1] - (i2++)` to `iArr[i1 + 1]`(statement 2). >> We must take the memory state of the last load where we have assigned >> new values `iArr[i1 + 1] - (i2++)` to the iArr array. >> >> In [JDK-8240281](https://bugs.openjdk.org/browse/JDK-8240281), we picked the memory state of the first load[1]. Different >> from the scenario in [JDK-8240281](https://bugs.openjdk.org/browse/JDK-8240281), the store, which is dependent on an >> earlier load here, is in a pack to be scheduled and the LoadI pack >> depends on the last_mem. As designed[2], to schedule the StoreI pack, >> all memory operations in another single pack should be moved in the same >> direction. We know that the store in the pack depends on one of loads in >> the LoadI pack, so the LoadI pack should be scheduled before the StoreI >> pack. And the LoadI pack depends on the last_mem, so the last_mem must >> be scheduled before the LoadI pack and also before the store pack. >> Therefore, we need to take the memory state of the last load for the >> LoadI pack here. 
>> >> To fix it, the pack adds additional checks while picking the memory state >> of the first load. When the store locates in a pack and the load pack >> relies on the last_mem, we shouldn't choose the memory state of the >> first load but choose the memory state of the last load. >> >> [1]https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2380 >> [2]https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2232 > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Fix the interleaving cases using as index offset and add new reduced case from JDK-8293216 > > Change-Id: Ia20009e262e49ef0d6096133f00acad614b4a1dc > - Merge branch 'master' into fg8290910 > > Change-Id: I2393a1f4f744b2ed258803c82f3198c2f2e5a8ac > - Code style change: add one space > > Change-Id: I2794060ac0f9dbe006e32f202111ee08f09d96a1 > - 8290910: Wrong memory state is picked in SuperWord::co_locate_pack() > > After JDK-8283091, the loop below can be vectorized partially. > Statement 1 can be vectorized but statement 2 can't. > ``` > // int[] iArr; long[] lArrFld; int i1,i2; > for (i1 = 6; i1 < 227; i1++) { > iArr[i1] += lArrFld[i1]++; // statement 1 > iArr[i1 + 1] -= (i2++); // statement 2 > } > ``` > > But we got incorrect results because the vector packs of iArr are > scheduled incorrectly like: > ``` > ... > load_vector XMM1,[R8 + #16 + R11 << #2] > movl RDI, [R8 + #20 + R11 << #2] # int > load_vector XMM2,[R9 + #8 + R11 << #3] > subl RDI, R11 # int > vpaddq XMM3,XMM2,XMM0 ! add packedL > store_vector [R9 + #8 + R11 << #3],XMM3 > vector_cast_l2x XMM2,XMM2 ! > vpaddd XMM1,XMM2,XMM1 ! 
add packedI > addl RDI, #228 # int > movl [R8 + #20 + R11 << #2], RDI # int > movl RBX, [R8 + #24 + R11 << #2] # int > subl RBX, R11 # int > addl RBX, #227 # int > movl [R8 + #24 + R11 << #2], RBX # int > ... > movl RBX, [R8 + #40 + R11 << #2] # int > subl RBX, R11 # int > addl RBX, #223 # int > movl [R8 + #40 + R11 << #2], RBX # int > movl RDI, [R8 + #44 + R11 << #2] # int > subl RDI, R11 # int > addl RDI, #222 # int > movl [R8 + #44 + R11 << #2], RDI # int > store_vector [R8 + #16 + R11 << #2],XMM1 > ... > ``` > simplified as: > ``` > load_vector iArr in statement 1 > unvectorized loads/stores in statement 2 > store_vector iArr in statement 1 > ``` > We cannot pick the memory state from the first load for LoadI pack > here, as the LoadI vector operation must load the new values in memory > after iArr writes 'iArr[i1 + 1] - (i2++)' to 'iArr[i1 + 1]'(statement 2). > We must take the memory state of the last load where we have assigned > new values ('iArr[i1 + 1] - (i2++)') to the iArr array. > > In JDK-8240281, we picked the memory state of the first load. Different > from the scenario in JDK-8240281, the store, which is dependent on an > earlier load here, is in a pack to be scheduled and the LoadI pack > depends on the last_mem. As designed[2], to schedule the StoreI pack, > all memory operations in another single pack should be moved in the same > direction. We know that the store in the pack depends on one of loads in > the LoadI pack, so the LoadI pack should be scheduled before the StoreI > pack. And the LoadI pack depends on the last_mem, so the last_mem must > be scheduled before the LoadI pack and also before the store pack. > Therefore, we need to take the memory state of the last load for the > LoadI pack here. > > To fix it, the pack adds additional checks while picking the memory state > of the first load. 
When the store locates in a pack and the load pack > relies on the last_mem, we shouldn't choose the memory state of the > first load but choose the memory state of the last load. > > [1]https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2380 > [2]https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2232 > > Jira: ENTLLT-5482 > Change-Id: I341d10b91957b60a1b4aff8116723e54083a5fb8 > CustomizedGitHooks: yes I also updated the testcase by adding new interleaving cases with different offset, and adding the reduced testcase from [JDK-8293216](https://bugs.openjdk.org/browse/JDK-8293216), which is a dup of the problem. ------------- PR: https://git.openjdk.org/jdk/pull/9898 From fgao at openjdk.org Tue Sep 6 02:29:56 2022 From: fgao at openjdk.org (Fei Gao) Date: Tue, 6 Sep 2022 02:29:56 GMT Subject: RFR: 8290910: Wrong memory state is picked in SuperWord::co_locate_pack() [v3] In-Reply-To: References: Message-ID: On Tue, 6 Sep 2022 02:03:06 GMT, Fei Gao wrote: >> After [JDK-8283091](https://bugs.openjdk.org/browse/JDK-8283091), the loop below can be vectorized partially. >> Statement 1 can be vectorized but statement 2 can't. >> >> // int[] iArr; long[] lArrFld; int i1,i2; >> for (i1 = 6; i1 < 227; i1++) { >> iArr[i1] += lArrFld[i1]++; // statement 1 >> iArr[i1 + 1] -= (i2++); // statement 2 >> } >> >> >> But we got incorrect results because the vector packs of iArr are >> scheduled incorrectly like: >> >> ... >> load_vector XMM1,[R8 + #16 + R11 << #2] >> movl RDI, [R8 + #20 + R11 << #2] # int >> load_vector XMM2,[R9 + #8 + R11 << #3] >> subl RDI, R11 # int >> vpaddq XMM3,XMM2,XMM0 ! add packedL >> store_vector [R9 + #8 + R11 << #3],XMM3 >> vector_cast_l2x XMM2,XMM2 ! >> vpaddd XMM1,XMM2,XMM1 ! 
add packedI >> addl RDI, #228 # int >> movl [R8 + #20 + R11 << #2], RDI # int >> movl RBX, [R8 + #24 + R11 << #2] # int >> subl RBX, R11 # int >> addl RBX, #227 # int >> movl [R8 + #24 + R11 << #2], RBX # int >> ... >> movl RBX, [R8 + #40 + R11 << #2] # int >> subl RBX, R11 # int >> addl RBX, #223 # int >> movl [R8 + #40 + R11 << #2], RBX # int >> movl RDI, [R8 + #44 + R11 << #2] # int >> subl RDI, R11 # int >> addl RDI, #222 # int >> movl [R8 + #44 + R11 << #2], RDI # int >> store_vector [R8 + #16 + R11 << #2],XMM1 >> ... >> >> simplified as: >> >> load_vector iArr in statement 1 >> unvectorized loads/stores in statement 2 >> store_vector iArr in statement 1 >> >> We cannot pick the memory state from the first load for LoadI pack >> here, as the LoadI vector operation must load the new values in memory >> after iArr writes `iArr[i1 + 1] - (i2++)` to `iArr[i1 + 1]`(statement 2). >> We must take the memory state of the last load where we have assigned >> new values `iArr[i1 + 1] - (i2++)` to the iArr array. >> >> In [JDK-8240281](https://bugs.openjdk.org/browse/JDK-8240281), we picked the memory state of the first load[1]. Different >> from the scenario in [JDK-8240281](https://bugs.openjdk.org/browse/JDK-8240281), the store, which is dependent on an >> earlier load here, is in a pack to be scheduled and the LoadI pack >> depends on the last_mem. As designed[2], to schedule the StoreI pack, >> all memory operations in another single pack should be moved in the same >> direction. We know that the store in the pack depends on one of loads in >> the LoadI pack, so the LoadI pack should be scheduled before the StoreI >> pack. And the LoadI pack depends on the last_mem, so the last_mem must >> be scheduled before the LoadI pack and also before the store pack. >> Therefore, we need to take the memory state of the last load for the >> LoadI pack here. >> >> To fix it, the pack adds additional checks while picking the memory state >> of the first load. 
When the store locates in a pack and the load pack >> relies on the last_mem, we shouldn't choose the memory state of the >> first load but choose the memory state of the last load. >> >> [1]https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2380 >> [2]https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2232 > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Fix the interleaving cases using as index offset and add new reduced case from JDK-8293216 > > Change-Id: Ia20009e262e49ef0d6096133f00acad614b4a1dc > - Merge branch 'master' into fg8290910 > > Change-Id: I2393a1f4f744b2ed258803c82f3198c2f2e5a8ac > - Code style change: add one space > > Change-Id: I2794060ac0f9dbe006e32f202111ee08f09d96a1 > - 8290910: Wrong memory state is picked in SuperWord::co_locate_pack() > > After JDK-8283091, the loop below can be vectorized partially. > Statement 1 can be vectorized but statement 2 can't. > ``` > // int[] iArr; long[] lArrFld; int i1,i2; > for (i1 = 6; i1 < 227; i1++) { > iArr[i1] += lArrFld[i1]++; // statement 1 > iArr[i1 + 1] -= (i2++); // statement 2 > } > ``` > > But we got incorrect results because the vector packs of iArr are > scheduled incorrectly like: > ``` > ... > load_vector XMM1,[R8 + #16 + R11 << #2] > movl RDI, [R8 + #20 + R11 << #2] # int > load_vector XMM2,[R9 + #8 + R11 << #3] > subl RDI, R11 # int > vpaddq XMM3,XMM2,XMM0 ! add packedL > store_vector [R9 + #8 + R11 << #3],XMM3 > vector_cast_l2x XMM2,XMM2 ! > vpaddd XMM1,XMM2,XMM1 ! 
add packedI > addl RDI, #228 # int > movl [R8 + #20 + R11 << #2], RDI # int > movl RBX, [R8 + #24 + R11 << #2] # int > subl RBX, R11 # int > addl RBX, #227 # int > movl [R8 + #24 + R11 << #2], RBX # int > ... > movl RBX, [R8 + #40 + R11 << #2] # int > subl RBX, R11 # int > addl RBX, #223 # int > movl [R8 + #40 + R11 << #2], RBX # int > movl RDI, [R8 + #44 + R11 << #2] # int > subl RDI, R11 # int > addl RDI, #222 # int > movl [R8 + #44 + R11 << #2], RDI # int > store_vector [R8 + #16 + R11 << #2],XMM1 > ... > ``` > simplified as: > ``` > load_vector iArr in statement 1 > unvectorized loads/stores in statement 2 > store_vector iArr in statement 1 > ``` > We cannot pick the memory state from the first load for LoadI pack > here, as the LoadI vector operation must load the new values in memory > after iArr writes 'iArr[i1 + 1] - (i2++)' to 'iArr[i1 + 1]'(statement 2). > We must take the memory state of the last load where we have assigned > new values ('iArr[i1 + 1] - (i2++)') to the iArr array. > > In JDK-8240281, we picked the memory state of the first load. Different > from the scenario in JDK-8240281, the store, which is dependent on an > earlier load here, is in a pack to be scheduled and the LoadI pack > depends on the last_mem. As designed[2], to schedule the StoreI pack, > all memory operations in another single pack should be moved in the same > direction. We know that the store in the pack depends on one of loads in > the LoadI pack, so the LoadI pack should be scheduled before the StoreI > pack. And the LoadI pack depends on the last_mem, so the last_mem must > be scheduled before the LoadI pack and also before the store pack. > Therefore, we need to take the memory state of the last load for the > LoadI pack here. > > To fix it, the pack adds additional checks while picking the memory state > of the first load. 
When the store locates in a pack and the load pack > relies on the last_mem, we shouldn't choose the memory state of the > first load but choose the memory state of the last load. > > [1]https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2380 > [2]https://github.com/openjdk/jdk/blob/0ae834105740f7cf73fe96be22e0f564ad29b18d/src/hotspot/share/opto/superword.cpp#L2232 > > Jira: ENTLLT-5482 > Change-Id: I341d10b91957b60a1b4aff8116723e54083a5fb8 > CustomizedGitHooks: yes Forget to add the assembly code for the interleaving cases with index offset `2`: a[i] += ; // statement 1 a[i+2] += ; The wrong assembly code is : B34: movl RDI, R11 # spill B35: load_vector XMM1,[RSI + #16 + RDI << #2] movl R10, [RSI + #24 + RDI << #2] # int load_vector XMM2,[RBX + #16 + RDI << #3] subl R10, RDI # int vpaddq XMM3,XMM2,XMM0 ! add packedL store_vector [RBX + #16 + RDI << #3],XMM3 vector_cast_l2x XMM2,XMM2 ! vpaddd XMM1,XMM2,XMM1 ! add packedI addl R10, #228 # int movl [RSI + #24 + RDI << #2], R10 # int movl R11, [RSI + #28 + RDI << #2] # int subl R11, RDI # int addl R11, #227 # int movl [RSI + #28 + RDI << #2], R11 # int movl R10, [RSI + #32 + RDI << #2] # int subl R10, RDI # int addl R10, #226 # int movl [RSI + #32 + RDI << #2], R10 # int movl R11, [RSI + #36 + RDI << #2] # int subl R11, RDI # int addl R11, #225 # int movl [RSI + #36 + RDI << #2], R11 # int movl R10, [RSI + #40 + RDI << #2] # int subl R10, RDI # int addl R10, #224 # int movl [RSI + #40 + RDI << #2], R10 # int movl R10, [RSI + #44 + RDI << #2] # int subl R10, RDI # int addl R10, #223 # int movl [RSI + #44 + RDI << #2], R10 # int store_vector [RSI + #16 + RDI << #2],XMM1 movl R11, [RSI + #48 + RDI << #2] # int subl R11, RDI # int addl R11, #222 # int movl [RSI + #48 + RDI << #2], R11 # int movl R10, [RSI + #52 + RDI << #2] # int subl R10, RDI # int addl R10, #221 # int movl [RSI + #52 + RDI << #2], R10 # int movl R11, RDI # spill addl R11, #8 # int cmpl R11, 
#220 jl B34 After the new patch, the fixed assembly code is: B34: movl RDI, R11 # spill B35: movl R10, [RSI + #24 + RDI << #2] # int load_vector XMM1,[RBX + #16 + RDI << #3] subl R10, RDI # int vpaddq XMM2,XMM1,XMM0 ! add packedL store_vector [RBX + #16 + RDI << #3],XMM2 vector_cast_l2x XMM1,XMM1 ! addl R10, #228 # int movl [RSI + #24 + RDI << #2], R10 # int movl R11, [RSI + #28 + RDI << #2] # int subl R11, RDI # int addl R11, #227 # int movl [RSI + #28 + RDI << #2], R11 # int movl R10, [RSI + #32 + RDI << #2] # int subl R10, RDI # int addl R10, #226 # int movl [RSI + #32 + RDI << #2], R10 # int movl R11, [RSI + #36 + RDI << #2] # int subl R11, RDI # int addl R11, #225 # int movl [RSI + #36 + RDI << #2], R11 # int movl R10, [RSI + #40 + RDI << #2] # int subl R10, RDI # int addl R10, #224 # int movl [RSI + #40 + RDI << #2], R10 # int movl R10, [RSI + #44 + RDI << #2] # int subl R10, RDI # int addl R10, #223 # int movl [RSI + #44 + RDI << #2], R10 # int vpaddd XMM1,XMM1,[RSI + #16 + RDI << #2] ! add packedI store_vector [RSI + #16 + RDI << #2],XMM1 movl R11, [RSI + #48 + RDI << #2] # int subl R11, RDI # int addl R11, #222 # int movl [RSI + #48 + RDI << #2], R11 # int movl R10, [RSI + #52 + RDI << #2] # int subl R10, RDI # int addl R10, #221 # int movl [RSI + #52 + RDI << #2], R10 # int movl R11, RDI # spill addl R11, #8 # int cmpl R11, #220 jl B34 ------------- PR: https://git.openjdk.org/jdk/pull/9898 From xgong at openjdk.org Tue Sep 6 02:36:46 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 6 Sep 2022 02:36:46 GMT Subject: RFR: 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast [v6] In-Reply-To: References: <1mDQWN8f2Gpb-tuVZo_Jj6TVU1yNtU-4jY00D4gfW5s=.97d48dbc-0ffd-432d-83fe-3544295036e2@github.com> Message-ID: On Wed, 31 Aug 2022 06:10:07 GMT, Xiaohong Gong wrote: >> Recently we found the performance of "`FIRST_NONZERO`" for double type is largely worse than the other types on x86 when `UseAVX=2`. 
The main reason is that the "`VectorCastL2X`" op is not supported by the backend when the dst element type is `T_DOUBLE`. This makes the check of the `VectorCast` op fail before intrinsifying "`VectorMask.cast()`", which is used in the
>> "`FIRST_NONZERO`" Java implementation (see [1]). However, the compiler will not generate the `VectorCast` op for `VectorMask.cast()` if:
>>
>> 1) the current platform supports the predicated feature
>> 2) the element size (in bytes) of the src and dst type is the same
>>
>> So the check of the "`VectorCast`" op is needless for such cases. To fix it, this patch:
>>
>> 1) limits the specified vector cast op check to vectors
>> 2) adds the related mask cast op check for VectorMask.cast()
>> 3) cleans up the unnecessary code
>>
>> Here is the performance of the "`FIRST_NONZERO`" benchmark [2] on an x86 machine with `UseAVX=2`:
>>
>> Benchmark                            (size)  Mode   Cnt  Before  After     Units
>> DoubleMaxVector.FIRST_NONZERO        1024    thrpt  15   49.266  2460.886  ops/ms
>> DoubleMaxVector.FIRST_NONZEROMasked  1024    thrpt  15   49.554  1892.223  ops/ms
>>
>> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/DoubleVector.java#L770
>> [2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/DoubleMaxVector.java#L246
>
> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:
>
>   Address review comments

Ping again. Could anyone else please help to take a look at this PR? Thanks a lot!
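
For readers following the quoted description, the lane-wise behaviour of `FIRST_NONZERO` can be sketched in scalar Java. This is only an illustration, not the JDK implementation: the class and method names are made up, and it assumes the bitwise reading of "nonzero" for doubles (so `-0.0` and `NaN` count as nonzero). What it tries to show is that the selection mask is computed on 64-bit integral lanes and then applied to 64-bit double lanes, so the source and destination element sizes match, which is one of the two cases the description lists for `VectorMask.cast()`.

```java
// Scalar sketch of lane-wise FIRST_NONZERO for doubles (hypothetical names).
// Assumption: a lane is "nonzero" when its raw 64-bit pattern is not all zeros.
final class FirstNonZeroSketch {

    // Builds the mask from the long view of each lane, modelling a compare of
    // 64-bit integral lanes against zero.
    static boolean[] nonZeroMask(double[] a) {
        boolean[] mask = new boolean[a.length];
        for (int i = 0; i < a.length; i++) {
            mask[i] = Double.doubleToRawLongBits(a[i]) != 0L;
        }
        return mask;
    }

    // Selects a[i] where the mask is set and b[i] otherwise. The mask built for
    // long lanes is applied to double lanes unchanged: both are 8 bytes wide.
    static double[] firstNonZero(double[] a, double[] b) {
        boolean[] mask = nonZeroMask(a);
        double[] r = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = mask[i] ? a[i] : b[i];
        }
        return r;
    }

    public static void main(String[] args) {
        double[] a = {0.0, 1.5, -0.0, Double.NaN};
        double[] b = {2.0, 3.0, 4.0, 5.0};
        System.out.println(java.util.Arrays.toString(firstNonZero(a, b)));
    }
}
```

Only the first lane falls back to `b` here; the `-0.0` and `NaN` lanes are kept because their bit patterns are nonzero under the stated assumption.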
------------- PR: https://git.openjdk.org/jdk/pull/9737 From fgao at openjdk.org Tue Sep 6 02:47:38 2022 From: fgao at openjdk.org (Fei Gao) Date: Tue, 6 Sep 2022 02:47:38 GMT Subject: RFR: 8289422: Fix and re-enable vector conditional move [v4] In-Reply-To: <6uthI29shZjAeLK-eV3Kxqao06qoa9U9zQ5g_oDLmkI=.3e171aae-2003-46c9-88ac-9a63fecc5d96@github.com> References: <6uthI29shZjAeLK-eV3Kxqao06qoa9U9zQ5g_oDLmkI=.3e171aae-2003-46c9-88ac-9a63fecc5d96@github.com> Message-ID: > // float[] a, float[] b, float[] c; > for (int i = 0; i < a.length; i++) { > c[i] = (a[i] > b[i]) ? a[i] : b[i]; > } > > > After [JDK-8139340](https://bugs.openjdk.org/browse/JDK-8139340) and [JDK-8192846](https://bugs.openjdk.org/browse/JDK-8192846), we hope to vectorize the case > above by enabling -XX:+UseCMoveUnconditionally and -XX:+UseVectorCmov. > But the transformation here[1] is going to optimize the BoolNode > with constant input to a constant and break the design logic of > cmove vector node[2]. We can't prevent all GVN transformation to > the BoolNode before matcher, so the patch keeps the condition input > as a constant while creating a cmove vector node, and then > restructures it into a binary tree before matching. > > When the input order of original cmp node is different from the > input order of original cmove node, like: > > // float[] a, float[] b, float[] c; > for (int i = 0; i < a.length; i++) { > c[i] = (a[i] < b[i]) ? a[i] : b[i]; > } > > the patch negates the mask of the BoolNode before creating the > cmove vector node in SuperWord::output(). > > We can also use VectorNode::implemented() to consult if vector > conditional move is supported in the backend. So, the patch cleans > the related code in SuperWord::implemented(). 
> > With the patch, the performance uplift is: > (The micro-benchmark functions are included in the file > test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java) > > AArch64: > Benchmark (length) Mode Cnt uplift(ns/op) > cmoveD 523 avgt 15 68.89% > cmoveF 523 avgt 15 72.40% > > X86: > Benchmark (length) Mode Cnt uplift(ns/op) > cmoveD 523 avgt 15 73.12% > cmoveF 523 avgt 15 85.45% > > [1]https://github.com/openjdk/jdk/blob/779b4e1d1959bc15a27492b7e2b951678e39cca8/src/hotspot/share/opto/subnode.cpp#L1310 > [2]https://github.com/openjdk/jdk/blob/779b4e1d1959bc15a27492b7e2b951678e39cca8/src/hotspot/share/opto/matcher.cpp#L2365 Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - Rebase the patch to the latest JDK and add some testcase for NE and EQ Change-Id: Ifb02b5efc2a09e6e0b4fc1c8346698597464f448 - Merge branch 'master' into fg8289422 Change-Id: I09677cb07f6b2717aa768a830663ca455806b900 - Merge branch 'master' into fg8289422 Change-Id: I870c7bbc73d12bac16756226125edc1a229ba412 - Enable the test only on aarch64 platform because X86 supports vector cmove only on some 256-bits AVXs Change-Id: I64dd49380fe3d303ef6be21460df3be31c1458f8 - Merge branch 'master' into fg8289422 Change-Id: I7936552df6ac12949ed8b550576f4e3520596423 - 8289422: Fix and re-enable vector conditional move ``` // float[] a, float[] b, float[] c; for (int i = 0; i < a.length; i++) { c[i] = (a[i] > b[i]) ? a[i] : b[i]; } ``` After JDK-8139340 and JDK-8192846, we hope to vectorize the case above by enabling -XX:+UseCMoveUnconditionally and -XX:+UseVectorCmov. But the transformation here[1] is going to optimize the BoolNode with constant input to a constant and break the design logic of cmove vector node[2]. 
We can't prevent all GVN transformation to the BoolNode before matcher, so the patch keeps the condition input as a constant while creating a cmove vector node, and then restructures it into a binary tree before matching. When the input order of original cmp node is different from the input order of original cmove node, like: ``` // float[] a, float[] b, float[] c; for (int i = 0; i < a.length; i++) { c[i] = (a[i] < b[i]) ? a[i] : b[i]; } ``` the patch negates the mask of the BoolNode before creating the cmove vector node in SuperWord::output(). We can also use VectorNode::implemented() to consult if vector conditional move is supported in the backend. So, the patch cleans the related code in SuperWord::implemented(). With the patch, the performance uplift is: (The micro-benchmark functions are included in the file test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java) AArch64: Benchmark (length) Mode Cnt uplift(ns/op) cmoveD 523 avgt 15 68.89% cmoveF 523 avgt 15 72.40% X86: Benchmark (length) Mode Cnt uplift(ns/op) cmoveD 523 avgt 15 73.12% cmoveF 523 avgt 15 85.45% [1]https://github.com/openjdk/jdk/blob/779b4e1d1959bc15a27492b7e2b951678e39cca8/src/hotspot/share/opto/subnode.cpp#L1310 [2]https://github.com/openjdk/jdk/blob/779b4e1d1959bc15a27492b7e2b951678e39cca8/src/hotspot/share/opto/matcher.cpp#L2365 Change-Id: If046dd745024deb0e602bf7efc2a07c22b89c690 ------------- Changes: https://git.openjdk.org/jdk/pull/9652/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9652&range=03 Stats: 318 lines in 7 files changed: 303 ins; 7 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/9652.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9652/head:pull/9652 PR: https://git.openjdk.org/jdk/pull/9652 From fgao at openjdk.org Tue Sep 6 03:20:32 2022 From: fgao at openjdk.org (Fei Gao) Date: Tue, 6 Sep 2022 03:20:32 GMT Subject: RFR: 8275275: AArch64: Fix performance regression after auto-vectorization on NEON Message-ID: For some vector opcodes, there 
are no corresponding AArch64 NEON instructions, but supporting
them benefits the Vector API. Some of these opcodes are also used
by superword for auto-vectorization. Here is the list:

    VectorCastD2I, VectorCastL2F
    MulVL
    AddReductionVI/L/F/D
    MulReductionVI/L/F/D
    AndReductionV, OrReductionV, XorReductionV

We did some micro-benchmark performance tests on NEON and found
that some of the listed opcodes hurt the performance of loops after
auto-vectorization, but others don't.

This patch disables, for superword, the opcodes that show
obvious performance regressions after auto-vectorization on
NEON. In addition, the patch adds one jtreg test case that checks
IR nodes, to guard the code against accidental changes in the
future.

Here is the performance data before and after the patch on NEON.

Benchmark       length  Mode   Cnt  Before   After    Units
AddReductionVD  1024    thrpt  15   450.830  548.001  ops/ms
AddReductionVF  1024    thrpt  15   514.468  548.013  ops/ms
MulReductionVD  1024    thrpt  15   405.613  499.531  ops/ms
MulReductionVF  1024    thrpt  15   451.292  495.061  ops/ms

Note:
Because superword doesn't vectorize reductions that are unconnected
to other vector packs, the benchmark function for Add/Mul
reduction looks like:

    // private double[] da, db;
    // private double dresult;
    public void AddReductionVD() {
      double result = 1;
      for (int i = startIndex; i < length; i++) {
        result += (da[i] + db[i]);
      }
      dresult += result;
    }

Notably, vector multiply long has been implemented but disabled
for both the Vector API and superword. For the same reason, the
patch re-enables MulVL on NEON for the Vector API but still disables
it for superword. The performance uplift on the Vector API is ~12.8x
on my local machine.

Benchmark          length  Mode   Cnt  Before   After    Units
Long128Vector.MUL  1024    thrpt  10   55.015   760.593  ops/ms
MulVL(superword)   1024    thrpt  10   907.788  907.805  ops/ms

Note:
The superword benchmark function is:

    // private long[] in1, in2, res;
    public void MulVL() {
      for (int i = 0; i < length; i++) {
        res[i] = in1[i] * in2[i];
      }
    }

The Vector API benchmark case is from:
https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/Long128Vector.java#L190

-------------

Commit messages:
 - 8275275: AArch64: Fix performance regression after auto-vectorization on NEON

Changes: https://git.openjdk.org/jdk/pull/10175/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10175&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8275275
  Stats: 137 lines in 4 files changed: 131 ins; 2 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/10175.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10175/head:pull/10175

PR: https://git.openjdk.org/jdk/pull/10175

From aph at openjdk.org Tue Sep 6 09:42:43 2022
From: aph at openjdk.org (Andrew Haley)
Date: Tue, 6 Sep 2022 09:42:43 GMT
Subject: RFR: 8275275: AArch64: Fix performance regression after auto-vectorization on NEON
In-Reply-To:
References:
Message-ID: <6ZzuCl-e7L8dd96rLZd3XFUeOcQ-6b7zWJo6_t8BP3Y=.b56896de-5e93-46d6-bde0-4efec3505f0f@github.com>

On Tue, 6 Sep 2022 03:13:25 GMT, Fei Gao wrote:

> For some vector opcodes, there are no corresponding AArch64 NEON
> instructions but supporting them benefits vector API. Some of
> this kind of opcodes are also used by superword for auto-
> vectorization and here is the list:
>
>     VectorCastD2I, VectorCastL2F
>     MulVL
>     AddReductionVI/L/F/D
>     MulReductionVI/L/F/D
>     AndReductionV, OrReductionV, XorReductionV
>
>
> We did some micro-benchmark performance tests on NEON and found
> that some of listed opcodes hurt the performance of loops after
> auto-vectorization, but others don't.
>
> This patch disables those opcodes for superword, which have
> obvious performance regressions after auto-vectorization on
> NEON. Besides, one jtreg test case, where IR nodes are checked,
> is added in the patch to protect the code against change by
> mistake in the future.
>
> Here is the performance data before and after the patch on NEON.
>
> Benchmark       length  Mode   Cnt  Before   After    Units
> AddReductionVD  1024    thrpt  15   450.830  548.001  ops/ms
> AddReductionVF  1024    thrpt  15   514.468  548.013  ops/ms
> MulReductionVD  1024    thrpt  15   405.613  499.531  ops/ms
> MulReductionVF  1024    thrpt  15   451.292  495.061  ops/ms
>
> Note:
> Because superword doesn't vectorize reductions unconnected with
> other vector packs, the benchmark function for Add/Mul
> reduction is like:
>
>     // private double[] da, db;
>     // private double dresult;
>     public void AddReductionVD() {
>       double result = 1;
>       for (int i = startIndex; i < length; i++) {
>         result += (da[i] + db[i]);
>       }
>       dresult += result;
>     }
>
>
> Specially, vector multiply long has been implemented but disabled
> for both vector API and superword. Out of the same reason, the
> patch re-enables MulVL on NEON for Vector API but still disables
> it for superword. The performance uplift on vector API is ~12.8x
> on my local.
>
> Benchmark          length  Mode   Cnt  Before   After    Units
> Long128Vector.MUL  1024    thrpt  10   55.015   760.593  ops/ms
> MulVL(superword)   1024    thrpt  10   907.788  907.805  ops/ms
>
> Note:
> The superword benchmark function is:
>
>     // private long[] in1, in2, res;
>     public void MulVL() {
>       for (int i = 0; i < length; i++) {
>         res[i] = in1[i] * in2[i];
>       }
>     }
>
> The Vector API benchmark case is from:
> https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/Long128Vector.java#L190

That all makes very good sense. Thanks.

-------------

Marked as reviewed by aph (Reviewer).
PR: https://git.openjdk.org/jdk/pull/10175 From jbhateja at openjdk.org Tue Sep 6 10:02:50 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 6 Sep 2022 10:02:50 GMT Subject: RFR: 8288043: Optimize FP to word/sub-word integral type conversion on X86 AVX2 platforms [v3] In-Reply-To: References: Message-ID: > Hi All, > > This patch extends conversion optimizations added with [JDK-8287835](https://bugs.openjdk.org/browse/JDK-8287835) to optimize following floating point to integral conversions for X86 AVX2 targets:- > * D2I , D2S, D2B, F2I , F2S, F2B > > In addition, it also optimizes following wide vector (64 bytes) double to integer and sub-type conversions for AVX512 targets which do not support AVX512DQ feature. > * D2I, D2S, D2B > > Following are the JMH micro performance results with and without patch. > > System configuration: 40C 2S Icelake server (Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz) > > BENCHMARK | SIZE | BASELINE (ops/ms) | WITHOPT (ops/ms) | PERF GAIN FACTOR > -- | -- | -- | -- | -- > VectorFPtoIntCastOperations.microDouble128ToByte128 | 1024 | 90.603 | 92.797 | 1.024215534 > VectorFPtoIntCastOperations.microDouble128ToByte256 | 1024 | 81.909 | 82.3 | 1.00477359 > VectorFPtoIntCastOperations.microDouble128ToByte512 | 1024 | 26.181 | 26.244 | 1.002406325 > VectorFPtoIntCastOperations.microDouble128ToInteger128 | 1024 | 90.74 | 2537.958 | 27.96956138 > VectorFPtoIntCastOperations.microDouble128ToInteger256 | 1024 | 81.586 | 2429.599 | 29.7796068 > VectorFPtoIntCastOperations.microDouble128ToInteger512 | 1024 | 19.406 | 19.61 | 1.010512213 > VectorFPtoIntCastOperations.microDouble128ToLong128 | 1024 | 91.723 | 90.754 | 0.989435583 > VectorFPtoIntCastOperations.microDouble128ToShort128 | 1024 | 91.766 | 1984.577 | 21.62649565 > VectorFPtoIntCastOperations.microDouble128ToShort256 | 1024 | 81.949 | 1940.599 | 23.68056962 > VectorFPtoIntCastOperations.microDouble128ToShort512 | 1024 | 16.468 | 16.56 | 1.005586592 > 
VectorFPtoIntCastOperations.microDouble256ToByte128 | 1024 | 163.331 | 3018.351 | 18.479964 > VectorFPtoIntCastOperations.microDouble256ToByte256 | 1024 | 148.878 | 3082.034 | 20.70174237 > VectorFPtoIntCastOperations.microDouble256ToByte512 | 1024 | 50.108 | 51.629 | 1.030354434 > VectorFPtoIntCastOperations.microDouble256ToInteger128 | 1024 | 159.805 | 4619.421 | 28.90661118 > VectorFPtoIntCastOperations.microDouble256ToInteger256 | 1024 | 143.876 | 4649.642 | 32.31700909 > VectorFPtoIntCastOperations.microDouble256ToInteger512 | 1024 | 38.127 | 38.188 | 1.001599916 > VectorFPtoIntCastOperations.microDouble256ToLong128 | 1024 | 160.322 | 162.442 | 1.013223388 > VectorFPtoIntCastOperations.microDouble256ToLong256 | 1024 | 141.252 | 143.01 | 1.012445841 > VectorFPtoIntCastOperations.microDouble256ToShort128 | 1024 | 157.717 | 3757.471 | 23.82413437 > VectorFPtoIntCastOperations.microDouble256ToShort256 | 1024 | 143.876 | 3830.971 | 26.62689399 > VectorFPtoIntCastOperations.microDouble256ToShort512 | 1024 | 32.061 | 32.911 | 1.026511962 > VectorFPtoIntCastOperations.microFloat128ToByte128 | 1024 | 146.599 | 4002.967 | 27.30555461 > VectorFPtoIntCastOperations.microFloat128ToByte256 | 1024 | 136.99 | 3938.799 | 28.75245638 > VectorFPtoIntCastOperations.microFloat128ToByte512 | 1024 | 51.561 | 50.284 | 0.975233219 > VectorFPtoIntCastOperations.microFloat128ToInteger128 | 1024 | 5933.565 | 5361.472 | 0.903583596 > VectorFPtoIntCastOperations.microFloat128ToInteger256 | 1024 | 5079.564 | 5062.046 | 0.996551279 > VectorFPtoIntCastOperations.microFloat128ToInteger512 | 1024 | 37.101 | 38.419 | 1.035524649 > VectorFPtoIntCastOperations.microFloat128ToLong128 | 1024 | 145.863 | 145.362 | 0.99656527 > VectorFPtoIntCastOperations.microFloat128ToLong256 | 1024 | 131.159 | 133.154 | 1.015210546 > VectorFPtoIntCastOperations.microFloat128ToShort128 | 1024 | 145.966 | 4150.039 | 28.4315457 > VectorFPtoIntCastOperations.microFloat128ToShort256 | 1024 | 134.703 | 4566.589 | 
33.90116775
> VectorFPtoIntCastOperations.microFloat128ToShort512 | 1024 | 31.878 | 30.867 | 0.968285338
> VectorFPtoIntCastOperations.microFloat256ToByte128 | 1024 | 237.841 | 6292.051 | 26.4548627
> VectorFPtoIntCastOperations.microFloat256ToByte256 | 1024 | 222.041 | 6292.748 | 28.34047766
> VectorFPtoIntCastOperations.microFloat256ToByte512 | 1024 | 92.073 | 88.981 | 0.966417951
> VectorFPtoIntCastOperations.microFloat256ToInteger128 | 1024 | 11471.121 | 10269.636 | 0.895260019
> VectorFPtoIntCastOperations.microFloat256ToInteger256 | 1024 | 10729.816 | 10105.92 | 0.941853989
> VectorFPtoIntCastOperations.microFloat256ToInteger512 | 1024 | 68.328 | 70.005 | 1.024543379
> VectorFPtoIntCastOperations.microFloat256ToLong128 | 1024 | 247.101 | 248.571 | 1.005948984
> VectorFPtoIntCastOperations.microFloat256ToLong256 | 1024 | 225.74 | 223.987 | 0.992234429
> VectorFPtoIntCastOperations.microFloat256ToLong512 | 1024 | 76.39 | 76.187 | 0.997342584
> VectorFPtoIntCastOperations.microFloat256ToShort128 | 1024 | 233.196 | 8202.179 | 35.17289748
> VectorFPtoIntCastOperations.microFloat256ToShort256 | 1024 | 220.75 | 7781.073 | 35.24834881
> VectorFPtoIntCastOperations.microFloat256ToShort512 | 1024 | 58.143 | 55.633 | 0.956830573
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  8288043: Some mainline merge related cleanups.
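
As background for the conversions named in the quoted description (D2I, D2S, D2B, F2I, F2S, F2B), here is a small scalar sketch of what Java's narrowing casts require, which a vectorized sequence presumably has to reproduce lane by lane. It is an illustration only, with made-up class and method names, and says nothing about the instructions the patch actually emits. Per the Java Language Specification, NaN converts to 0, out-of-range values saturate to the `int` range, and `short`/`byte` results are obtained by converting to `int` first and then truncating.

```java
// Scalar reference for Java's floating-point to integral narrowing casts.
// Class and method names are hypothetical; they only mirror the D2I/F2S-style
// labels used in the description.
final class FpToIntCastSketch {
    static int d2i(double d) { return (int) d; }       // NaN -> 0, saturates to the int range
    static short d2s(double d) { return (short) d; }   // double -> int, then keep the low 16 bits
    static byte d2b(double d) { return (byte) d; }     // double -> int, then keep the low 8 bits
    static int f2i(float f) { return (int) f; }
    static short f2s(float f) { return (short) f; }
    static byte f2b(float f) { return (byte) f; }

    // Lane-by-lane model of a double -> int vector cast.
    static int[] castLanes(double[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = d2i(src[i]);
        }
        return dst;
    }

    public static void main(String[] args) {
        double[] src = {1.9, -1.9, Double.NaN, 1e20, -1e20};
        System.out.println(java.util.Arrays.toString(castLanes(src)));
    }
}
```

The sub-word cases are the less intuitive ones: because the value saturates to `int` before truncation, a very large double cast to `short` yields the low 16 bits of `Integer.MAX_VALUE` rather than `Short.MAX_VALUE`.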
------------- Changes: - all: https://git.openjdk.org/jdk/pull/9748/files - new: https://git.openjdk.org/jdk/pull/9748/files/51de0e2b..5cdfd68f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9748&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9748&range=01-02 Stats: 168 lines in 3 files changed: 28 ins; 14 del; 126 mod Patch: https://git.openjdk.org/jdk/pull/9748.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9748/head:pull/9748 PR: https://git.openjdk.org/jdk/pull/9748 From duke at openjdk.org Tue Sep 6 14:30:40 2022 From: duke at openjdk.org (Quan Anh Mai) Date: Tue, 6 Sep 2022 14:30:40 GMT Subject: RFR: 8292289: [vectorapi] Improve the implementation of VectorTestNode [v6] In-Reply-To: References: Message-ID: > This patch modifies the node generation of `VectorSupport::test` to emit a `CMoveINode`, which is picked up by `BoolNode::Ideal(PhaseGVN*, bool)` to connect the `VectorTestNode` directly to the `BoolNode`, removing the redundant operations of materialising the test result in a GP register and do a `CmpI` to get back the flags. As a result, `VectorMask::alltrue` is compiled into machine codes: > > vptest xmm0, xmm1 > jb if_true > if_false: > > instead of: > > vptest xmm0, xmm1 > setb r10 > movzbl r10 > testl r10 > jne if_true > if_false: > > The results of `jdk.incubator.vector.ArrayMismatchBenchmark` shows noticeable improvements: > > Before After > Benchmark Prefix Size Mode Cnt Score Error Score Error Units Change > ArrayMismatchBenchmark.mismatchVectorByte 0.5 9 thrpt 10 217345.383 ? 8316.444 222279.381 ? 2660.983 ops/ms +2.3% > ArrayMismatchBenchmark.mismatchVectorByte 0.5 257 thrpt 10 113918.406 ? 1618.836 116268.691 ? 1291.899 ops/ms +2.1% > ArrayMismatchBenchmark.mismatchVectorByte 0.5 100000 thrpt 10 702.066 ? 72.862 797.806 ? 16.429 ops/ms +13.6% > ArrayMismatchBenchmark.mismatchVectorByte 1.0 9 thrpt 10 146096.564 ? 2401.258 145338.910 ? 
687.453 ops/ms -0.5% > ArrayMismatchBenchmark.mismatchVectorByte 1.0 257 thrpt 10 60598.181 ? 1259.397 69041.519 ? 1073.156 ops/ms +13.9% > ArrayMismatchBenchmark.mismatchVectorByte 1.0 100000 thrpt 10 316.814 ? 10.975 408.770 ? 5.281 ops/ms +29.0% > ArrayMismatchBenchmark.mismatchVectorDouble 0.5 9 thrpt 10 195674.549 ? 1200.166 188482.433 ? 1872.076 ops/ms -3.7% > ArrayMismatchBenchmark.mismatchVectorDouble 0.5 257 thrpt 10 44357.169 ? 473.013 42293.411 ? 2838.255 ops/ms -4.7% > ArrayMismatchBenchmark.mismatchVectorDouble 0.5 100000 thrpt 10 68.199 ? 5.410 67.628 ? 3.241 ops/ms -0.8% > ArrayMismatchBenchmark.mismatchVectorDouble 1.0 9 thrpt 10 107722.450 ? 1677.607 111060.400 ? 982.230 ops/ms +3.1% > ArrayMismatchBenchmark.mismatchVectorDouble 1.0 257 thrpt 10 16692.645 ? 1002.599 21440.506 ? 1618.266 ops/ms +28.4% > ArrayMismatchBenchmark.mismatchVectorDouble 1.0 100000 thrpt 10 32.984 ? 0.548 33.202 ? 2.365 ops/ms +0.7% > ArrayMismatchBenchmark.mismatchVectorInt 0.5 9 thrpt 10 335458.217 ? 3154.842 379944.254 ? 5703.134 ops/ms +13.3% > ArrayMismatchBenchmark.mismatchVectorInt 0.5 257 thrpt 10 58505.302 ? 786.312 56721.368 ? 2497.052 ops/ms -3.0% > ArrayMismatchBenchmark.mismatchVectorInt 0.5 100000 thrpt 10 133.037 ? 11.415 139.537 ? 4.667 ops/ms +4.9% > ArrayMismatchBenchmark.mismatchVectorInt 1.0 9 thrpt 10 117943.802 ? 2281.349 112409.365 ? 2110.055 ops/ms -4.7% > ArrayMismatchBenchmark.mismatchVectorInt 1.0 257 thrpt 10 27060.015 ? 795.619 33756.613 ? 826.533 ops/ms +24.7% > ArrayMismatchBenchmark.mismatchVectorInt 1.0 100000 thrpt 10 57.558 ? 8.927 66.951 ? 4.381 ops/ms +16.3% > ArrayMismatchBenchmark.mismatchVectorLong 0.5 9 thrpt 10 182963.715 ? 1042.497 182438.405 ? 2120.832 ops/ms -0.3% > ArrayMismatchBenchmark.mismatchVectorLong 0.5 257 thrpt 10 36672.215 ? 614.821 35397.398 ? 1609.235 ops/ms -3.5% > ArrayMismatchBenchmark.mismatchVectorLong 0.5 100000 thrpt 10 66.438 ? 2.142 65.427 ? 
2.270 ops/ms -1.5% > ArrayMismatchBenchmark.mismatchVectorLong 1.0 9 thrpt 10 110393.047 ± 497.853 115165.845 ± 5381.674 ops/ms +4.3% > ArrayMismatchBenchmark.mismatchVectorLong 1.0 257 thrpt 10 14720.765 ± 661.350 19871.096 ± 201.464 ops/ms +35.0% > ArrayMismatchBenchmark.mismatchVectorLong 1.0 100000 thrpt 10 30.760 ± 0.821 31.933 ± 1.352 ops/ms +3.8% > > I have not been able to conduct thorough testing on AVX512 and AArch64 so any help would be invaluable. Thank you very much. Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: - Merge branch 'master' into improveVTest - refactor x86 - revert renaming temp - style + use rscratch instead of tmp - fix - redo aarch - Merge branch 'master' into improveVTest - delete aarch64 vector files - copyright - fix condition - ... and 12 more: https://git.openjdk.org/jdk/compare/6a1e98cb...c188a518 ------------- Changes: https://git.openjdk.org/jdk/pull/9855/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9855&range=05 Stats: 482 lines in 23 files changed: 214 ins; 144 del; 124 mod Patch: https://git.openjdk.org/jdk/pull/9855.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9855/head:pull/9855 PR: https://git.openjdk.org/jdk/pull/9855 From duke at openjdk.org Tue Sep 6 17:31:46 2022 From: duke at openjdk.org (Quan Anh Mai) Date: Tue, 6 Sep 2022 17:31:46 GMT Subject: RFR: 8292289: [vectorapi] Improve the implementation of VectorTestNode [v7] In-Reply-To: References: Message-ID: > This patch modifies the node generation of `VectorSupport::test` to emit a `CMoveINode`, which is picked up by `BoolNode::Ideal(PhaseGVN*, bool)` to connect the `VectorTestNode` directly to the `BoolNode`, removing the redundant operations of materialising the test result in a GP register and doing a `CmpI` to get back the flags. 
As a result, `VectorMask::alltrue` is compiled into machine code: > > vptest xmm0, xmm1 > jb if_true > if_false: > > instead of: > > vptest xmm0, xmm1 > setb r10 > movzbl r10 > testl r10 > jne if_true > if_false: > > The results of `jdk.incubator.vector.ArrayMismatchBenchmark` show noticeable improvements: > > Before After > Benchmark Prefix Size Mode Cnt Score Error Score Error Units Change > ArrayMismatchBenchmark.mismatchVectorByte 0.5 9 thrpt 10 217345.383 ± 8316.444 222279.381 ± 2660.983 ops/ms +2.3% > ArrayMismatchBenchmark.mismatchVectorByte 0.5 257 thrpt 10 113918.406 ± 1618.836 116268.691 ± 1291.899 ops/ms +2.1% > ArrayMismatchBenchmark.mismatchVectorByte 0.5 100000 thrpt 10 702.066 ± 72.862 797.806 ± 16.429 ops/ms +13.6% > ArrayMismatchBenchmark.mismatchVectorByte 1.0 9 thrpt 10 146096.564 ± 2401.258 145338.910 ± 687.453 ops/ms -0.5% > ArrayMismatchBenchmark.mismatchVectorByte 1.0 257 thrpt 10 60598.181 ± 1259.397 69041.519 ± 1073.156 ops/ms +13.9% > ArrayMismatchBenchmark.mismatchVectorByte 1.0 100000 thrpt 10 316.814 ± 10.975 408.770 ± 5.281 ops/ms +29.0% > ArrayMismatchBenchmark.mismatchVectorDouble 0.5 9 thrpt 10 195674.549 ± 1200.166 188482.433 ± 1872.076 ops/ms -3.7% > ArrayMismatchBenchmark.mismatchVectorDouble 0.5 257 thrpt 10 44357.169 ± 473.013 42293.411 ± 2838.255 ops/ms -4.7% > ArrayMismatchBenchmark.mismatchVectorDouble 0.5 100000 thrpt 10 68.199 ± 5.410 67.628 ± 3.241 ops/ms -0.8% > ArrayMismatchBenchmark.mismatchVectorDouble 1.0 9 thrpt 10 107722.450 ± 1677.607 111060.400 ± 982.230 ops/ms +3.1% > ArrayMismatchBenchmark.mismatchVectorDouble 1.0 257 thrpt 10 16692.645 ± 1002.599 21440.506 ± 1618.266 ops/ms +28.4% > ArrayMismatchBenchmark.mismatchVectorDouble 1.0 100000 thrpt 10 32.984 ± 0.548 33.202 ± 2.365 ops/ms +0.7% > ArrayMismatchBenchmark.mismatchVectorInt 0.5 9 thrpt 10 335458.217 ± 3154.842 379944.254 ± 5703.134 ops/ms +13.3% > ArrayMismatchBenchmark.mismatchVectorInt 0.5 257 thrpt 10 58505.302 ± 786.312 56721.368 ± 
2497.052 ops/ms -3.0% > ArrayMismatchBenchmark.mismatchVectorInt 0.5 100000 thrpt 10 133.037 ± 11.415 139.537 ± 4.667 ops/ms +4.9% > ArrayMismatchBenchmark.mismatchVectorInt 1.0 9 thrpt 10 117943.802 ± 2281.349 112409.365 ± 2110.055 ops/ms -4.7% > ArrayMismatchBenchmark.mismatchVectorInt 1.0 257 thrpt 10 27060.015 ± 795.619 33756.613 ± 826.533 ops/ms +24.7% > ArrayMismatchBenchmark.mismatchVectorInt 1.0 100000 thrpt 10 57.558 ± 8.927 66.951 ± 4.381 ops/ms +16.3% > ArrayMismatchBenchmark.mismatchVectorLong 0.5 9 thrpt 10 182963.715 ± 1042.497 182438.405 ± 2120.832 ops/ms -0.3% > ArrayMismatchBenchmark.mismatchVectorLong 0.5 257 thrpt 10 36672.215 ± 614.821 35397.398 ± 1609.235 ops/ms -3.5% > ArrayMismatchBenchmark.mismatchVectorLong 0.5 100000 thrpt 10 66.438 ± 2.142 65.427 ± 2.270 ops/ms -1.5% > ArrayMismatchBenchmark.mismatchVectorLong 1.0 9 thrpt 10 110393.047 ± 497.853 115165.845 ± 5381.674 ops/ms +4.3% > ArrayMismatchBenchmark.mismatchVectorLong 1.0 257 thrpt 10 14720.765 ± 661.350 19871.096 ± 201.464 ops/ms +35.0% > ArrayMismatchBenchmark.mismatchVectorLong 1.0 100000 thrpt 10 30.760 ± 0.821 31.933 ± 1.352 ops/ms +3.8% > > I have not been able to conduct thorough testing on AVX512 and AArch64 so any help would be invaluable. Thank you very much. 
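Editorial sketch: the redundancy that the patch description above removes can be modelled in plain Java. This is only a scalar model under stated assumptions. The class and method names are hypothetical, it does not use the Vector API or any HotSpot code, and the `if_true` / `if_false` strings simply mirror the labels in the quoted assembly.

```java
// Scalar model only: hypothetical names, not HotSpot or Vector API code.
class FlagMaterialisationSketch {
    /** Stand-in for the vector test: true when every lane of the mask is set. */
    static boolean allTrue(boolean[] mask) {
        for (boolean lane : mask) {
            if (!lane) {
                return false;
            }
        }
        return true;
    }

    /** Old shape: turn the test result into 0/1, then compare that integer again. */
    static String branchViaInteger(boolean[] mask) {
        int flag = allTrue(mask) ? 1 : 0;            // models setb + movzbl
        return (flag != 0) ? "if_true" : "if_false"; // models testl + jne
    }

    /** New shape: branch on the test result directly. */
    static String branchDirect(boolean[] mask) {
        return allTrue(mask) ? "if_true" : "if_false"; // models vptest + jb
    }

    public static void main(String[] args) {
        boolean[][] masks = { {true, true}, {true, false}, {} };
        for (boolean[] m : masks) {
            // Both shapes always pick the same branch; only the work done differs.
            System.out.println(branchViaInteger(m) + " " + branchDirect(m));
        }
    }
}
```

Both methods choose the same branch for every input, which is why the intermediate 0/1 value and the second comparison can be dropped without changing behaviour.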
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: fix merge problems ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9855/files - new: https://git.openjdk.org/jdk/pull/9855/files/c188a518..e5a81c41 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9855&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9855&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/9855.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9855/head:pull/9855 PR: https://git.openjdk.org/jdk/pull/9855 From haosun at openjdk.org Tue Sep 6 23:22:41 2022 From: haosun at openjdk.org (Hao Sun) Date: Tue, 6 Sep 2022 23:22:41 GMT Subject: RFR: 8292587: AArch64: Support SVE fabd instruction In-Reply-To: References: Message-ID: On Thu, 25 Aug 2022 01:52:41 GMT, Hao Sun wrote: > Scalar and NEON fabd instructions were initially supported in > JDK-8256318. In this patch, we support SVE fabd instruction [1] and add > one Jtreg test case as well. > > With this patch, two instructions `fsub + fabs` would be combined into > one single `fabd` instruction. > > > fsub z16.s, z16.s, z17.s > fabs z16.s, p7/m, z16.s > > --> > > fabd z16.s, p7/m, z16.s, z17.s > > > In the initial evaluation of JMH case, i.e. > FloatingScalarVectorAbsDiff.java, we found the performance uplift done > by this optimization was easily hidden by the heavy memory load/store > instructions. To avoid that, we updated the JMH case a bit, adding one > more group of subtraction and Math.abs operations in the loop body. > > Here shows the data with the new JMH case on one 256-bit SVE machine. We > can observe about 39% and 35% improvements for the two functions > respectively. 
> > > Benchmark Before After Units > FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble 260.468 160.965 ns/op > FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat 133.963 87.292 ns/op > > > Jtreg testing: tier1~3 passed on one NEON-only machine and one 256-bit SVE machine. > > [1] https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/FABD--Floating-point-absolute-difference--predicated-- Hi @nick-arm, could you help to review this patch when you have spare time? Thanks in advance. ------------- PR: https://git.openjdk.org/jdk/pull/10011 From xgong at openjdk.org Wed Sep 7 02:26:45 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 7 Sep 2022 02:26:45 GMT Subject: RFR: 8275275: AArch64: Fix performance regression after auto-vectorization on NEON In-Reply-To: References: Message-ID: On Tue, 6 Sep 2022 03:13:25 GMT, Fei Gao wrote: > For some vector opcodes, there are no corresponding AArch64 NEON > instructions but supporting them benefits vector API. Some of > this kind of opcodes are also used by superword for auto- > vectorization and here is the list: > > VectorCastD2I, VectorCastL2F > MulVL > AddReductionVI/L/F/D > MulReductionVI/L/F/D > AndReductionV, OrReductionV, XorReductionV > > > We did some micro-benchmark performance tests on NEON and found > that some of listed opcodes hurt the performance of loops after > auto-vectorization, but others don't. > > This patch disables those opcodes for superword, which have > obvious performance regressions after auto-vectorization on > NEON. Besides, one jtreg test case, where IR nodes are checked, > is added in the patch to protect the code against change by > mistake in the future. > > Here is the performance data before and after the patch on NEON. 
> > Benchmark length Mode Cnt Before After Units > AddReductionVD 1024 thrpt 15 450.830 548.001 ops/ms > AddReductionVF 1024 thrpt 15 514.468 548.013 ops/ms > MulReductionVD 1024 thrpt 15 405.613 499.531 ops/ms > MulReductionVF 1024 thrpt 15 451.292 495.061 ops/ms > > Note: > Because superword doesn't vectorize reductions unconnected with > other vector packs, the benchmark function for Add/Mul > reduction is like: > > // private double[] da, db; > // private double dresult; > public void AddReductionVD() { > double result = 1; > for (int i = startIndex; i < length; i++) { > result += (da[i] + db[i]); > } > dresult += result; > } > > > Specially, vector multiply long has been implemented but disabled > for both vector API and superword. Out of the same reason, the > patch re-enables MulVL on NEON for Vector API but still disables > it for superword. The performance uplift on vector API is ~12.8x > on my local. > > Benchmark length Mode Cnt Before After Units > Long128Vector.MUL 1024 thrpt 10 55.015 760.593 ops/ms > MulVL(superword) 1024 thrpt 10 907.788 907.805 ops/ms > > Note: > The superword benchmark function is: > > // private long[] in1, in2, res; > public void MulVL() { > for (int i = 0; i < length; i++) { > res[i] = in1[i] * in2[i]; > } > } > > The Vector API benchmark case is from: > https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/Long128Vector.java#L190 src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 146: > 144: // Fail fast, otherwise fall through to common vector_size_supported() check. > 145: switch (opcode) { > 146: case Op_MulVL: Enabling `MulVL` for vector api is great. Thanks for doing this! However, this might break several match rules like https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L2025 and the `vmls`. The assertion in line-2035 might fail if this rule is matched for a long vector and runs on hardwares that do not support sve. 
One way to fix this is to add a predicate to these rules that skips the long vector type for NEON. Thanks! ------------- PR: https://git.openjdk.org/jdk/pull/10175 From fgao at openjdk.org Wed Sep 7 02:36:40 2022 From: fgao at openjdk.org (Fei Gao) Date: Wed, 7 Sep 2022 02:36:40 GMT Subject: RFR: 8275275: AArch64: Fix performance regression after auto-vectorization on NEON In-Reply-To: References: Message-ID: On Wed, 7 Sep 2022 02:23:03 GMT, Xiaohong Gong wrote: >> For some vector opcodes, there are no corresponding AArch64 NEON >> instructions but supporting them benefits vector API. Some of >> this kind of opcodes are also used by superword for auto- >> vectorization and here is the list: >> >> VectorCastD2I, VectorCastL2F >> MulVL >> AddReductionVI/L/F/D >> MulReductionVI/L/F/D >> AndReductionV, OrReductionV, XorReductionV >> >> >> We did some micro-benchmark performance tests on NEON and found >> that some of listed opcodes hurt the performance of loops after >> auto-vectorization, but others don't. >> >> This patch disables those opcodes for superword, which have >> obvious performance regressions after auto-vectorization on >> NEON. Besides, one jtreg test case, where IR nodes are checked, >> is added in the patch to protect the code against change by >> mistake in the future. >> >> Here is the performance data before and after the patch on NEON. 
>> >> Benchmark length Mode Cnt Before After Units >> AddReductionVD 1024 thrpt 15 450.830 548.001 ops/ms >> AddReductionVF 1024 thrpt 15 514.468 548.013 ops/ms >> MulReductionVD 1024 thrpt 15 405.613 499.531 ops/ms >> MulReductionVF 1024 thrpt 15 451.292 495.061 ops/ms >> >> Note: >> Because superword doesn't vectorize reductions unconnected with >> other vector packs, the benchmark function for Add/Mul >> reduction is like: >> >> // private double[] da, db; >> // private double dresult; >> public void AddReductionVD() { >> double result = 1; >> for (int i = startIndex; i < length; i++) { >> result += (da[i] + db[i]); >> } >> dresult += result; >> } >> >> >> Specially, vector multiply long has been implemented but disabled >> for both vector API and superword. Out of the same reason, the >> patch re-enables MulVL on NEON for Vector API but still disables >> it for superword. The performance uplift on vector API is ~12.8x >> on my local. >> >> Benchmark length Mode Cnt Before After Units >> Long128Vector.MUL 1024 thrpt 10 55.015 760.593 ops/ms >> MulVL(superword) 1024 thrpt 10 907.788 907.805 ops/ms >> >> Note: >> The superword benchmark function is: >> >> // private long[] in1, in2, res; >> public void MulVL() { >> for (int i = 0; i < length; i++) { >> res[i] = in1[i] * in2[i]; >> } >> } >> >> The Vector API benchmark case is from: >> https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/Long128Vector.java#L190 > > src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 146: > >> 144: // Fail fast, otherwise fall through to common vector_size_supported() check. >> 145: switch (opcode) { >> 146: case Op_MulVL: > > Enabling `MulVL` for vector api is great. Thanks for doing this! However, this might break several match rules like https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L2025 and the `vmls`. 
The assertion in line-2035 might fail if this rule is matched for a long vector and runs on hardwares that do not support sve. One way to fix is adding the predicate to these rules to skip the long vector type for neon. Thanks! Thanks for your kind reminder. I'll fix these related rules and add corresponding vector api regression tests in this PR. ------------- PR: https://git.openjdk.org/jdk/pull/10175 From xgong at openjdk.org Wed Sep 7 06:01:22 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 7 Sep 2022 06:01:22 GMT Subject: RFR: 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast [v7] In-Reply-To: <1mDQWN8f2Gpb-tuVZo_Jj6TVU1yNtU-4jY00D4gfW5s=.97d48dbc-0ffd-432d-83fe-3544295036e2@github.com> References: <1mDQWN8f2Gpb-tuVZo_Jj6TVU1yNtU-4jY00D4gfW5s=.97d48dbc-0ffd-432d-83fe-3544295036e2@github.com> Message-ID: <3GghrBjnyXnBCLCzcKUchQ2Zq9lZjS8Pw-IzBFGXftU=.6e5f6462-89fd-42fe-ba3e-4257642bdb05@github.com> > Recently we found the performance of "`FIRST_NONZERO`" for double type is largely worse than the other types on x86 when `UseAVX=2`. The main reason is the "`VectorCastL2X`" op is not supported by the backend when the dst element type is `T_DOUBLE`. This makes the check of `VectorCast` op fail before intrinsifying "`VectorMask.cast()`" which is used in the > "`FIRST_NONZERO`" java implementation (see [1]). However, the compiler will not generate the `VectorCast `op for `VectorMask.cast()` if: > > 1) the current platform supports the predicated feature > 2) the element size (in bytes) of the src and dst type is the same > > So the check of "`VectorCast`" op is needless for such cases. 
To fix it, this patch: > > 1) limits the specified vector cast op check to vectors > 2) adds the relative mask cast op check for VectorMask.cast() > 3) cleans up the unnecessary codes > > Here is the performance of "`FIRST_NONZERO`" benchmark [2] on a x86 machine with `UseAVX=2`: > > Benchmark (size) Mode Cnt Before After Units > DoubleMaxVector.FIRST_NONZERO 1024 thrpt 15 49.266 2460.886 ops/ms > DoubleMaxVector.FIRST_NONZEROMasked 1024 thrpt 15 49.554 1892.223 ops/ms > > [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/DoubleVector.java#L770 > [2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/DoubleMaxVector.java#L246 Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: - Merge branch 'jdk:master' into JDK-8291600 - Address review comments - Add vector cast op check for vector mask for some cases - Revert the unify changes to vector mask cast - Merge branch 'jdk:master' into JDK-8291600 - Fix x86 codegen issue - Unify VectorMaskCast for all platforms - Merge branch 'master' into JDK-8291600 - 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast ------------- Changes: https://git.openjdk.org/jdk/pull/9737/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9737&range=06 Stats: 20 lines in 1 file changed: 8 ins; 3 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/9737.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9737/head:pull/9737 PR: https://git.openjdk.org/jdk/pull/9737 From xgong at openjdk.org Wed Sep 7 06:12:19 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 7 Sep 2022 06:12:19 GMT Subject: RFR: 8292898: [vectorapi] Unify vector mask cast operation Message-ID: The current implementation of the vector mask cast operation is complex that the compiler generates different patterns for 
different scenarios. For architectures that do not support the predicate feature, a vector mask is represented the same as a normal vector. So the vector mask cast is implemented by a `VectorCast` node. But this is not always needed. When two masks have the same element size (e.g. int vs. float), their bit layouts are the same. So casting between them does not need to emit any instructions. Currently the compiler generates different patterns based on the vector type of the input/output and the platforms. Normally the "`VectorMaskCast`" op is only used for cases that don't emit any instructions, and the "`VectorCast`" op is used to implement the necessary expand/narrow operations. This can avoid adding some duplicate rules in the backend. However, this also has drawbacks: 1) The code is complex, especially when the compiler needs to check whether the hardware supports the necessary IRs for the vector mask cast. It needs to check different patterns for different cases. 2) The vector mask cast operation could be implemented with cheaper instructions than the vector casting on some architectures. Instead of generating `VectorCast` or `VectorMaskCast` nodes for different cases of vector mask cast operations, this patch unifies the vector mask cast implementation with the "`VectorMaskCast`" node for all vector types and platforms. The missing backend rules are also added for it. This patch also simplifies the vector mask conversion that happens in "`VectorUnbox::Ideal()`". Normally "`VectorUnbox (VectorBox vmask)`" can be optimized to "`vmask`" if the unboxing type matches the boxed "`vmask`" type. Otherwise, it needs the type conversion. 
Currently the "`VectorUnbox`" will be transformed to two different patterns to implement the conversion: 1) If the element size is not changed, it is transformed to: "VectorMaskCast vmask" 2) Otherwise, it is transformed to: "VectorLoadMask (VectorStoreMask vmask)" It firstly converts the "`vmask`" to a boolean vector with "`VectorStoreMask`", and then uses "`VectorLoadMask`" to convert the boolean vector to the dst mask vector. Since this patch makes "`VectorMaskCast`" op supported for all types on all platforms, it doesn't need the "`VectorLoadMask`" and "`VectorStoreMask`" to do the conversion. The existing transformation: VectorUnbox (VectorBox vmask) => VectorLoadMask (VectorStoreMask vmask) can be simplified to: VectorUnbox (VectorBox vmask) => VectorMaskCast vmask ------------- Depends on: https://git.openjdk.org/jdk/pull/9737 Commit messages: - 8292898: [vectorapi] Unify vector mask cast operation - 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast Changes: https://git.openjdk.org/jdk/pull/10192/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10192&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8292898 Stats: 364 lines in 8 files changed: 279 ins; 58 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/10192.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10192/head:pull/10192 PR: https://git.openjdk.org/jdk/pull/10192 From fgao at openjdk.org Wed Sep 7 07:46:39 2022 From: fgao at openjdk.org (Fei Gao) Date: Wed, 7 Sep 2022 07:46:39 GMT Subject: RFR: 8292587: AArch64: Support SVE fabd instruction In-Reply-To: References: Message-ID: On Thu, 25 Aug 2022 01:52:41 GMT, Hao Sun wrote: > Scalar and NEON fabd instructions were initially supported in > JDK-8256318. In this patch, we support SVE fabd instruction [1] and add > one Jtreg test case as well. > > With this patch, two instructions `fsub + fabs` would be combined into > one single `fabd` instruction. 
> > > fsub z16.s, z16.s, z17.s > fabs z16.s, p7/m, z16.s > > --> > > fabd z16.s, p7/m, z16.s, z17.s > > > In the initial evaluation of JMH case, i.e. > FloatingScalarVectorAbsDiff.java, we found the performance uplift done > by this optimization was easily hidden by the heavy memory load/store > instructions. To avoid that, we updated the JMH case a bit, adding one > more group of subtraction and Math.abs operations in the loop body. > > Here shows the data with the new JMH case on one 256-bit SVE machine. We > can observe about 39% and 35% improvements for the two functions > respectively. > > > Benchmark Before After Units > FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble 260.468 160.965 ns/op > FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat 133.963 87.292 ns/op > > > Jtreg testing: tier1~3 passed on one NEON-only machine and one 256-bit SVE machine. > > [1] https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/FABD--Floating-point-absolute-difference--predicated-- test/hotspot/jtreg/compiler/vectorapi/VectorAbsDiffTest.java line 97: > 95: public static void testFloatAbsDiff_runner() { > 96: testFloatAbsDiff(); > 97: for (int i = 0; i < F_SPECIES.length(); i++) { I suppose it should be `for (int i = 0; i < LENGTH; i++) {` here. You can check all similar code lines in the following functions for verification. ------------- PR: https://git.openjdk.org/jdk/pull/10011 From xgong at openjdk.org Wed Sep 7 09:22:11 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 7 Sep 2022 09:22:11 GMT Subject: RFR: 8292898: [vectorapi] Unify vector mask cast operation [v2] In-Reply-To: References: Message-ID: > The current implementation of the vector mask cast operation is > complex that the compiler generates different patterns for different > scenarios. For architectures that do not support the predicate > feature, vector mask is represented the same as the normal vector. > So the vector mask cast is implemented by `VectorCast `node. 
But this > is not always needed. When two masks have the same element size (e.g. > int vs. float), their bits layout are the same. So casting between > them does not need to emit any instructions. > > Currently the compiler generates different patterns based on the > vector type of the input/output and the platforms. Normally the > "`VectorMaskCast`" op is only used for cases that doesn't emit any > instructions, and "`VectorCast`" op is used to implement the necessary > expand/narrow operations. This can avoid adding some duplicate rules > in the backend. However, this also has the drawbacks: > > 1) The codes are complex, especially when the compiler needs to > check whether the hardware supports the necessary IRs for the > vector mask cast. It needs to check different patterns for > different cases. > 2) The vector mask cast operation could be implemented with cheaper > instructions than the vector casting on some architectures. > > Instead of generating `VectorCast `or `VectorMaskCast `nodes for different > cases of vector mask cast operations, this patch unifies the vector > mask cast implementation with "`VectorMaskCast`" node for all vector types > and platforms. The missing backend rules are also added for it. > > This patch also simplies the vector mask conversion happened in > "`VectorUnbox::Ideal()`". Normally "`VectorUnbox (VectorBox vmask)`" can > be optimized to "`vmask`" if the unboxing type matches with the boxed > "`vmask`" type. Otherwise, it needs the type conversion. Currently the > "`VectorUnbox`" will be transformed to two different patterns to implement > the conversion: > > 1) If the element size is not changed, it is transformed to: > > "VectorMaskCast vmask" > > 2) Otherwise, it is transformed to: > > "VectorLoadMask (VectorStoreMask vmask)" > > It firstly converts the "`vmask`" to a boolean vector with "`VectorStoreMask`", > and then uses "`VectorLoadMask`" to convert the boolean vector to the > dst mask vector. 
Since this patch makes "`VectorMaskCast`" op supported > for all types on all platforms, it doesn't need the "`VectorLoadMask`" and > "`VectorStoreMask`" to do the conversion. The existing transformation: > > VectorUnbox (VectorBox vmask) => VectorLoadMask (VectorStoreMask vmask) > > can be simplified to: > > VectorUnbox (VectorBox vmask) => VectorMaskCast vmask Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: 8292898: [vectorapi] Unify vector mask cast operation ------------- Changes: https://git.openjdk.org/jdk/pull/10192/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10192&range=01 Stats: 360 lines in 8 files changed: 278 ins; 62 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/10192.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10192/head:pull/10192 PR: https://git.openjdk.org/jdk/pull/10192 From tholenstein at openjdk.org Wed Sep 7 09:25:22 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 7 Sep 2022 09:25:22 GMT Subject: RFR: JDK-8293477: IGV: Upgrade to Netbeans Platform 15 Message-ID: Upgrade IGV and its dependencies to the newest Netbeans Platform 15, which was released in September 2022 (it officially supports running on JDK 11 and JDK 17). 
## Testing Tested the following use cases manually on macOS and JDK 17: - build with maven 3.8.1 - import graphs via network (localhost) - Save all groups to XML - Save selected groups to XML - Remove selected graphs - Remove selected groups - Remove all groups - Open XML graph file - Expand groups in Outline - Open graphs from the same and from different groups in Outline - "Open clone" in the Outline - "Open Difference to current graph" for graphs in the same and in different groups in Outline - Opening a new graph: updates the Bytecode and Control Flow window - Show next / previous graph in current group buttons - Expand / Reduce the difference selection buttons - Changing of the difference selection by modifying the slider - Extract set of selected nodes and check if they are centered - Hiding of selected nodes - Showing all nodes again - Zooming in / out - Different views: Sea of nodes / clustered seas of nodes / CFG - Satellite view: button and by pressing the S key - Enable / Disable "Show neighbouring nodes of fully visible nodes semi-transparent" - Undo / Redo - Selection mode: button and by holding Ctrl + mouse-drag - Searching a node: Selects the node and centres it. Makes the node visible if it is hidden - Searching a block: Selects all nodes in the block and centres it. Makes all the nodes in the block visible - Selecting node(s): adjusts colours in slider. 
Show property in Properties window - Hovering a node: highlights node and shows property box - Hovering a connection: highlights connection and corresponding nodes - apply filters - select nodes corresponding to a bytecode - select nodes corresponding to a basic block in the control flow ------------- Commit messages: - Upgrade to Netbeans Platform 15 Changes: https://git.openjdk.org/jdk/pull/10195/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10195&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8293477 Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/10195.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10195/head:pull/10195 PR: https://git.openjdk.org/jdk/pull/10195 From tholenstein at openjdk.org Wed Sep 7 10:58:22 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 7 Sep 2022 10:58:22 GMT Subject: RFR: JDK-8293480: IGV: Update Bytecode and ControlFlow Component immediately when opening a new graph Message-ID: The `BytecodeViewTopComponent` and `ControlFlowTopComponent` display information that depends on which graph is open in `EditorTopComponent`. Previously, `BytecodeViewTopComponent` and `ControlFlowTopComponent` did not update their content immediately when a new graph from a different group was opened in `EditorTopComponent`. They also did not update when switching between two tabs with open graphs. We failed to `fire()` a `diagramChangedEvent` in the constructor of `EditorTopComponent`. We also need to fire the event when `BytecodeViewTopComponent` and `ControlFlowTopComponent` are initially opened. 
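Editorial sketch: the update problem described above can be illustrated with a minimal listener model. Everything here is an assumption made for illustration. `ChangedEvent`, `Workspace`, `DependentView`, and `openEditor` are hypothetical names and do not come from the IGV source; the sketch only shows the two rules in the description: fire the change event as soon as a graph is opened, and refresh a dependent view when it is first created.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is not the IGV source.
class FireOnOpenSketch {
    /** Minimal change event: every registered listener runs on fire(). */
    static final class ChangedEvent {
        private final List<Runnable> listeners = new ArrayList<>();

        void addListener(Runnable listener) {
            listeners.add(listener);
        }

        void fire() {
            for (Runnable listener : listeners) {
                listener.run();
            }
        }
    }

    /** Shared state that dependent views observe. */
    static final class Workspace {
        final ChangedEvent diagramChanged = new ChangedEvent();
        private String activeGraph = "";

        String activeGraph() {
            return activeGraph;
        }

        /** Models opening an editor tab or switching to one. */
        void openEditor(String graphName) {
            activeGraph = graphName;
            diagramChanged.fire(); // notify immediately instead of waiting for a later change
        }
    }

    /** Stand-in for a dependent view such as the bytecode or control-flow window. */
    static final class DependentView {
        private final Workspace workspace;
        String shown = "";

        DependentView(Workspace workspace) {
            this.workspace = workspace;
            workspace.diagramChanged.addListener(this::refresh);
            refresh(); // also refresh when the view itself is first opened
        }

        private void refresh() {
            shown = workspace.activeGraph();
        }
    }

    public static void main(String[] args) {
        Workspace workspace = new Workspace();
        workspace.openEditor("graph A");
        DependentView view = new DependentView(workspace);
        System.out.println(view.shown); // the view was created after the graph was opened
        workspace.openEditor("graph B");
        System.out.println(view.shown); // the view follows the newly opened graph
    }
}
```

Without the `fire()` call in `openEditor` the view would keep showing the previous graph, and without the `refresh()` call in the constructor a freshly opened view would start empty.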
Update ------------- Commit messages: - Updated Bytecode and ControlFlow Component immediately when opening a new graph Changes: https://git.openjdk.org/jdk/pull/10196/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10196&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8293480 Stats: 53 lines in 4 files changed: 13 ins; 8 del; 32 mod Patch: https://git.openjdk.org/jdk/pull/10196.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10196/head:pull/10196 PR: https://git.openjdk.org/jdk/pull/10196 From tholenstein at openjdk.org Wed Sep 7 11:55:25 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 7 Sep 2022 11:55:25 GMT Subject: RFR: JDK-8290011: IGV: Remove dead code and cleanup Message-ID: Remove dead code from the IGV code base. There are many unused or redundant functions in the code ------------- Commit messages: - sort all imports - removed unused imports - remove various unused variables and methods - remove unused getRightWidget() and getLeftWidget() - remove comment - added missing asynchronous() - Fix Netbeans position warning - remove broken HideDuplicatesAction - remove InputGraph from Diagram - refactor diagramChanged() - ... 
and 21 more: https://git.openjdk.org/jdk/compare/512fee1d...e5201fd4

Changes: https://git.openjdk.org/jdk/pull/10197/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10197&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8290011
  Stats: 3480 lines in 97 files changed: 199 ins; 3000 del; 281 mod
  Patch: https://git.openjdk.org/jdk/pull/10197.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10197/head:pull/10197

PR: https://git.openjdk.org/jdk/pull/10197

From chagedorn at openjdk.org  Wed Sep  7 13:17:47 2022
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Wed, 7 Sep 2022 13:17:47 GMT
Subject: RFR: JDK-8293477: IGV: Upgrade to Netbeans Platform 15
In-Reply-To:
References:
Message-ID:

On Wed, 7 Sep 2022 09:17:26 GMT, Tobias Holenstein wrote:

> Upgrade IGV and dependencies to the newest Netbeans Platform 15, which was released in September 2022 (officially supports running on JDK 11 and JDK 17).
>
> ## Testing
>
> Tested the following use cases manually on macOS and JDK 17:
>
> - build with maven 3.8.1
> - import graphs via network (localhost)
> - Save all groups to XML
> - Save selected groups to XML
> - Remove selected graphs
> - Remove selected groups
> - Remove all groups
> - Open XML graph file
> - Expand groups in Outline
> - Open graphs from the same and from different groups in Outline
> - "Open clone" in the Outline
> - "Open Difference to current graph" for graphs in same and different group in Outline
> - Opening a new graph: Updates the Bytecode and Control Flow window
> - Show next / previous graph in current group buttons
> - Expand / Reduce the difference selection buttons
> - Changing of the difference selection by modifying the slider
> - Extract set of selected nodes and check if they are centered
> - Hiding of selected nodes
> - Showing all nodes again
> - Zooming in / out
> - Different views: Sea of nodes / clustered seas of nodes / CFG
> - Satellite view: button and by pressing the S key
> - Enable / Disable "Show neighbouring nodes of fully visible nodes semi-transparent"
> - Undo / Redo
> - Selection mode: button and by holding Ctrl + mouse-drag
> - Searching a node: Selects the node and centres it. Makes the node visible if it is hidden
> - Searching a block: Selects all nodes in the block and centres it. Makes all the nodes in the block visible
> - Selecting node(s): adjusts colours in slider. Show property in Properties window
> - Hovering a node: highlights node and shows property box
> - Hovering a connection: highlights connection and corresponding nodes
> - apply filters
> - select nodes corresponding to a bytecode
> - select nodes corresponding to a basic block in the control flow

Looks good! Nice extensive manual testing.

-------------

Marked as reviewed by chagedorn (Reviewer).

PR: https://git.openjdk.org/jdk/pull/10195

From duke at openjdk.org  Wed Sep  7 13:45:44 2022
From: duke at openjdk.org (Evgeny Astigeevich)
Date: Wed, 7 Sep 2022 13:45:44 GMT
Subject: RFR: 8285487: AArch64: Do not generate unneeded trampolines for runtime calls [v6]
In-Reply-To:
References: <7-9sMaveCa-a2sxHULFPmCYf1-uudLSkm3WLzwF_UuY=.eab16718-87d4-4a2e-803a-2f60c746c0c5@github.com>
Message-ID:

On Tue, 16 Aug 2022 12:04:20 GMT, Andrew Haley wrote:

>> This is the fix of:
>> [JDK-8286314](https://bugs.openjdk.org/browse/JDK-8286314): Trampoline not created for far runtime targets outside small CodeCache
>>
>> At the time of writing the fix I could not find how to test the fix. We don't generate such trampoline calls.
>>
>> I think if such calls appear, they will always be unreachable.
>
> Please take it out, then.

Hi Andrew (@theRealAph),
Have you had a chance to review the latest version?
Thanks, Evgeny ------------- PR: https://git.openjdk.org/jdk/pull/8403 From aph at openjdk.org Wed Sep 7 14:13:44 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 7 Sep 2022 14:13:44 GMT Subject: RFR: 8285487: AArch64: Do not generate unneeded trampolines for runtime calls [v8] In-Reply-To: References: Message-ID: On Tue, 16 Aug 2022 13:09:26 GMT, Evgeny Astigeevich wrote: >> Runtime calls are calls of non-compiled methods. Non-compiled methods stay forever in CodeCache. If they are always within the branch range, they don't need trampolines. >> >> This PR adds `is_always_within_branch_range(Address entry)`. >> >> Results from DaCapo: the total number of eliminated trampolines per a benchmark run >> >> >> +----------+--------+ >> | avrora | 15491 | >> | batik | 75837 | >> | biojava | 13927 | >> | eclipse | 414143 | >> | fop | 119267 | >> | graphchi | 7665 | >> | jme | 8279 | >> | luindex | 56061 | >> | lusearch | 50277 | >> | pmd | 132719 | >> | sunflow | 10689 | >> | tomcat | 186967 | >> | xalan | 50349 | >> | zxing | 41497 | >> +----------+--------+ >> >> >> >> Testing: >> - `tier1`...`tier2`: Passed >> - `compiler/c2/aarch64/TestTrampoline.java`: Passed >> >> Note: `compiler/c2/aarch64/TestTrampoline.java` requires the release build. This is because debug builds have the branch range set to 2M which causes always generation of trampolines. > > Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: > > Simplify checks in is_always_within_branch_range Looks good. Sorry for the delay, I thought I'd already approved this patch. ------------- Marked as reviewed by aph (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/8403 From duke at openjdk.org Wed Sep 7 14:42:57 2022 From: duke at openjdk.org (Evgeny Astigeevich) Date: Wed, 7 Sep 2022 14:42:57 GMT Subject: Integrated: 8285487: AArch64: Do not generate unneeded trampolines for runtime calls In-Reply-To: References: Message-ID: On Tue, 26 Apr 2022 16:14:44 GMT, Evgeny Astigeevich wrote: > Runtime calls are calls of non-compiled methods. Non-compiled methods stay forever in CodeCache. If they are always within the branch range, they don't need trampolines. > > This PR adds `is_always_within_branch_range(Address entry)`. > > Results from DaCapo: the total number of eliminated trampolines per a benchmark run > > > +----------+--------+ > | avrora | 15491 | > | batik | 75837 | > | biojava | 13927 | > | eclipse | 414143 | > | fop | 119267 | > | graphchi | 7665 | > | jme | 8279 | > | luindex | 56061 | > | lusearch | 50277 | > | pmd | 132719 | > | sunflow | 10689 | > | tomcat | 186967 | > | xalan | 50349 | > | zxing | 41497 | > +----------+--------+ > > > > Testing: > - `tier1`...`tier2`: Passed > - `compiler/c2/aarch64/TestTrampoline.java`: Passed > > Note: `compiler/c2/aarch64/TestTrampoline.java` requires the release build. This is because debug builds have the branch range set to 2M which causes always generation of trampolines. This pull request has now been integrated. 
Changeset: 6ff4775b
Author:    Evgeny Astigeevich
Committer: Andrew Haley
URL:       https://git.openjdk.org/jdk/commit/6ff4775b717d91f9acf24d014ae155dfacac06c5
Stats:     151 lines in 2 files changed: 135 ins; 15 del; 1 mod

8285487: AArch64: Do not generate unneeded trampolines for runtime calls

Reviewed-by: xliu, aph

-------------

PR: https://git.openjdk.org/jdk/pull/8403

From duke at openjdk.org  Wed Sep  7 14:51:59 2022
From: duke at openjdk.org (Evgeny Astigeevich)
Date: Wed, 7 Sep 2022 14:51:59 GMT
Subject: RFR: 8285487: AArch64: Do not generate unneeded trampolines for runtime calls [v8]
In-Reply-To:
References:
Message-ID:

On Wed, 7 Sep 2022 14:11:30 GMT, Andrew Haley wrote:

>> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Simplify checks in is_always_within_branch_range
>
> Looks good. Sorry for the delay, I thought I'd already approved this patch.

@theRealAph Thank you!

-------------

PR: https://git.openjdk.org/jdk/pull/8403

From jiefu at openjdk.org  Wed Sep  7 15:13:37 2022
From: jiefu at openjdk.org (Jie Fu)
Date: Wed, 7 Sep 2022 15:13:37 GMT
Subject: RFR: 8293491: Avoid unexpected deoptimization in loop exit due to incorrect branch profiling
Message-ID:

Hi all,

Please review this patch which fixes the unexpected deoptimizations in loop exit due to incorrect branch profiling.

# Background

While analyzing our big data Apps, we observed unexpected deoptimizations in loop exit due to incorrect branch profiling.

Here is a reproducer.

    public class UnexpectedLoopExitDeopt {
        public static final int N = 20000000;

        public static int d1[] = new int[N];
        public static int d2[] = new int[N];

        public static void main(String[] args) {
            System.out.println(test(d1));
            System.out.println(test(d2));
        }

        public static int test(int[] a) {
            int sum = 0;
            for(int i = 0; i < a.length; i++) {
                sum += a[i];
            }
            return sum;
        }
    }

The following is the compilation sequence.

      77    1       3   java.lang.Object:: (1 bytes)
      83    2       3   java.lang.String::isLatin1 (19 bytes)
      84    6       3   jdk.internal.util.Preconditions::checkIndex (18 bytes)
      84    3       3   java.lang.String::charAt (25 bytes)
      85    4       3   java.lang.StringLatin1::charAt (15 bytes)
      86    7       3   java.lang.String::coder (15 bytes)
      86    8       3   java.lang.String::hashCode (60 bytes)
      87    5       3   java.lang.String::checkIndex (10 bytes)
      87    9       3   java.lang.String::length (11 bytes)
      93   10   n   0   java.lang.invoke.MethodHandle::linkToStatic(LLLLLLL)L (native) (static)
      96   11   n   0   java.lang.invoke.MethodHandle::linkToSpecial(LLLL)L (native) (static)
      96   12   n   0   java.lang.Object::hashCode (native)
      97   13   n   0   java.lang.invoke.MethodHandle::invokeBasic(LLLLLL)L (native)
      98   14       3   java.util.Objects::requireNonNull (14 bytes)
      98   15   n   0   java.lang.invoke.MethodHandle::linkToSpecial(LLLLLLLL)L (native) (static)
      98   16       1   java.lang.Enum::ordinal (5 bytes)
     101   17   n   0   java.lang.invoke.MethodHandle::linkToSpecial(LLLL)V (native) (static)
     102   18   n   0   java.lang.invoke.MethodHandle::invokeBasic(LL)L (native)
     212   19 %     3   UnexpectedLoopExitDeopt::test @ 4 (24 bytes)
     213   20 %     4   UnexpectedLoopExitDeopt::test @ 4 (24 bytes)
     221   19 %     3   UnexpectedLoopExitDeopt::test @ 4 (24 bytes)   made not entrant
     221   21       4   UnexpectedLoopExitDeopt::test (24 bytes)
     230   20 %     4   UnexpectedLoopExitDeopt::test @ 4 (24 bytes)   made not entrant  <--- Unexpected deopt 0
     242   21       4   UnexpectedLoopExitDeopt::test (24 bytes)   made not entrant  <--- Unexpected deopt 0

The last two deopts (made not entrant) happened at the loop exit and are unexpected.

# Reason

The unexpected deopts were caused by an incorrect branch profiling count (a taken count of 0 for the loop predicate).

Here is the profiling data for `UnexpectedLoopExitDeopt::test`.
We can see that for `if_icmpge` @ bci=7, the count for `not taken` is 264957, while it is 0 for `taken`.
A taken count of zero is obviously incorrect, since the loop will eventually exit (when `i >= a.length`).
So the taken count should be at least 1 for `if_icmpge` @ bci=7.

    0 iconst_0
    1 istore_1
    2 iconst_0
    3 istore_2
    4 iload_2
    5 fast_aload_0
    6 arraylength
    7 if_icmpge 22
      0   bci: 7    BranchData          taken(0) displacement(56)
                                        not taken(264957)
    10 iload_1
    11 fast_aload_0
    12 iload_2
    13 iaload
    14 iadd
    15 istore_1
    16 iinc #2 1
    19 goto 4
      32  bci: 19   JumpData            taken(266667) displacement(-32)
    22 iload_1
    23 ireturn

# Fix

The main idea is to detect whether the taken target of a branch is a loop exit.
If so, set the taken count to be at least 1.

This is fine because most loops should be finite and would execute the loop exit code at least once.
For infinite loops like `while (true) {...}`, the patch won't change the original behaviour since there is no loop exit.

# Testing

tier1~3 on Linux/x64, no regression

Thanks.
Best regards,
Jie

-------------

Commit messages:
 - 8293491: Avoid unexpected deoptimization in loop exit due to incorrect branch profiling

Changes: https://git.openjdk.org/jdk/pull/10200/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10200&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8293491
  Stats: 92 lines in 5 files changed: 86 ins; 2 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/10200.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10200/head:pull/10200

PR: https://git.openjdk.org/jdk/pull/10200

From jiefu at openjdk.org  Wed Sep  7 16:05:32 2022
From: jiefu at openjdk.org (Jie Fu)
Date: Wed, 7 Sep 2022 16:05:32 GMT
Subject: RFR: 8293497: Build failure due to MaxVectorSize was not declared when C2 is disabled after JDK-8293254
Message-ID:

Hi all,

Please review this trivial fix which fixes the build failure when C2 is disabled due to the definition of `MaxVectorSize` is missing.

Thanks.
Best regards, Jie ------------- Commit messages: - 8293497: Build failure due to MaxVectorSize was not declared when C2 is disabled after JDK-8293254 Changes: https://git.openjdk.org/jdk/pull/10202/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10202&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8293497 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10202.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10202/head:pull/10202 PR: https://git.openjdk.org/jdk/pull/10202 From dlong at openjdk.org Wed Sep 7 22:35:43 2022 From: dlong at openjdk.org (Dean Long) Date: Wed, 7 Sep 2022 22:35:43 GMT Subject: RFR: 8293497: Build failure due to MaxVectorSize was not declared when C2 is disabled after JDK-8293254 In-Reply-To: References: Message-ID: On Wed, 7 Sep 2022 15:59:20 GMT, Jie Fu wrote: > Hi all, > > Please review this trivial fix which fixes the build failure when C2 is disabled due to the definition of `MaxVectorSize` is missing. > > Thanks. > Best regards, > Jie Changes requested by dlong (Reviewer). src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 38: > 36: #if INCLUDE_JVMCI > 37: #include "jvmci/jvmci_globals.hpp" > 38: #endif Shouldn't that be "utilities/macros.hpp" to pick up the COMPILER2_OR_JVMCI macro? ------------- PR: https://git.openjdk.org/jdk/pull/10202 From jiefu at openjdk.org Wed Sep 7 23:03:47 2022 From: jiefu at openjdk.org (Jie Fu) Date: Wed, 7 Sep 2022 23:03:47 GMT Subject: RFR: 8293497: Build failure due to MaxVectorSize was not declared when C2 is disabled after JDK-8293254 In-Reply-To: References: Message-ID: On Wed, 7 Sep 2022 22:31:34 GMT, Dean Long wrote: > Shouldn't that be "utilities/macros.hpp" to pick up the COMPILER2_OR_JVMCI macro? Thanks @dean-long for the review. You mean we should fix it like this? 

    diff --git a/src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp b/src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp
    index b08e1f7..1d149c0 100644
    --- a/src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp
    +++ b/src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp
    @@ -30,6 +30,7 @@
     #include "runtime/sharedRuntime.hpp"
     #include "runtime/stubRoutines.hpp"
     #include "stubGenerator_x86_64.hpp"
    +#include "utilities/macros.hpp"
     #ifdef COMPILER2
     #include "opto/c2_globals.hpp"
     #endif

But it won't fix it actually.

-------------

PR: https://git.openjdk.org/jdk/pull/10202

From dlong at openjdk.org  Thu Sep  8 02:17:41 2022
From: dlong at openjdk.org (Dean Long)
Date: Thu, 8 Sep 2022 02:17:41 GMT
Subject: RFR: 8293497: Build failure due to MaxVectorSize was not declared when C2 is disabled after JDK-8293254
In-Reply-To:
References:
Message-ID:

On Wed, 7 Sep 2022 15:59:20 GMT, Jie Fu wrote:

> Hi all,
>
> Please review this trivial fix which fixes the build failure when C2 is disabled due to the definition of `MaxVectorSize` is missing.
>
> Thanks.
> Best regards,
> Jie

Looks good and trivial.

-------------

Marked as reviewed by dlong (Reviewer).

PR: https://git.openjdk.org/jdk/pull/10202

From dlong at openjdk.org  Thu Sep  8 02:17:43 2022
From: dlong at openjdk.org (Dean Long)
Date: Thu, 8 Sep 2022 02:17:43 GMT
Subject: RFR: 8293497: Build failure due to MaxVectorSize was not declared when C2 is disabled after JDK-8293254
In-Reply-To:
References:
Message-ID:

On Wed, 7 Sep 2022 23:01:32 GMT, Jie Fu wrote:

>> src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 38:
>>
>>> 36: #if INCLUDE_JVMCI
>>> 37: #include "jvmci/jvmci_globals.hpp"
>>> 38: #endif
>>
>> Shouldn't that be "utilities/macros.hpp" to pick up the COMPILER2_OR_JVMCI macro?
>
>> Shouldn't that be "utilities/macros.hpp" to pick up the COMPILER2_OR_JVMCI macro?
>
> Thanks @dean-long for the review.
>
> You mean we should fix it like this?
>
> diff --git a/src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp b/src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp
> index b08e1f7..1d149c0 100644
> --- a/src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp
> +++ b/src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp
> @@ -30,6 +30,7 @@
>  #include "runtime/sharedRuntime.hpp"
>  #include "runtime/stubRoutines.hpp"
>  #include "stubGenerator_x86_64.hpp"
> +#include "utilities/macros.hpp"
>  #ifdef COMPILER2
>  #include "opto/c2_globals.hpp"
>  #endif
>
>
> But it won't fix it actually.

OK. Sorry, I misunderstood the problem.

-------------

PR: https://git.openjdk.org/jdk/pull/10202

From jiefu at openjdk.org  Thu Sep  8 02:31:49 2022
From: jiefu at openjdk.org (Jie Fu)
Date: Thu, 8 Sep 2022 02:31:49 GMT
Subject: RFR: 8293497: Build failure due to MaxVectorSize was not declared when C2 is disabled after JDK-8293254
In-Reply-To:
References:
Message-ID: <77jDDgkv5ale6-WWMu1lwavE0xmGJ77CmlxCtB-hCSA=.507717c4-cb21-46b5-ad5d-837df8755905@github.com>

On Thu, 8 Sep 2022 02:14:01 GMT, Dean Long wrote:

> Looks good and trivial.

Thanks @dean-long .

-------------

PR: https://git.openjdk.org/jdk/pull/10202

From jiefu at openjdk.org  Thu Sep  8 02:31:50 2022
From: jiefu at openjdk.org (Jie Fu)
Date: Thu, 8 Sep 2022 02:31:50 GMT
Subject: Integrated: 8293497: Build failure due to MaxVectorSize was not declared when C2 is disabled after JDK-8293254
In-Reply-To:
References:
Message-ID:

On Wed, 7 Sep 2022 15:59:20 GMT, Jie Fu wrote:

> Hi all,
>
> Please review this trivial fix which fixes the build failure when C2 is disabled due to the definition of `MaxVectorSize` is missing.
>
> Thanks.
> Best regards,
> Jie

This pull request has now been integrated.
Changeset: 66772273 Author: Jie Fu URL: https://git.openjdk.org/jdk/commit/6677227301acf06eb8be264e4eb3e092d0d7442f Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod 8293497: Build failure due to MaxVectorSize was not declared when C2 is disabled after JDK-8293254 Reviewed-by: dlong ------------- PR: https://git.openjdk.org/jdk/pull/10202 From haosun at openjdk.org Thu Sep 8 02:50:52 2022 From: haosun at openjdk.org (Hao Sun) Date: Thu, 8 Sep 2022 02:50:52 GMT Subject: RFR: 8292587: AArch64: Support SVE fabd instruction [v2] In-Reply-To: References: Message-ID: > Scalar and NEON fabd instructions were initially supported in > JDK-8256318. In this patch, we support SVE fabd instruction [1] and add > one Jtreg test case as well. > > With this patch, two instructions `fsub + fabs` would be combined into > one single `fabd` instruction. > > > fsub z16.s, z16.s, z17.s > fabs z16.s, p7/m, z16.s > > --> > > fabd z16.s, p7/m, z16.s, z17.s > > > In the initial evaluation of JMH case, i.e. > FloatingScalarVectorAbsDiff.java, we found the performance uplift done > by this optimization was easily hidden by the heavy memory load/store > instructions. To avoid that, we updated the JMH case a bit, adding one > more group of subtraction and Math.abs operations in the loop body. > > Here shows the data with the new JMH case on one 256-bit SVE machine. We > can observe about 39% and 35% improvements for the two functions > respectively. > > > Benchmark Before After Units > FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble 260.468 160.965 ns/op > FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat 133.963 87.292 ns/op > > > Jtreg testing: tier1~3 passed on one NEON-only machine and one 256-bit SVE machine. 
> > [1] https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/FABD--Floating-point-absolute-difference--predicated-- Hao Sun has updated the pull request incrementally with one additional commit since the last revision: Update the loop limit in VectorAbsDiffTest.java As pointed out by Faye Gao, the test results are not fully verified due to incorrect loop limits. Updated it. Reran the test and no regression. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10011/files - new: https://git.openjdk.org/jdk/pull/10011/files/c6157252..38501195 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10011&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10011&range=00-01 Stats: 8 lines in 1 file changed: 2 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/10011.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10011/head:pull/10011 PR: https://git.openjdk.org/jdk/pull/10011 From haosun at openjdk.org Thu Sep 8 02:50:53 2022 From: haosun at openjdk.org (Hao Sun) Date: Thu, 8 Sep 2022 02:50:53 GMT Subject: RFR: 8292587: AArch64: Support SVE fabd instruction [v2] In-Reply-To: References: Message-ID: <_xqn1RPu4vHIp5UsFcjR5KQzcv-XFndtjx5d762pldo=.f03bebfa-bff2-4aa5-a42f-1b3faadee451@github.com> On Wed, 7 Sep 2022 07:40:59 GMT, Fei Gao wrote: >> Hao Sun has updated the pull request incrementally with one additional commit since the last revision: >> >> Update the loop limit in VectorAbsDiffTest.java >> >> As pointed out by Faye Gao, the test results are not fully verified due >> to incorrect loop limits. >> >> Updated it. >> >> Reran the test and no regression. > > test/hotspot/jtreg/compiler/vectorapi/VectorAbsDiffTest.java line 97: > >> 95: public static void testFloatAbsDiff_runner() { >> 96: testFloatAbsDiff(); >> 97: for (int i = 0; i < F_SPECIES.length(); i++) { > > I suppose it should be `for (int i = 0; i < LENGTH; i++) {` here. You can check all similar code lines in the following functions for verification. 
Good catch! Updated it in the new revision. Thanks. ------------- PR: https://git.openjdk.org/jdk/pull/10011 From fgao at openjdk.org Thu Sep 8 02:58:43 2022 From: fgao at openjdk.org (Fei Gao) Date: Thu, 8 Sep 2022 02:58:43 GMT Subject: RFR: 8292587: AArch64: Support SVE fabd instruction [v2] In-Reply-To: References: Message-ID: On Thu, 8 Sep 2022 02:50:52 GMT, Hao Sun wrote: >> Scalar and NEON fabd instructions were initially supported in >> JDK-8256318. In this patch, we support SVE fabd instruction [1] and add >> one Jtreg test case as well. >> >> With this patch, two instructions `fsub + fabs` would be combined into >> one single `fabd` instruction. >> >> >> fsub z16.s, z16.s, z17.s >> fabs z16.s, p7/m, z16.s >> >> --> >> >> fabd z16.s, p7/m, z16.s, z17.s >> >> >> In the initial evaluation of JMH case, i.e. >> FloatingScalarVectorAbsDiff.java, we found the performance uplift done >> by this optimization was easily hidden by the heavy memory load/store >> instructions. To avoid that, we updated the JMH case a bit, adding one >> more group of subtraction and Math.abs operations in the loop body. >> >> Here shows the data with the new JMH case on one 256-bit SVE machine. We >> can observe about 39% and 35% improvements for the two functions >> respectively. >> >> >> Benchmark Before After Units >> FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble 260.468 160.965 ns/op >> FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat 133.963 87.292 ns/op >> >> >> Jtreg testing: tier1~3 passed on one NEON-only machine and one 256-bit SVE machine. >> >> [1] https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/FABD--Floating-point-absolute-difference--predicated-- > > Hao Sun has updated the pull request incrementally with one additional commit since the last revision: > > Update the loop limit in VectorAbsDiffTest.java > > As pointed out by Faye Gao, the test results are not fully verified due > to incorrect loop limits. > > Updated it. 
> > Reran the test and no regression. Marked as reviewed by fgao (Author). ------------- PR: https://git.openjdk.org/jdk/pull/10011 From thartmann at openjdk.org Thu Sep 8 06:26:47 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 8 Sep 2022 06:26:47 GMT Subject: RFR: 8288043: Optimize FP to word/sub-word integral type conversion on X86 AVX2 platforms [v3] In-Reply-To: References: Message-ID: On Tue, 6 Sep 2022 10:02:50 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch extends conversion optimizations added with [JDK-8287835](https://bugs.openjdk.org/browse/JDK-8287835) to optimize following floating point to integral conversions for X86 AVX2 targets:- >> * D2I , D2S, D2B, F2I , F2S, F2B >> >> In addition, it also optimizes following wide vector (64 bytes) double to integer and sub-type conversions for AVX512 targets which do not support AVX512DQ feature. >> * D2I, D2S, D2B >> >> Following are the JMH micro performance results with and without patch. >> >> System configuration: 40C 2S Icelake server (Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz) >> >> BENCHMARK | SIZE | BASELINE (ops/ms) | WITHOPT (ops/ms) | PERF GAIN FACTOR >> -- | -- | -- | -- | -- >> VectorFPtoIntCastOperations.microDouble128ToByte128 | 1024 | 90.603 | 92.797 | 1.024215534 >> VectorFPtoIntCastOperations.microDouble128ToByte256 | 1024 | 81.909 | 82.3 | 1.00477359 >> VectorFPtoIntCastOperations.microDouble128ToByte512 | 1024 | 26.181 | 26.244 | 1.002406325 >> VectorFPtoIntCastOperations.microDouble128ToInteger128 | 1024 | 90.74 | 2537.958 | 27.96956138 >> VectorFPtoIntCastOperations.microDouble128ToInteger256 | 1024 | 81.586 | 2429.599 | 29.7796068 >> VectorFPtoIntCastOperations.microDouble128ToInteger512 | 1024 | 19.406 | 19.61 | 1.010512213 >> VectorFPtoIntCastOperations.microDouble128ToLong128 | 1024 | 91.723 | 90.754 | 0.989435583 >> VectorFPtoIntCastOperations.microDouble128ToShort128 | 1024 | 91.766 | 1984.577 | 21.62649565 >> 
VectorFPtoIntCastOperations.microDouble128ToShort256 | 1024 | 81.949 | 1940.599 | 23.68056962 >> VectorFPtoIntCastOperations.microDouble128ToShort512 | 1024 | 16.468 | 16.56 | 1.005586592 >> VectorFPtoIntCastOperations.microDouble256ToByte128 | 1024 | 163.331 | 3018.351 | 18.479964 >> VectorFPtoIntCastOperations.microDouble256ToByte256 | 1024 | 148.878 | 3082.034 | 20.70174237 >> VectorFPtoIntCastOperations.microDouble256ToByte512 | 1024 | 50.108 | 51.629 | 1.030354434 >> VectorFPtoIntCastOperations.microDouble256ToInteger128 | 1024 | 159.805 | 4619.421 | 28.90661118 >> VectorFPtoIntCastOperations.microDouble256ToInteger256 | 1024 | 143.876 | 4649.642 | 32.31700909 >> VectorFPtoIntCastOperations.microDouble256ToInteger512 | 1024 | 38.127 | 38.188 | 1.001599916 >> VectorFPtoIntCastOperations.microDouble256ToLong128 | 1024 | 160.322 | 162.442 | 1.013223388 >> VectorFPtoIntCastOperations.microDouble256ToLong256 | 1024 | 141.252 | 143.01 | 1.012445841 >> VectorFPtoIntCastOperations.microDouble256ToShort128 | 1024 | 157.717 | 3757.471 | 23.82413437 >> VectorFPtoIntCastOperations.microDouble256ToShort256 | 1024 | 143.876 | 3830.971 | 26.62689399 >> VectorFPtoIntCastOperations.microDouble256ToShort512 | 1024 | 32.061 | 32.911 | 1.026511962 >> VectorFPtoIntCastOperations.microFloat128ToByte128 | 1024 | 146.599 | 4002.967 | 27.30555461 >> VectorFPtoIntCastOperations.microFloat128ToByte256 | 1024 | 136.99 | 3938.799 | 28.75245638 >> VectorFPtoIntCastOperations.microFloat128ToByte512 | 1024 | 51.561 | 50.284 | 0.975233219 >> VectorFPtoIntCastOperations.microFloat128ToInteger128 | 1024 | 5933.565 | 5361.472 | 0.903583596 >> VectorFPtoIntCastOperations.microFloat128ToInteger256 | 1024 | 5079.564 | 5062.046 | 0.996551279 >> VectorFPtoIntCastOperations.microFloat128ToInteger512 | 1024 | 37.101 | 38.419 | 1.035524649 >> VectorFPtoIntCastOperations.microFloat128ToLong128 | 1024 | 145.863 | 145.362 | 0.99656527 >> VectorFPtoIntCastOperations.microFloat128ToLong256 | 1024 | 131.159 | 
133.154 | 1.015210546 >> VectorFPtoIntCastOperations.microFloat128ToShort128 | 1024 | 145.966 | 4150.039 | 28.4315457 >> VectorFPtoIntCastOperations.microFloat128ToShort256 | 1024 | 134.703 | 4566.589 | 33.90116775 >> VectorFPtoIntCastOperations.microFloat128ToShort512 | 1024 | 31.878 | 30.867 | 0.968285338 >> VectorFPtoIntCastOperations.microFloat256ToByte128 | 1024 | 237.841 | 6292.051 | 26.4548627 >> VectorFPtoIntCastOperations.microFloat256ToByte256 | 1024 | 222.041 | 6292.748 | 28.34047766 >> VectorFPtoIntCastOperations.microFloat256ToByte512 | 1024 | 92.073 | 88.981 | 0.966417951 >> VectorFPtoIntCastOperations.microFloat256ToInteger128 | 1024 | 11471.121 | 10269.636 | 0.895260019 >> VectorFPtoIntCastOperations.microFloat256ToInteger256 | 1024 | 10729.816 | 10105.92 | 0.941853989 >> VectorFPtoIntCastOperations.microFloat256ToInteger512 | 1024 | 68.328 | 70.005 | 1.024543379 >> VectorFPtoIntCastOperations.microFloat256ToLong128 | 1024 | 247.101 | 248.571 | 1.005948984 >> VectorFPtoIntCastOperations.microFloat256ToLong256 | 1024 | 225.74 | 223.987 | 0.992234429 >> VectorFPtoIntCastOperations.microFloat256ToLong512 | 1024 | 76.39 | 76.187 | 0.997342584 >> VectorFPtoIntCastOperations.microFloat256ToShort128 | 1024 | 233.196 | 8202.179 | 35.17289748 >> VectorFPtoIntCastOperations.microFloat256ToShort256 | 1024 | 220.75 | 7781.073 | 35.24834881 >> VectorFPtoIntCastOperations.microFloat256ToShort512 | 1024 | 58.143 | 55.633 | 0.956830573 >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8288043: Some mainline merge realted cleanups. Testing in our system did not show any failures but I see that there are SIGILL failures in the pre-submit testing. 
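
For readers following the D2I / D2S / D2B discussion above, the sketch below shows the scalar Java narrowing rules that a vectorized floating-point-to-integral conversion is expected to preserve: NaN becomes 0, out-of-range values saturate to the `int` range, and sub-word results are then truncated from that `int`. The class and helper names are hypothetical and are not part of the patch; this only illustrates the language rules, not the generated AVX2 code.

```java
public class FpToIntCastSketch {
    // Hypothetical helpers: each one is just the plain Java cast that the
    // corresponding conversion has to match element by element.
    static int d2i(double d)   { return (int) d; }
    static short d2s(double d) { return (short) d; } // double -> int -> short
    static byte d2b(double d)  { return (byte) d; }  // double -> int -> byte
    static int f2i(float f)    { return (int) f; }

    public static void main(String[] args) {
        double[] inputs = { 1.9, -1.9, Double.NaN, 1e10, -1e10, 70000.0, 300.0 };
        for (double d : inputs) {
            System.out.println(d + " -> int " + d2i(d)
                    + ", short " + d2s(d) + ", byte " + d2b(d));
        }
    }
}
```

The rows worth looking at are NaN and the two out-of-range values: the raw x86 truncating conversion instructions return 0x80000000 for NaN and for out-of-range inputs, so a conversion sequence has to special-case them to match the Java results.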

-------------

PR: https://git.openjdk.org/jdk/pull/9748

From chagedorn at openjdk.org  Thu Sep  8 06:54:50 2022
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Thu, 8 Sep 2022 06:54:50 GMT
Subject: RFR: 8291599: Assertion in PhaseIdealLoop::skeleton_predicate_has_opaque after JDK-8289127
In-Reply-To:
References:
Message-ID:

On Thu, 25 Aug 2022 12:30:02 GMT, Roland Westrelin wrote:

> In TestPhiInSkeletonPredicateExpression.test1():
>
> - Loop predication adds predicates for the null check of array and the
>   array range check. It also adds skeleton predicates in case of
>   subsequent unrolling.
>
> - One of the skeleton predicates has the following shape:
>
> (Opaque4 (Bool (CmpUL (AddL (AddL (ConvI2L (LoadI (Phi ...))) (ConvI2L (CastII (AddI (OpaqueLoopInit OpaqueLoopStride))))) -1) ...)))
>
> - Split thru phi pushes the null check through the dominating
>   region. The skeleton predicate subgraph is transformed to:
>
> (Opaque4 (Bool (CmpUL (Phi ...) ...)))
>
> - Logic that processes skeleton predicates can no longer find the
>   OpaqueLoopInit and OpaqueLoopStride nodes because they are now
>   behind a phi. That causes the assert to fire.
>
> The fix I propose is to catch cases where part of a skeleton predicate
> expression (a subgraph with an OpaqueLoopInit or OpaqueLoopStride node)
> is being split during split if and to clone the entire skeleton
> predicate subgraph then.
>
> There is already logic for that, but it only triggers if
> PhaseIdealLoop::split_up() tries to split an OpaqueLoopInit or
> OpaqueLoopStride. In the case here, the OpaqueLoopInit and
> OpaqueLoopStride nodes have control above the region at which split if
> occurs. So they are never split by PhaseIdealLoop::split_up(). The
> AddL nodes in the subgraph are.

Otherwise, the fix looks good to me!
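
To make the failure mode described above concrete, here is a toy model of the two subgraph shapes and of a walk that counts the OpaqueLoopInit / OpaqueLoopStride nodes reachable from the Opaque4 root. Everything in it (`Node`, `countOpaqueLoopNodes`, the `followPhi` switch) is hypothetical; it assumes the real walk only follows a fixed set of node kinds and therefore stops at a Phi, which is an assumption drawn from the description and not a copy of the actual C2 logic.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class OpaqueLoopNodeCountSketch {
    // Hypothetical stand-in for an IR node: an opcode name plus its inputs.
    static final class Node {
        final String op;
        final List<Node> inputs;
        Node(String op, Node... inputs) {
            this.op = op;
            this.inputs = Arrays.asList(inputs);
        }
    }

    // Returns {initCount, strideCount} for the expression rooted at 'root'.
    // With followPhi == false the walk does not look through Phi nodes.
    static int[] countOpaqueLoopNodes(Node root, boolean followPhi) {
        int init = 0;
        int stride = 0;
        Set<Node> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> work = new ArrayDeque<>();
        work.push(root);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (!visited.add(n)) {
                continue; // shared node, already counted
            }
            if (n.op.equals("OpaqueLoopInit")) {
                init++;
            } else if (n.op.equals("OpaqueLoopStride")) {
                stride++;
            }
            if (n.op.equals("Phi") && !followPhi) {
                continue; // everything behind the Phi stays invisible
            }
            for (Node in : n.inputs) {
                work.push(in);
            }
        }
        return new int[] { init, stride };
    }

    // Shape before split-if: the opaque nodes sit on a Phi-free path.
    static Node beforeSplit() {
        Node index = new Node("ConvI2L", new Node("CastII",
                new Node("AddI", new Node("OpaqueLoopInit"), new Node("OpaqueLoopStride"))));
        Node load = new Node("ConvI2L", new Node("LoadI", new Node("Phi")));
        Node sum = new Node("AddL", new Node("AddL", load, index), new Node("ConL"));
        return new Node("Opaque4", new Node("Bool", new Node("CmpUL", sum, new Node("ConL"))));
    }

    // Shape after split-if: the AddL chain was cloned per region input and
    // merged by a Phi, so the (shared) opaque nodes are now behind that Phi.
    static Node afterSplit() {
        Node index = new Node("ConvI2L", new Node("CastII",
                new Node("AddI", new Node("OpaqueLoopInit"), new Node("OpaqueLoopStride"))));
        Node left = new Node("AddL",
                new Node("AddL", new Node("ConvI2L", new Node("LoadI")), index), new Node("ConL"));
        Node right = new Node("AddL",
                new Node("AddL", new Node("ConvI2L", new Node("LoadI")), index), new Node("ConL"));
        Node phi = new Node("Phi", left, right);
        return new Node("Opaque4", new Node("Bool", new Node("CmpUL", phi, new Node("ConL"))));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(countOpaqueLoopNodes(beforeSplit(), false)));
        System.out.println(Arrays.toString(countOpaqueLoopNodes(afterSplit(), false)));
        System.out.println(Arrays.toString(countOpaqueLoopNodes(afterSplit(), true)));
    }
}
```

In this model the Phi-blind walk finds both opaque nodes before the split and neither of them afterwards, which mirrors the situation in which the assert fires. The proposed fix does not make the walk look through Phis; it clones the whole skeleton predicate subgraph instead, so the `followPhi` switch exists only to show where the nodes went.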
src/hotspot/share/opto/loopTransform.cpp line 1454:

> 1452: }
> 1453:
> 1454: void PhaseIdealLoop::skeleton_predicate_opaque_helper(Node* n, uint& init, uint& stride) {

I suggest to rename this method to make it more clear what its purpose is. Maybe `count_opaque_loop_nodes`?

src/hotspot/share/opto/loopTransform.cpp line 1456:

> 1454: void PhaseIdealLoop::skeleton_predicate_opaque_helper(Node* n, uint& init, uint& stride) {
> 1455:   init= 0;
> 1456:   stride= 0;

Suggestion:

    init = 0;
    stride = 0;

test/hotspot/jtreg/compiler/loopopts/TestPhiInSkeletonPredicateExpression.java line 26:

> 24: /*
> 25:  * @test
> 26:  * bug 8291599

Suggestion:

     * @bug 8291599

test/hotspot/jtreg/compiler/loopopts/TestPhiInSkeletonPredicateExpression.java line 28:

> 26:  * bug 8291599
> 27:  * @summary Assertion in PhaseIdealLoop::skeleton_predicate_has_opaque after JDK-8289127
> 28:  * @run main/othervm -XX:-TieredCompilation -XX:-BackgroundCompilation -XX:-UseOnStackReplacement -XX:LoopMaxUnroll=0 TestPhiInSkeletonPredicateExpression

Since `LoopMaxUnroll` is a C2 flag, we should also add a `@requires vm.compiler2.enabled`.

-------------

Changes requested by chagedorn (Reviewer).

PR: https://git.openjdk.org/jdk/pull/10022

From fgao at openjdk.org Thu Sep 8 06:58:07 2022
From: fgao at openjdk.org (Fei Gao)
Date: Thu, 8 Sep 2022 06:58:07 GMT
Subject: RFR: 8275275: AArch64: Fix performance regression after auto-vectorization on NEON [v2]
In-Reply-To:
References:
Message-ID:

> For some vector opcodes, there are no corresponding AArch64 NEON
> instructions but supporting them benefits vector API. Some of
> this kind of opcodes are also used by superword for auto-
> vectorization and here is the list:
>
> VectorCastD2I, VectorCastL2F
> MulVL
> AddReductionVI/L/F/D
> MulReductionVI/L/F/D
> AndReductionV, OrReductionV, XorReductionV
>
>
> We did some micro-benchmark performance tests on NEON and found
> that some of listed opcodes hurt the performance of loops after
> auto-vectorization, but others don't.
>
> This patch disables those opcodes for superword, which have
> obvious performance regressions after auto-vectorization on
> NEON. Besides, one jtreg test case, where IR nodes are checked,
> is added in the patch to protect the code against change by
> mistake in the future.
>
> Here is the performance data before and after the patch on NEON.
>
> Benchmark       length  Mode   Cnt  Before   After    Units
> AddReductionVD  1024    thrpt  15   450.830  548.001  ops/ms
> AddReductionVF  1024    thrpt  15   514.468  548.013  ops/ms
> MulReductionVD  1024    thrpt  15   405.613  499.531  ops/ms
> MulReductionVF  1024    thrpt  15   451.292  495.061  ops/ms
>
> Note:
> Because superword doesn't vectorize reductions unconnected with
> other vector packs, the benchmark function for Add/Mul
> reduction is like:
>
> // private double[] da, db;
> // private double dresult;
> public void AddReductionVD() {
>     double result = 1;
>     for (int i = startIndex; i < length; i++) {
>         result += (da[i] + db[i]);
>     }
>     dresult += result;
> }
>
>
> Specially, vector multiply long has been implemented but disabled
> for both vector API and superword. Out of the same reason, the
> patch re-enables MulVL on NEON for Vector API but still disables
> it for superword. The performance uplift on vector API is ~12.8x
> on my local.
>
> Benchmark          length  Mode   Cnt  Before   After    Units
> Long128Vector.MUL  1024    thrpt  10   55.015   760.593  ops/ms
> MulVL(superword)   1024    thrpt  10   907.788  907.805  ops/ms
>
> Note:
> The superword benchmark function is:
>
> // private long[] in1, in2, res;
> public void MulVL() {
>     for (int i = 0; i < length; i++) {
>         res[i] = in1[i] * in2[i];
>     }
> }
>
> The Vector API benchmark case is from:
> https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/Long128Vector.java#L190

Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:

 - Fix match rules for mla/mls and add a vector API regression testcase
 - Merge branch 'master' into fg8275275
 - 8275275: AArch64: Fix performance regression after auto-vectorization on NEON

   For some vector opcodes, there are no corresponding AArch64 NEON
   instructions but supporting them benefits vector API. Some of
   this kind of opcodes are also used by superword for auto-
   vectorization and here is the list:
   ```
   VectorCastD2I, VectorCastL2F
   MulVL
   AddReductionVI/L/F/D
   MulReductionVI/L/F/D
   AndReductionV, OrReductionV, XorReductionV
   ```

   We did some micro-benchmark performance tests on NEON and found
   that some of listed opcodes hurt the performance of loops after
   auto-vectorization, but others don't.

   This patch disables those opcodes for superword, which have
   obvious performance regressions after auto-vectorization on
   NEON. Besides, one jtreg test case, where IR nodes are checked,
   is added in the patch to protect the code against change by
   mistake in the future.

   Here is the performance data before and after the patch on NEON.

   Benchmark       length  Mode   Cnt  Before   After    Units
   AddReductionVD  1024    thrpt  15   450.830  548.001  ops/ms
   AddReductionVF  1024    thrpt  15   514.468  548.013  ops/ms
   MulReductionVD  1024    thrpt  15   405.613  499.531  ops/ms
   MulReductionVF  1024    thrpt  15   451.292  495.061  ops/ms

   Note:
   Because superword doesn't vectorize reductions unconnected with
   other vector packs, the benchmark function for Add/Mul
   reduction is like:
   ```
   // private double[] da, db;
   // private double dresult;
   public void AddReductionVD() {
       double result = 1;
       for (int i = startIndex; i < length; i++) {
           result += (da[i] + db[i]);
       }
       dresult += result;
   }
   ```

   Specially, vector multiply long has been implemented but disabled
   for both vector API and superword. Out of the same reason, the
   patch re-enables MulVL on NEON for Vector API but still disables
   it for superword. The performance uplift on vector API is ~12.8x
   on my local.

   Benchmark          length  Mode   Cnt  Before   After    Units
   Long128Vector.MUL  1024    thrpt  10   55.015   760.593  ops/ms
   MulVL(superword)   1024    thrpt  10   907.788  907.805  ops/ms

   Note:
   The superword benchmark function is:
   ```
   // private long[] in1, in2, res;
   public void MulVL() {
       for (int i = 0; i < length; i++) {
           res[i] = in1[i] * in2[i];
       }
   }
   ```

   The Vector API benchmark case is from:
   https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/Long128Vector.java#L190

   Change-Id: Ie9133e4010f98b26f97969c02fbf992b11e7edbb

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10175/files
  - new: https://git.openjdk.org/jdk/pull/10175/files/d02cd800..fad1cc2f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10175&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10175&range=00-01

  Stats: 32403 lines in 159 files changed: 16395 ins; 15412 del; 596 mod
  Patch: https://git.openjdk.org/jdk/pull/10175.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10175/head:pull/10175

PR: https://git.openjdk.org/jdk/pull/10175

From fgao at openjdk.org Thu Sep 8 06:58:07 2022
From: fgao at openjdk.org (Fei Gao)
Date: Thu, 8 Sep 2022 06:58:07 GMT
Subject: RFR: 8275275: AArch64: Fix performance regression after auto-vectorization on NEON [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 7 Sep 2022 02:32:43 GMT, Fei Gao wrote:

>> src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 146:
>>
>>> 144:   // Fail fast, otherwise fall through to common vector_size_supported() check.
>>> 145:   switch (opcode) {
>>> 146:     case Op_MulVL:
>>
>> Enabling `MulVL` for vector api is great. Thanks for doing this! However, this might break several match rules like https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L2025 and the `vmls`. The assertion in line-2035 might fail if this rule is matched for a long vector and runs on hardwares that do not support sve.
One way to fix is adding the predicate to these rules to skip the long vector type for neon. Thanks!
>
> Thanks for your kind reminder. I'll fix these related rules and add corresponding vector api regression tests in this PR.

> Enabling `MulVL` for vector api is great. Thanks for doing this! However, this might break several match rules like https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L2025 and the `vmls`. The assertion in line-2035 might fail if this rule is matched for a long vector and runs on hardwares that do not support sve. One way to fix is adding the predicate to these rules to skip the long vector type for neon. Thanks!

Done. Thanks!

-------------

PR: https://git.openjdk.org/jdk/pull/10175

From xgong at openjdk.org Thu Sep 8 07:08:52 2022
From: xgong at openjdk.org (Xiaohong Gong)
Date: Thu, 8 Sep 2022 07:08:52 GMT
Subject: RFR: 8275275: AArch64: Fix performance regression after auto-vectorization on NEON [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 8 Sep 2022 06:58:07 GMT, Fei Gao wrote:

>> For some vector opcodes, there are no corresponding AArch64 NEON
>> instructions but supporting them benefits vector API. Some of
>> this kind of opcodes are also used by superword for auto-
>> vectorization and here is the list:
>>
>> VectorCastD2I, VectorCastL2F
>> MulVL
>> AddReductionVI/L/F/D
>> MulReductionVI/L/F/D
>> AndReductionV, OrReductionV, XorReductionV
>>
>>
>> We did some micro-benchmark performance tests on NEON and found
>> that some of listed opcodes hurt the performance of loops after
>> auto-vectorization, but others don't.
>>
>> This patch disables those opcodes for superword, which have
>> obvious performance regressions after auto-vectorization on
>> NEON. Besides, one jtreg test case, where IR nodes are checked,
>> is added in the patch to protect the code against change by
>> mistake in the future.
>>
>> Here is the performance data before and after the patch on NEON.
>>
>> Benchmark       length  Mode   Cnt  Before   After    Units
>> AddReductionVD  1024    thrpt  15   450.830  548.001  ops/ms
>> AddReductionVF  1024    thrpt  15   514.468  548.013  ops/ms
>> MulReductionVD  1024    thrpt  15   405.613  499.531  ops/ms
>> MulReductionVF  1024    thrpt  15   451.292  495.061  ops/ms
>>
>> Note:
>> Because superword doesn't vectorize reductions unconnected with
>> other vector packs, the benchmark function for Add/Mul
>> reduction is like:
>>
>> // private double[] da, db;
>> // private double dresult;
>> public void AddReductionVD() {
>>     double result = 1;
>>     for (int i = startIndex; i < length; i++) {
>>         result += (da[i] + db[i]);
>>     }
>>     dresult += result;
>> }
>>
>>
>> Specially, vector multiply long has been implemented but disabled
>> for both vector API and superword. Out of the same reason, the
>> patch re-enables MulVL on NEON for Vector API but still disables
>> it for superword. The performance uplift on vector API is ~12.8x
>> on my local.
>>
>> Benchmark          length  Mode   Cnt  Before   After    Units
>> Long128Vector.MUL  1024    thrpt  10   55.015   760.593  ops/ms
>> MulVL(superword)   1024    thrpt  10   907.788  907.805  ops/ms
>>
>> Note:
>> The superword benchmark function is:
>>
>> // private long[] in1, in2, res;
>> public void MulVL() {
>>     for (int i = 0; i < length; i++) {
>>         res[i] = in1[i] * in2[i];
>>     }
>> }
>>
>> The Vector API benchmark case is from:
>> https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/Long128Vector.java#L190
>
> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>
>  - Fix match rules for mla/mls and add a vector API regression testcase
>  - Merge branch 'master' into fg8275275
>  - 8275275: AArch64: Fix performance regression after auto-vectorization on NEON
>
>    For some vector opcodes, there are no corresponding AArch64 NEON
>    instructions but supporting them benefits vector API. Some of
>    this kind of opcodes are also used by superword for auto-
>    vectorization and here is the list:
>    ```
>    VectorCastD2I, VectorCastL2F
>    MulVL
>    AddReductionVI/L/F/D
>    MulReductionVI/L/F/D
>    AndReductionV, OrReductionV, XorReductionV
>    ```
>
>    We did some micro-benchmark performance tests on NEON and found
>    that some of listed opcodes hurt the performance of loops after
>    auto-vectorization, but others don't.
>
>    This patch disables those opcodes for superword, which have
>    obvious performance regressions after auto-vectorization on
>    NEON. Besides, one jtreg test case, where IR nodes are checked,
>    is added in the patch to protect the code against change by
>    mistake in the future.
>
>    Here is the performance data before and after the patch on NEON.
>
>    Benchmark       length  Mode   Cnt  Before   After    Units
>    AddReductionVD  1024    thrpt  15   450.830  548.001  ops/ms
>    AddReductionVF  1024    thrpt  15   514.468  548.013  ops/ms
>    MulReductionVD  1024    thrpt  15   405.613  499.531  ops/ms
>    MulReductionVF  1024    thrpt  15   451.292  495.061  ops/ms
>
>    Note:
>    Because superword doesn't vectorize reductions unconnected with
>    other vector packs, the benchmark function for Add/Mul
>    reduction is like:
>    ```
>    // private double[] da, db;
>    // private double dresult;
>    public void AddReductionVD() {
>        double result = 1;
>        for (int i = startIndex; i < length; i++) {
>            result += (da[i] + db[i]);
>        }
>        dresult += result;
>    }
>    ```
>
>    Specially, vector multiply long has been implemented but disabled
>    for both vector API and superword. Out of the same reason, the
>    patch re-enables MulVL on NEON for Vector API but still disables
>    it for superword. The performance uplift on vector API is ~12.8x
>    on my local.
>
>    Benchmark          length  Mode   Cnt  Before   After    Units
>    Long128Vector.MUL  1024    thrpt  10   55.015   760.593  ops/ms
>    MulVL(superword)   1024    thrpt  10   907.788  907.805  ops/ms
>
>    Note:
>    The superword benchmark function is:
>    ```
>    // private long[] in1, in2, res;
>    public void MulVL() {
>        for (int i = 0; i < length; i++) {
>            res[i] = in1[i] * in2[i];
>        }
>    }
>    ```
>
>    The Vector API benchmark case is from:
>    https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/Long128Vector.java#L190
>
>    Change-Id: Ie9133e4010f98b26f97969c02fbf992b11e7edbb

LGTM! Thanks!

-------------

Marked as reviewed by xgong (Committer).

PR: https://git.openjdk.org/jdk/pull/10175

From thartmann at openjdk.org Thu Sep 8 07:18:59 2022
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Thu, 8 Sep 2022 07:18:59 GMT
Subject: RFR: 8292301: [REDO v2] C2 crash when allocating array of size too large [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 2 Sep 2022 09:02:54 GMT, Roland Westrelin wrote:

>> On top of the redo, this fixed 2 bugs:
>>
>> 8288184: the problem here is that the ValidLengthTest input of an
>> AllocateArrayNode becomes a constant. The CatchNode would then change
>> types if it was reprocessed but it's not. Custom logic is needed to
>> enqueue the CatchNode when the ValidLengthTest input of an
>> AllocateArrayNode changes. The CastII out of the AllocateArrayNode
>> becomes top but the fallthrough path doesn't die. This happens with
>> igvn in the case of the bug but could also happen with ccp. I fixed
>> both in this patch.
>>
>> 8291665: the code pattern for this is 2 AllocateArrayNodes out of loop
>> with a shared ValidLengthTest input in a loop.
When the loop is cloned
>> that causes Phis to be added between the AllocateArrayNodes and the
>> BoolNode of the ValidLengthTest inputs. Split if runs next and it
>> doesn't expect the Phi at the ValidLengthTest inputs. The fix here is
>> to clone the Bool/Cmp subgraph down on loop cloning. There's logic for
>> that when the use of the bool is an If for instance so I simply added
>> a special case to run that logic for an AllocateArrayNode use as
>> well. Note that the test case I added fails reliably on 11 but not
>> with the current jdk development branch. AFAICT, the bug is there but
>> something unrelated changed and a slightly different graph is built
>> for the test case that prevents split if.
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
>
>   undo needless change

Looks good otherwise.

src/hotspot/share/opto/loopopts.cpp line 2040:

> 2038:       // loop to determine which way the loop exited.
> 2039:       // Loop predicate If node connects to Bool node through Opaque1 node.
> 2040:       if (use->is_If() || use->is_CMove() || C->is_predicate_opaq(use) || use->Opcode() == Op_Opaque4 ||

Please add a comment describing the new case.

src/hotspot/share/opto/loopopts.cpp line 2410:

> 2408:   while (split_if_set->size()) {
> 2409:     Node *iff = split_if_set->pop();
> 2410:     uint input = iff->Opcode() == Op_AllocateArray ? AllocateNode::ValidLengthTest : 1;

Suggestion:

    uint input = (iff->Opcode() == Op_AllocateArray) ? AllocateNode::ValidLengthTest : 1;

src/hotspot/share/opto/phaseX.cpp line 1643:

> 1641:       }
> 1642:     }
> 1643:     if (use_op == Op_AllocateArray && n == use->in(AllocateNode::ValidLengthTest)) {

Please add a comment.

src/hotspot/share/opto/phaseX.cpp line 1853:

> 1851: // If we changed the receiver type to a call, we need to revisit the Catch node following the call. It's looking for a
> 1852: // non-NULL receiver to know when to enable the regular fall-through path in addition to the NullPtrException path.
> 1853: // Same if true if the type of a ValidLengthTest input to an AllocateArrayNode changes

Suggestion:

    // Same is true if the type of a ValidLengthTest input to an AllocateArrayNode changes.

-------------

Marked as reviewed by thartmann (Reviewer).

PR: https://git.openjdk.org/jdk/pull/10038

From thartmann at openjdk.org Thu Sep 8 07:22:46 2022
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Thu, 8 Sep 2022 07:22:46 GMT
Subject: RFR: 8290432: C2 compilation fails with assert(node->_last_del == _last) failed: must have deleted the edge just produced [v3]
In-Reply-To: <9AT3XEsyxvrRfTLGGGVCeANLLMDIcRyzKxjagsbYAoo=.e33159d7-c7a3-41bd-90db-4b01068adbeb@github.com>
References: <5ff-r2RgTNzao-sZ4D1kKWOPHWwzaCZxDDxyxl1Y0Us=.ae799d57-29ab-42c5-9908-a5811a8db0bc@github.com> <4iWrjtgXQfRvRYkT2_wUGAQkIouqwlng4IJmHyvCHqQ=.6b36faaa-eb22-4e32-bfc2-dfedd645eff2@github.com> <9AT3XEsyxvrRfTLGGGVCeANLLMDIcRyzKxjagsbYAoo=.e33159d7-c7a3-41bd-90db-4b01068adbeb@github.com>
Message-ID:

On Wed, 24 Aug 2022 02:22:48 GMT, Yi Yang wrote:

> Why? Can you clarify more?

As I mentioned above, I don't understand how your newly added condition is supposed to work.

-------------

PR: https://git.openjdk.org/jdk/pull/9695

From thartmann at openjdk.org Thu Sep 8 07:22:52 2022
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Thu, 8 Sep 2022 07:22:52 GMT
Subject: RFR: 8288204: GVN Crash: assert() failed: correct memory chain
In-Reply-To:
References:
Message-ID: <9_kZSr9pvaKPynCTNfwdkdQ3gUNs78vnxMQmLomXoMM=.68540abe-4029-47df-ad62-df8f4bacd4b1@github.com>

On Mon, 8 Aug 2022 11:12:08 GMT, Yi Yang wrote:

>> Hi can I have a review for this fix? LoadBNode::Ideal crashes after performing GVN right after EA.
The bad IR is as follows:
>>
>> ![image](https://user-images.githubusercontent.com/5010047/183106710-3a518e5e-0b59-4c3c-aba4-8b6fcade3519.png)
>>
>> The memory input of Load#971 is Phi#1109 and the address input of Load#971 is AddP whose object base is CheckCastPP#335:
>>
>> The type of Phi#1109 is `byte[int:>=0]:exact+any *` while `byte[int:8]:NotNull:exact+any *,iid=177` is the type of CheckCastPP#335 due to EA, they have different alias index, that's why we hit the assertion at L226:
>>
>> https://github.com/openjdk/jdk/blob/b17a745d7f55941f02b0bdde83866aa5d32cce07/src/hotspot/share/opto/memnode.cpp#L207-L226
>> (t is `byte[int:>=0]:exact+any *`, t_adr is `byte[int:8]:NotNull:exact+any *,iid=177`).
>>
>> There is a long story. In the beginning, LoadB#971 is generated at array_copy_forward, and GVN transformed it iteratively:
>>
>> 971 LoadB === 1115 1046 969 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22)
>> 971 LoadB === 1115 1109 969 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22)
>> 971 LoadB === 1115 1109 969 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22)
>> 971 LoadB === 1115 1109 473 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22)
>> ...
>>
>> In this case, we get alias index 5 from address input AddP#969, and step it through MergeMem#1046, we found Phi#1109 then, that's why LoadB->in(Mem) is changed from MergeMem#1046 to Phi#1109 (Which finally leads to crash).
>>
>> 1046 MergeMem === _ 1 160 389 389 1109 1 1 389 1 1 1 1 1 1 1 1 1 1 1 1 1 709 709 709 709 882 888 894 190 190 912 191 [[ 1025 1021 1017 1013 1009 1005 1002 1001 998 996 991 986 981 976 971 966 962 961 960 121 122 123 124 1027 ]]
>>
>>
>> After applying this patch, some related nodes are pushed into the GVN worklist, before stepping through MergeMem#1046, the address input is already changed to AddP#473. i.e., we get alias index 32 from address input AddP#473, and step it through MergeMem#1046, we found StoreB#191 then,LoadB->in(Mem) is changed from MergeMem#1046 to StoreB#191.
>>
>> 971 LoadB === 1115 1046 969 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22)
>> 971 LoadB === 1115 1046 969 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22)
>> 971 LoadB === 1115 1046 473 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22)
>> 971 LoadB === 1115 191 473 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22)
>> 971 LoadB === 1115 191 473 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22)
>> 971 LoadB === 1115 191 473 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22)
>> 971 LoadB === 1115 191 473 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22)
>> 971 LoadB === 1115 191 473 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22)
>> 971 LoadB === 468 191 473 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22)
>> 971 LoadB === 468 191 473 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22)
>> 971 LoadB === 390 191 473 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22)
>> ...
>>
>> The well-formed IR looks like this:
>> ![image](https://user-images.githubusercontent.com/5010047/183239456-7096ea66-6fca-4c84-8f46-8c42d10b686a.png)
>>
>> Thanks for your patience.
>
>> 1 LoadB === 1115 1046 969 [[ 972 ]] @b
>
> Hi @TobiHartmann , this patch works well with StressIGVN. There is an explicit dependency path
>
> https://github.com/openjdk/jdk/blob/b17a745d7f55941f02b0bdde83866aa5d32cce07/src/hotspot/share/opto/memnode.cpp#L322-L327
>
> i.e. load node delayed its idealization until its memory input is processed. This means, MergeMem#1046 and its related node were always processed before processing load node. That's why we saw load->in(Addr) was changed from 969 to 473.
>
> 971 LoadB === 1115 1046 969 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22)
> 971 LoadB === 1115 1046 969 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22)
> 971 LoadB === 1115 1046 473 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22)
> 971 LoadB === 1115 191 473 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22)

Comment to keep this open. @kelthuzadx, let me know if you need help with reproducing this.
-------------

PR: https://git.openjdk.org/jdk/pull/9777

From thartmann at openjdk.org Thu Sep 8 07:48:47 2022
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Thu, 8 Sep 2022 07:48:47 GMT
Subject: RFR: 8293287 add ReplayReduce flag [v3]
In-Reply-To:
References:
Message-ID:

On Sat, 3 Sep 2022 21:56:50 GMT, Dean Long wrote:

>> Add an experimental flag to help developers "reduce" a replay file.
>>
>> As a first step, I plan to simulate reduced inlining. This will output multiple "compile" lines as if the first level of inlining never happened:
>> A --> B --> C
>> A --> D --> E
>> becomes
>> B --> C
>> D --> E
>> Developers can repeat iteratively until the replay crash no longer reproduces.
>
> Dean Long has updated the pull request incrementally with one additional commit since the last revision:
>
>   remove support for pre-2013 replay files

I'm wondering if this functionality should really be part of the VM. Wouldn't a simple script that regex-parses the compile statement of the replay file, iteratively removes inlines and runs replay compilation to check if the issue still reproduces, be more powerful and easier to maintain? It could be combined with also removing class loading statements.

src/hotspot/share/opto/compile.cpp line 4583:

> 4581:     return;
> 4582:   }
> 4583:   // Enable interative replay file reduction

Suggestion:

    // Enable iterative replay file reduction

-------------

PR: https://git.openjdk.org/jdk/pull/10134

From thartmann at openjdk.org Thu Sep 8 08:24:56 2022
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Thu, 8 Sep 2022 08:24:56 GMT
Subject: RFR: 8292675: Add identity transformation for removing redundant AndV/OrV nodes
In-Reply-To:
References:
Message-ID:

On Mon, 5 Sep 2022 10:21:11 GMT, Bhavana Kilambi wrote:

> Recently we found that the rotate left/right benchmarks with vectorapi
> emit a redundant "and" instruction on both aarch64 and x86_64 machines
> which can be done away with. For example - and(and(a, b), b) generates
> two "and" instructions which can be reduced to a single "and" operation-
> and(a, b) since "and" (and "or") operations are commutative and
> idempotent in nature. This can help improve performance for all those
> workloads which have multiple "and"/"or" operations with the same value
> by reducing them to fewer "and"/"or" operations accordingly.
>
> This patch adds the following transformations for vector logical
> operations - AndV and OrV :
>
>
> (OpV (OpV a b) b) => (OpV a b)
> (OpV (OpV a b) a) => (OpV a b)
> (OpV (OpV a b m1) b m1) => (OpV a b m1)
> (OpV (OpV a b m1) a m1) => (OpV a b m1)
> (OpV a (OpV a b)) => (OpV a b)
> (OpV b (OpV a b)) => (OpV a b)
> (OpV a (OpV a b m) m) => (OpV a b m)
>
> where Op = "And", "Or"
>
> Links for benchmarks tested are given below :-
> https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/IntMaxVector.java#L728
> https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/IntMaxVector.java#L764
> https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/LongMaxVector.java#L728
> https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/LongMaxVector.java#L764
>
> Before this patch, the disassembly for one of these testcases
> (IntMaxVector.ROR) for Neon is shown below :
> ```
> ldr q16, [x12, #16]
> and v16.16b, v16.16b, v20.16b
> and v16.16b, v16.16b, v20.16b
> add x12, x16, x11
> sub v17.4s, v21.4s, v16.4s
> ...
> ...
> ```
>
> After this patch, the disassembly for the same testcase above is shown
> below :
>
> ldr q16, [x12, #16]
> and v16.16b, v16.16b, v20.16b
> add x12, x16, x11
> sub v17.4s, v21.4s, v16.4s
> ...
> ...
>
>
> The other tests also emit an extra "and" instruction as shown above for
> the vector ROR/ROL operations.
>
> Below are the performance results for the vectorapi rotate tests (tests
> given in the links above) with this patch on aarch64 and x86_64 machines
> (for int and long types) -
>
>
> Benchmark          aarch64  x86_64
> IntMaxVector.ROL   25.57%   26.09%
> IntMaxVector.ROR   23.75%   24.15%
> LongMaxVector.ROL  28.91%   28.51%
> LongMaxVector.ROR  16.51%   29.11%
>
>
>
> The percentage indicates the percent gain/improvement in performance
> (ops/ms) with this patch over the master build without this patch. The
> machine descriptions are given below -
> aarch64 - 128-bit aarch64 machine
> x86_64 - 256-bit x86 machine

Changes requested by thartmann (Reviewer).

src/hotspot/share/opto/vectornode.cpp line 1893:

> 1891:     if (((!n->is_predicated_vector() && !n1->is_predicated_vector()) ||
> 1892:          (n->is_predicated_vector() && n1->is_predicated_vector() &&
> 1893:           n->in(3) == n1->in(3))) && (n->in(2) == n1->in(1) || n->in(2) == n1->in(2))) {

This condition is hard to parse, I would suggest:

Suggestion:

    if (((!n->is_predicated_vector() && !n1->is_predicated_vector()) ||
         ( n->is_predicated_vector() && n1->is_predicated_vector() && n->in(3) == n1->in(3))) &&
        (n->in(2) == n1->in(1) || n->in(2) == n1->in(2))) {

src/hotspot/share/opto/vectornode.cpp line 1903:

> 1901:     // (OperationV src2 (OperationV src1 src2)) => OperationV(src1, src2)
> 1902:     if (!n->is_predicated_vector() && !n2->is_predicated_vector() &&
> 1903:         (n->in(1) == n2->in(1) || n->in(1) == n2->in(2))) {

Suggestion:

    if (!n->is_predicated_vector() && !n2->is_predicated_vector() &&
        (n->in(1) == n2->in(1) || n->in(1) == n2->in(2))) {

src/hotspot/share/opto/vectornode.cpp line 1907:

> 1905:     // (OperationV src1 (OperationV src1 src2 m1) m1) => OperationV(src1 src2 m1)
> 1906:     } else if (n->is_predicated_vector() && n2->is_predicated_vector() &&
> 1907:                n->in(3) == n2->in(3) && n->in(1) == n2->in(1)) {

I think this should be merged into line 1902. Why did you omit the `(OperationV src2 (OperationV src1 src2 m1) m1) => OperationV(src1 src2 m1)` case?

-------------

PR: https://git.openjdk.org/jdk/pull/10163

From thartmann at openjdk.org Thu Sep 8 08:39:42 2022
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Thu, 8 Sep 2022 08:39:42 GMT
Subject: RFR: 8291599: Assertion in PhaseIdealLoop::skeleton_predicate_has_opaque after JDK-8289127
In-Reply-To:
References:
Message-ID: <--baVNdOc82Bb2CiQoSx1SljvpUhy867sxiFUcu_mcs=.f3041218-2631-4ab8-b1e5-67d7e857354d@github.com>

On Thu, 25 Aug 2022 12:30:02 GMT, Roland Westrelin wrote:

> In TestPhiInSkeletonPredicateExpression.test1():
>
> - Loop predication adds predicates for the null check of array and the
> array range check. It also adds skeleton predicates in case of
> subsequent unrolling.
>
> - One of the skeleton predicates has the following shape:
>
> (Opaque4 (Bool (CmpUL (AddL (AddL (ConvI2L (LoadI (Phi ...))) (ConvI2L (CastII (AddI (OpaqueLoopInit OpaqueLoopStride))))) -1) ...)))
>
> - Split thru phi pushes the null check through the dominating
> region. The skeleton predicate subgraph is transformed to:
>
> (Opaque4 (Bool (CmpUL (Phi ...) ...)))
>
> - Logic that processes skeleton predicate can no longer find the
> OpaqueLoopInit and OpaqueLoopStride nodes because they are now
> behind a phi. That causes the assert to fire.
>
> The fix I propose is to catch cases where part of a skeleton predicate
> expression (a subgraph with a OpaqueLoopInit or OpaqueLoopStride node)
> is being split during split if and to clone the entire skeleton
> predicate subgraph then.
>
> There's already logic for that currently but it only triggers if
> PhaseIdealLoop::split_up() tries to split an OpaqueLoopInit or
> OpaqueLoopStride. In the case here, the OpaqueLoopInit and
> OpaqueLoopStride nodes have control above the region at which split if
> occurs. So they are never split by PhaseIdealLoop::split_up(). The
> AddL nodes in the subgraph are.

Looks good to me.
------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.org/jdk/pull/10022 From chagedorn at openjdk.org Thu Sep 8 09:22:51 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 8 Sep 2022 09:22:51 GMT Subject: RFR: JDK-8290011: IGV: Remove dead code and cleanup In-Reply-To: References: Message-ID: On Wed, 7 Sep 2022 11:45:45 GMT, Tobias Holenstein wrote: > Remove dead code from the IGV code base. There are many unused or redundant functions in the code Nice cleanup! I only have some general comments as I'm not very familiar with the code details. @robcasloz should also have a look. What kind of testing did you do? src/utils/IdealGraphVisualizer/Data/src/main/java/com/sun/hotspot/igv/data/Group.java line 83: > 81: > 82: public List getGraphs() { > 83: return Collections.unmodifiableList(graphs); Suggestion: return Collections.unmodifiableList(graphs); src/utils/IdealGraphVisualizer/Data/src/main/java/com/sun/hotspot/igv/data/Group.java line 102: > 100: > 101: @Override > 102: public String toString() { You might want to keep this method for debugging purposes? src/utils/IdealGraphVisualizer/Data/src/main/java/com/sun/hotspot/igv/data/InputNode.java line 34: > 32: public class InputNode extends Properties.Entity { > 33: > 34: private int id; While cleaning this class up anyways: Feels like a node id should probably not change anymore once it's set. Can this be turned into a `final` field? Looks like `setId()` is only called from this class and once from another class when creating a new input node anyways. src/utils/IdealGraphVisualizer/Data/src/main/java/com/sun/hotspot/igv/data/InputNode.java line 36: > 34: private int id; > 35: > 36: public static final Comparator COMPARATOR = new Comparator() { Is unused as well and can be removed. Same for `getPropertyComparator()`. 
src/utils/IdealGraphVisualizer/Graph/src/main/java/com/sun/hotspot/igv/graph/Diagram.java line 44: > 42: private static final Font font = new Font("Arial", Font.PLAIN, 12); > 43: private static final Font slotFont = new Font("Arial", Font.PLAIN, 10); > 44: private static final Font boldFont = font.deriveFont(Font.BOLD); Maybe make them `public` and access them directly instead of going over `static` getters. Also, I suggest to use upper case letters for static final fields src/utils/IdealGraphVisualizer/Graph/src/main/java/com/sun/hotspot/igv/graph/Figure.java line 235: > 233: > 234: public InputNode getInputNode() { > 235: return this.inputNode; `this` can be omitted. Suggestion: return inputNode; src/utils/IdealGraphVisualizer/SelectionCoordinator/src/main/java/com/sun/hotspot/igv/selectioncoordinator/SelectionCoordinator.java line 52: > 50: highlightedChangedEvent = new ChangedEvent(this); > 51: selectedObjects = new HashSet(); > 52: highlightedObjects = new HashSet(); The explicit generic type argument can be omitted. src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/DiagramViewModel.java line 92: > 90: boolean viewPropertiesChanged = false; > 91: > 92: boolean groupChanged = (group != newModel.group); Was that a bug? 
src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java line 322:

> 320:
> 321:     public DiagramViewModel getModel() {
> 322:         return scene.getModel();

Suggestion:

        return scene.getModel();

src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ShowAllAction.java line 43:

> 41:         EditorTopComponent editor = EditorTopComponent.getActive();
> 42:         if (editor != null) {
> 43:             editor.getModel().setHiddenNodes(new HashSet());

Suggestion:

            editor.getModel().setHiddenNodes(new HashSet<>());

-------------

PR: https://git.openjdk.org/jdk/pull/10197

From thartmann at openjdk.org Thu Sep 8 10:58:59 2022
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Thu, 8 Sep 2022 10:58:59 GMT
Subject: RFR: 8293491: Avoid unexpected deoptimization in loop exit due to incorrect branch profiling
In-Reply-To:
References:
Message-ID:

On Wed, 7 Sep 2022 15:06:18 GMT, Jie Fu wrote:

> Hi all,
>
> Please review this patch which fixes the unexpected deoptimizations in loop exit due to incorrect branch profiling.
>
> # Background
>
> While analyzing our big data Apps, we observed unexpected deoptimizations in loop exit due to incorrect branch profiling.
>
> Here is a reproducer.
>
>     public class UnexpectedLoopExitDeopt {
>         public static final int N = 20000000;
>
>         public static int d1[] = new int[N];
>         public static int d2[] = new int[N];
>
>         public static void main(String[] args) {
>             System.out.println(test(d1));
>             System.out.println(test(d2));
>         }
>
>         public static int test(int[] a) {
>             int sum = 0;
>             for(int i = 0; i < a.length; i++) {
>                 sum += a[i];
>             }
>             return sum;
>         }
>     }
>
> The following is the compilation sequence.
>
>      77    1       3       java.lang.Object::<init> (1 bytes)
>      83    2       3       java.lang.String::isLatin1 (19 bytes)
>      84    6       3       jdk.internal.util.Preconditions::checkIndex (18 bytes)
>      84    3       3       java.lang.String::charAt (25 bytes)
>      85    4       3       java.lang.StringLatin1::charAt (15 bytes)
>      86    7       3       java.lang.String::coder (15 bytes)
>      86    8       3       java.lang.String::hashCode (60 bytes)
>      87    5       3       java.lang.String::checkIndex (10 bytes)
>      87    9       3       java.lang.String::length (11 bytes)
>      93   10     n 0       java.lang.invoke.MethodHandle::linkToStatic(LLLLLLL)L (native) (static)
>      96   11     n 0       java.lang.invoke.MethodHandle::linkToSpecial(LLLL)L (native) (static)
>      96   12     n 0       java.lang.Object::hashCode (native)
>      97   13     n 0       java.lang.invoke.MethodHandle::invokeBasic(LLLLLL)L (native)
>      98   14       3       java.util.Objects::requireNonNull (14 bytes)
>      98   15     n 0       java.lang.invoke.MethodHandle::linkToSpecial(LLLLLLLL)L (native) (static)
>      98   16       1       java.lang.Enum::ordinal (5 bytes)
>     101   17     n 0       java.lang.invoke.MethodHandle::linkToSpecial(LLLL)V (native) (static)
>     102   18     n 0       java.lang.invoke.MethodHandle::invokeBasic(LL)L (native)
>     212   19 %     3       UnexpectedLoopExitDeopt::test @ 4 (24 bytes)
>     213   20 %     4       UnexpectedLoopExitDeopt::test @ 4 (24 bytes)
>     221   19 %     3       UnexpectedLoopExitDeopt::test @ 4 (24 bytes) made not entrant
>     221   21       4       UnexpectedLoopExitDeopt::test (24 bytes)
>     230   20 %     4       UnexpectedLoopExitDeopt::test @ 4 (24 bytes) made not entrant <--- Unexpected deopt
>     0
>     242   21       4       UnexpectedLoopExitDeopt::test (24 bytes) made not entrant <--- Unexpected deopt
>     0
>
> The last two deopts (made not entrant) happened in the loop exit which are unexpected.
>
> # Reason
>
> The unexpected deopts were caused by the incorrect branch profiling count (0 taken count for loop predicate).
>
> Here is the profiling data for `UnexpectedLoopExitDeopt::test`.
> We can see that for `if_icmpge` @ bci=7, the count for `not taken` is 264957, while 0 for `taken`.
> The profile count for zero taken is obviously incorrect since the loop will finally exit (when `i >= a.length`).
> So the taken count should be at least 1 for `if_icmpge` @ bci=7.
>
>     0 iconst_0
>     1 istore_1
>     2 iconst_0
>     3 istore_2
>
>     4 iload_2
>     5 fast_aload_0
>     6 arraylength
>     7 if_icmpge 22
>       0 bci: 7 BranchData taken(0) displacement(56)
>                           not taken(264957)
>
>     10 iload_1
>     11 fast_aload_0
>     12 iload_2
>     13 iaload
>     14 iadd
>     15 istore_1
>     16 iinc #2 1
>     19 goto 4
>       32 bci: 19 JumpData taken(266667) displacement(-32)
>
>     22 iload_1
>     23 ireturn
>
> # Fix
>
> The main idea is to detect if the branch taken target is a loop exit.
> If so, set the taken count to be at least 1.
> This is fine because most loops should be finite and would execute the loop exit code at least once.
> For infinite loops like `while (true) {...}`, the patch won't change the original behaviour since there is no loop exit.
>
> # Testing
>
> tier1~3 on Linux/x64, no regression
>
> Thanks.
> Best regards,
> Jie

Do these deoptimizations really affect performance of your program or did you just spot them when looking at the logs?

Such surprising deopts are actually expected with optimistic, profile guided optimizations and happen in many other scenarios as well. They are usually harmless. Also, the profile information is not necessarily incorrect but might just be outdated because we stop profiling once we reach C2.

Marking all loop exits as taken seems hacky and might have unexpected side effects. Also, wouldn't C2 still insert a `Deoptimization::Reason_unreached` or `Deoptimization::Reason_unstable_if` trap for subsequent instructions after the loop exit for which profiling also suggests that they were never executed?

src/hotspot/share/ci/ciMethodBlocks.cpp line 166:

> 164:           if (dest_bci < bci) {
> 165:             next_block->set_is_loop_exit();
> 166:           }

I think this loop detection logic is both wrong and incomplete.
For example, javac generates no `goto` for the following loop:

    int i = 0;
    do {
    } while (i++ < 10);

And in the following case, the block after the `goto` corresponding to the `continue` statement is not the loop exit:

    int i = 0;
    label:
    while (true) {
        i++;
        if (i == 1) continue label;
        if (i == 2) break;
    }

I just quickly hacked this, there are probably better examples.

-------------

Changes requested by thartmann (Reviewer).

PR: https://git.openjdk.org/jdk/pull/10200

From tholenstein at openjdk.org Thu Sep 8 11:02:55 2022
From: tholenstein at openjdk.org (Tobias Holenstein)
Date: Thu, 8 Sep 2022 11:02:55 GMT
Subject: RFR: JDK-8291805: IGV: Improve Zooming [v8]
In-Reply-To:
References:
Message-ID:

> # Overview
>
> The zooming is improved in the following ways:
>
> 1) Added a minimum (10%) and maximum (400%) zoom level. If you have a sensitive mouse wheel, it can be annoying to zoom in or out too much (until the graph is invisibly small or the nodes are larger than the window)
>
> 2) Zooming with a trackpad was not very smooth because IGV did panning and zooming at the same time - Now panning is disabled when CMD/Ctrl key is pressed for zooming
>
> 3) When only a few nodes were selected, zooming was no longer mouse centred. Instead, the center of the zooming was in the upper left corner. Now the zooming is centred to the middle of the scene when all selected nodes fit in the screen.
>
> 4) Added a shortcut (Ctrl - 0) to reset the zoom level to 100%.
>
> 5) Updated the Zoom icons to be vector graphics (.svg)
>
> # Implementation
>
> 1) New functions `getZoomMinFactor()` and `getZoomMinFactor()` assure that we do not zoom in or out infinitely. `getZoomMinFactor()` assures that we do not zoom out further if zoom level is <100% and all visible nodes already fit on the screen.
>
> 2) We introduced a new `MouseCenteredZoomAction.java` for zooming with the mouse/trackpad.
`MouseCenteredZoomAction` performs panning when the modifier key is pressed (Ctrl/CMD) and zooming otherwise. The functions `zoomIn ` and `zoomOut` now do animated zooming using `CustomZoomAnimator`. `CustomZoomAnimator` uses the mouse location as the centre of the zoom animation. > > 3) The `JScrollPane` now has a `JPanel centeringPanel` with `GridBagLayout()` that contains the `viewComponent`. This assures that the `viewComponent` is always centred when no scrollbars are visible. This makes the `Widget topLeft, bottomRight` obsolete as we can now add a white border of `BORDER_SIZE` to the `DiagramScene` instead. > > 4) `ZoomResetAction.java` resets the zoom level to 100%. The shortcut is `Ctrl - 0` and the action is available in the menu: `View` -> `Reset Zoom`. It was not added to the icon menu bar in the `EditorTopComponent` because of space issue. > > 5) new self created icons with vector graphics: `zoomIn.svg`, `zoomOut.svg` and `zoomReset.svg` Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: added ZoomLevelAction ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10026/files - new: https://git.openjdk.org/jdk/pull/10026/files/dce54bc5..db520ec1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10026&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10026&range=06-07 Stats: 187 lines in 6 files changed: 147 ins; 15 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/10026.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10026/head:pull/10026 PR: https://git.openjdk.org/jdk/pull/10026 From rcastanedalo at openjdk.org Thu Sep 8 11:11:37 2022 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 8 Sep 2022 11:11:37 GMT Subject: RFR: JDK-8293477: IGV: Upgrade to Netbeans Platform 15 In-Reply-To: References: Message-ID: On Wed, 7 Sep 2022 09:17:26 GMT, Tobias Holenstein wrote: > Upgrade IGV and dependencies to the newest Netbeans 
Platform 15 which was released in September 2022 (officially supports running on JDK 11 and JDK 17).
>
> ## Testing
>
> Tested the following use cases manually on macOS and JDK 17:
>
> - build with maven 3.8.1
> - import graphs via network (localhost)
> - Save all groups to XML
> - Save selected groups to XML
> - Remove selected graphs
> - Remove selected groups
> - Remove all groups
> - Open XML graph file
> - Expand groups in Outline
> - Open a graph from the same and from a different group in Outline
> - "Open clone" in the Outline
> - "Open Difference to current graph" for graphs in same and different group in Outline
> - Opening a new graph: Updates the Bytecode and Control Flow window
> - Show next / previous graph in current group buttons
> - Expand / Reduce the difference selection buttons
> - Changing of the difference selection by modifying the slider
> - Extract set of selected nodes and check if they are centered
> - Hiding of selected nodes
> - Showing all nodes again
> - Zooming in / out
> - Different views: Sea of nodes / clustered seas of nodes / CFG
> - Satellite view: button and by pressing the S key
> - Enable / Disable "Show neighbouring nodes of fully visible nodes semi-transparent"
> - Undo / Redo
> - Selection mode: button and by holding Ctrl + mouse-drag
> - Searching a node: Selects the node and centres it. Makes the node visible if it is hidden
> - Searching a block: Selects all nodes in the block and centres it. Makes all the nodes in the block visible
> - Selecting node(s): adjusts colours in slider. Show property in Properties window
> - Hovering a node: highlights node and shows property box
> - Hovering a connection: highlights connection and corresponding nodes
> - apply filters
> - select nodes corresponding to a bytecode
> - select nodes corresponding to a basic block in the control flow

Tested using JDK 11 and Maven 3.8.4 on both Linux (Ubuntu 20.04) and Windows 10, did not find any regression.
Besides the listed use cases, I also tested PDF graph exporting (affected by OpenPDF version update).

-------------

Marked as reviewed by rcastanedalo (Reviewer).

PR: https://git.openjdk.org/jdk/pull/10195

From thartmann at openjdk.org Thu Sep 8 11:24:41 2022
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Thu, 8 Sep 2022 11:24:41 GMT
Subject: RFR: 8290169: adlc: Improve child constraints for vector unary operations [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 18 Aug 2022 03:27:55 GMT, Hao Sun wrote:

>> As demonstrated in [1], the child constraint generated for *predicated
>> vector unary operation* is a superset of that generated for the
>> *unpredicated* version. As a result, there exists a risk for predicated
>> vector unary operations to match the unpredicated rules by accident.
>>
>> In this patch, we resolve this issue by generating one extra check
>> "rChild == NULL" ONLY for vector unary operations. In this way, the
>> child constraints for predicated/unpredicated vector unary operations
>> are exclusive now.
>>
>> Following the example in [1], the dfa state generated for AbsVI is shown
>> below.
>>
>>     void State::_sub_Op_AbsVI(const Node *n){
>>       if( STATE__VALID_CHILD(_kids[0], VREG) && STATE__VALID_CHILD(_kids[1], PREGGOV) &&
>>           ( UseSVE > 0 ) )
>>       {
>>         unsigned int c = _kids[0]->_cost[VREG]+_kids[1]->_cost[PREGGOV] + SVE_COST;
>>         DFA_PRODUCTION(VREG, vabsI_masked_rule, c)
>>       }
>>       if( STATE__VALID_CHILD(_kids[0], VREG) && _kids[1] == NULL &&    <---- 1
>>           ( UseSVE > 0) )
>>       {
>>         unsigned int c = _kids[0]->_cost[VREG] + SVE_COST;
>>         if (STATE__NOT_YET_VALID(VREG) || _cost[VREG] > c) {
>>           DFA_PRODUCTION(VREG, vabsI_rule, c)
>>         }
>>       }
>>       ...
>>
>> We can see that the constraint at line 1 cannot be matched for
>> predicated AbsVI node now.
>>
>> The main updates are made in adlc/dfa part. Ideally, we should only
>> add the extra check for affected platforms, i.e. AVX-512 and SVE.
But we
>> didn't do that because it would be better not to introduce any
>> architecture dependent implementation here.
>>
>> Besides, workarounds in both ~aarch64_sve.ad~aarch64_vector.ad and x86.ad are removed. 1)
>> Many "is_predicated_vector()" checks can be removed in ~aarch64_sve.ad~aarch64_vector.ad
>> file. 2) Default instruction cost is used for involving rules in x86.ad
>> file.
>>
>> ~[1]. https://github.com/shqking/jdk/commit/50ec9b19~
>> [1]. https://github.com/shqking/jdk/commit/f7d9621e2
>
> Hao Sun has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
>
>  - Merge branch 'master' into 8290169-adlc
>
>    Resolve the conflicts.
>  - 8290169: adlc: Improve child constraints for vector unary operations
>
>    As demonstrated in [1], the child constraint generated for *predicated
>    vector unary operation* is a superset of that generated for the
>    *unpredicated* version. As a result, there exists a risk for predicated
>    vector unary operations to match the unpredicated rules by accident.
>
>    In this patch, we resolve this issue by generating one extra check
>    "rChild == NULL" ONLY for vector unary operations. In this way, the
>    child constraints for predicated/unpredicated vector unary operations
>    are exclusive now.
>
>    Following the example in [1], the dfa state generated for AbsVI is shown
>    below.
>
>    ```
>    void State::_sub_Op_AbsVI(const Node *n){
>      if( STATE__VALID_CHILD(_kids[0], VREG) && STATE__VALID_CHILD(_kids[1], PREGGOV) &&
>          ( UseSVE > 0 ) )
>      {
>        unsigned int c = _kids[0]->_cost[VREG]+_kids[1]->_cost[PREGGOV] + SVE_COST;
>        DFA_PRODUCTION(VREG, vabsI_masked_rule, c)
>      }
>      if( STATE__VALID_CHILD(_kids[0], VREG) && _kids[1] == NULL &&    <---- 1
>          ( UseSVE > 0) )
>      {
>        unsigned int c = _kids[0]->_cost[VREG] + SVE_COST;
>        if (STATE__NOT_YET_VALID(VREG) || _cost[VREG] > c) {
>          DFA_PRODUCTION(VREG, vabsI_rule, c)
>        }
>      }
>      ...
>    ```
>
>    We can see that the constraint at line 1 cannot be matched for
>    predicated AbsVI node now.
>
>    The main updates are made in adlc/dfa part. Ideally, we should only
>    add the extra check for affected platforms, i.e. AVX-512 and SVE. But we
>    didn't do that because it would be better not to introduce any
>    architecture dependent implementation here.
>
>    Besides, workarounds in both aarch64_sve.ad and x86.ad are removed. 1)
>    Many "is_predicated_vector()" checks can be removed in aarch64_sve.ad
>    file. 2) Default instruction cost is used for involving rules in x86.ad
>    file.
>
>    [1]. https://github.com/shqking/jdk/commit/50ec9b19

Looks reasonable to me but I'm not an expert in that area. @jatin-bhateja, @sviswa7, @iwanowww could you have a look?

-------------

PR: https://git.openjdk.org/jdk/pull/9534

From bkilambi at openjdk.org Thu Sep 8 11:41:54 2022
From: bkilambi at openjdk.org (Bhavana Kilambi)
Date: Thu, 8 Sep 2022 11:41:54 GMT
Subject: RFR: 8292675: Add identity transformation for removing redundant AndV/OrV nodes
In-Reply-To:
References:
Message-ID:

On Thu, 8 Sep 2022 08:19:57 GMT, Tobias Hartmann wrote:

>> Recently we found that the rotate left/right benchmarks with vectorapi
>> emit a redundant "and" instruction on both aarch64 and x86_64 machines
>> which can be done away with. For example - and(and(a, b), b) generates
>> two "and" instructions which can be reduced to a single "and" operation -
>> and(a, b) since "and" (and "or") operations are commutative and
>> idempotent in nature. This can help improve performance for all those
>> workloads which have multiple "and"/"or" operations with the same value
>> by reducing them to fewer "and"/"or" operations accordingly.
>>
>> This patch adds the following transformations for vector logical
>> operations - AndV and OrV:
>>
>>     (OpV (OpV a b) b) => (OpV a b)
>>     (OpV (OpV a b) a) => (OpV a b)
>>     (OpV (OpV a b m1) b m1) => (OpV a b m1)
>>     (OpV (OpV a b m1) a m1) => (OpV a b m1)
>>     (OpV a (OpV a b)) => (OpV a b)
>>     (OpV b (OpV a b)) => (OpV a b)
>>     (OpV a (OpV a b m) m) => (OpV a b m)
>>
>> where Op = "And", "Or"
>>
>> Links for benchmarks tested are given below:
>> https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/IntMaxVector.java#L728
>> https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/IntMaxVector.java#L764
>> https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/LongMaxVector.java#L728
>> https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/LongMaxVector.java#L764
>>
>> Before this patch, the disassembly for one of these testcases
>> (IntMaxVector.ROR) for Neon is shown below:
>>
>>     ldr q16, [x12, #16]
>>     and v16.16b, v16.16b, v20.16b
>>     and v16.16b, v16.16b, v20.16b
>>     add x12, x16, x11
>>     sub v17.4s, v21.4s, v16.4s
>>     ...
>>     ...
>>
>> After this patch, the disassembly for the same testcase above is shown
>> below:
>>
>>     ldr q16, [x12, #16]
>>     and v16.16b, v16.16b, v20.16b
>>     add x12, x16, x11
>>     sub v17.4s, v21.4s, v16.4s
>>     ...
>>     ...
>>
>> The other tests also emit an extra "and" instruction as shown above for
>> the vector ROR/ROL operations.
>>
>> Below are the performance results for the vectorapi rotate tests (tests
>> given in the links above) with this patch on aarch64 and x86_64 machines
>> (for int and long types):
>>
>>     Benchmark            aarch64    x86_64
>>     IntMaxVector.ROL     25.57%     26.09%
>>     IntMaxVector.ROR     23.75%     24.15%
>>     LongMaxVector.ROL    28.91%     28.51%
>>     LongMaxVector.ROR    16.51%     29.11%
>>
>> The percentage indicates the percent gain/improvement in performance
>> (ops/ms) with this patch over the master build without this patch. The
>> machine descriptions are given below:
>>     aarch64 - 128-bit aarch64 machine
>>     x86_64 - 256-bit x86 machine
>
> src/hotspot/share/opto/vectornode.cpp line 1907:
>
>> 1905:   // (OperationV src1 (OperationV src1 src2 m1) m1) => OperationV(src1 src2 m1)
>> 1906:   } else if (n->is_predicated_vector() && n2->is_predicated_vector() &&
>> 1907:              n->in(3) == n2->in(3) && n->in(1) == n2->in(1)) {
>
> I think this should be merged into line 1902. Why did you omit the `(OperationV src2 (OperationV src1 src2 m1) m1) => OperationV(src1 src2 m1)` case?

Hi, thank you for your review comments. I will make the suggested changes and upload a new patch.

Regarding the omitted condition: `(OperationV src2 (OperationV src1 src2 m1) m1)` would produce a different result from `(OperationV src1 src2 m1)`. The inner `OperationV` copies `src1` into the unmasked lanes, while the outer `OperationV` copies `src2` into the unmasked lanes of the final result, so the results of the two operations differ. For masked lanes, the result is `OperationV(src1, src2)`, but this does not hold for the unmasked lanes. The first arguments of both `OperationV` nodes therefore need to be the same, so that the unmasked lanes end up with the same value and the expression can be optimized to a single `OperationV` node.
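
The lane argument in this reply can be checked with a small scalar model. This is only a sketch, assuming (as the reply describes) that a predicated operation keeps the first operand's value in lanes where the mask is unset. The class `MaskedAndSketch`, the helper `maskedAnd`, and the sample values are hypothetical illustrations and are not part of the patch or of HotSpot.

```java
import java.util.Arrays;

public class MaskedAndSketch {
    // Hypothetical scalar model of a predicated AndV:
    // lanes with the mask set get a[i] & b[i], other lanes keep a[i].
    static int[] maskedAnd(int[] a, int[] b, boolean[] m) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = m[i] ? (a[i] & b[i]) : a[i];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] src1 = {0b1100, 0b1010};
        int[] src2 = {0b0110, 0b0011};
        boolean[] m1 = {true, false};   // lane 0 masked in, lane 1 unmasked

        // (AndV src1 src2 m1)
        int[] single = maskedAnd(src1, src2, m1);
        // (AndV src1 (AndV src1 src2 m1) m1): same first operand
        int[] sameFirst = maskedAnd(src1, single, m1);
        // (AndV src2 (AndV src1 src2 m1) m1): different first operand
        int[] otherFirst = maskedAnd(src2, single, m1);

        System.out.println(Arrays.toString(single));      // [4, 10]
        System.out.println(Arrays.toString(sameFirst));   // [4, 10]
        System.out.println(Arrays.toString(otherFirst));  // [4, 3]
    }
}
```

In this model the nested form with `src1` as the outer first operand matches the single operation in every lane, while the form with `src2` as the outer first operand differs in the unmasked lane, which is consistent with leaving that case out of the transformation.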
------------- PR: https://git.openjdk.org/jdk/pull/10163 From thartmann at openjdk.org Thu Sep 8 11:53:45 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 8 Sep 2022 11:53:45 GMT Subject: RFR: 8291669: [REDO] Fix array range check hoisting for some scaled loop iv [v3] In-Reply-To: References: <5XU7GsP99-GVCxCJi7bVvvKbW_YG3XQGuIVm-LclQOw=.9b48b73d-4172-4f8e-a82d-03bad545c2fc@github.com> Message-ID: On Fri, 2 Sep 2022 06:11:58 GMT, Pengfei Li wrote: >> This is a REDO of JDK-8289996. In previous patch, we defer some strength >> reductions in Ideal functions of `Mul[I|L]Node` to post loop igvn phase >> to fix a range check hoisting issue. More about previous patch can be >> found in PR #9508, where we have described some details of the issue >> we would like to fix. >> >> Previous patch was backed out due to some jtreg failures found. We have >> analyzed those failures one by one and found one of them exposes a real >> performance regression. We see that deferring some strength reductions >> to post loop igvn phase has too much impact. Some vector multiplication >> will not be optimized to vector addition with vector shift after that >> change. So in this REDO we propose the range check hoisting fix with a >> different approach. >> >> In this new patch, we add some recursive pattern matches for scaled loop >> iv in function `PhaseIdealLoop::is_scaled_iv()`. These include matching >> a sum or a difference of two scaled iv expressions. With this, all kinds >> of Ideal-transformed scaled iv expressions can still be recognized. This >> new approach only touches loop transformation code and hence has much >> smaller impact. We have verified that this new approach applies to both >> int range checks and long range checks. >> >> Previously attached jtreg case fails on ppc64 because VectorAPI has no >> vector intrinsics on ppc64 so there's no long range check to hoist. In >> this patch, we limit the test architecture to x64 and AArch64. 
>> >> Tested hotspot::hotspot_all_no_apps, jdk::tier1~3 and langtools::tier1. > > Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: > > Update p_short_scale compuation src/hotspot/share/opto/loopTransform.cpp line 2776: > 2774: // This logic is shared by int and long. For int, the result may overflow > 2775: // as we use jlong to compute so do the check here. Long result may also > 2776: // overflow but that's fine because result wraps. But doesn't this mean that we bail out for integer overflows while not bailing out for long overflows? ------------- PR: https://git.openjdk.org/jdk/pull/9851 From rcastanedalo at openjdk.org Thu Sep 8 11:55:45 2022 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 8 Sep 2022 11:55:45 GMT Subject: RFR: JDK-8293480: IGV: Update Bytecode and ControlFlow Component immediately when opening a new graph In-Reply-To: References: Message-ID: On Wed, 7 Sep 2022 09:36:02 GMT, Tobias Holenstein wrote: > The `BytecodeViewTopComponent` and `ControlFlowTopComponent` represent information depending on what graph is open in `EditorTopComponent`. Previously, `BytecodeViewTopComponent` and `ControlFlowTopComponent` did not update its content immediately when a new graph from a different group is opened in `EditorTopComponent`. They also did not update when switching between two tabs of open graph. > > We missed to `fire()` a `diagramChangedEvent` in the constructor of `EditorTopComponent`. We also need to fire when `BytecodeViewTopComponent` and `ControlFlowTopComponent` are initially opened. > Update Thanks for this UI improvement, Tobias, looks good to me! 
There is one more case where the Bytecode and Control Flow windows get out of sync: after removing all graphs and groups in the Outline, they still show the content of the graph that was last active: ![bytecode-and-cfg-leftovers](https://user-images.githubusercontent.com/8792647/189114719-770ba617-e94c-4492-a5ab-81047b8a0b98.png) This problem existed before the changeset, so it might be addressed here or in a separate issue, whatever you think makes more sense. src/utils/IdealGraphVisualizer/Bytecodes/src/main/java/com/sun/hotspot/igv/bytecodes/BytecodeViewTopComponent.java line 176: > 174: SwingUtilities.invokeLater(new Runnable() { > 175: public void run() { > 176: final InputGraphProvider provider = LookupHistory.getLast(InputGraphProvider.class); Suggestion: use a multi-line lambda for conciseness, like so: SwingUtilities.invokeLater(() -> { final InputGraphProvider provider = LookupHistory.getLast(InputGraphProvider.class); src/utils/IdealGraphVisualizer/ControlFlow/src/main/java/com/sun/hotspot/igv/controlflow/ControlFlowTopComponent.java line 142: > 140: SwingUtilities.invokeLater(new Runnable() { > 141: public void run() { > 142: final InputGraphProvider provider = LookupHistory.getLast(InputGraphProvider.class); Same suggestion as above. src/utils/IdealGraphVisualizer/Coordinator/src/main/java/com/sun/hotspot/igv/coordinator/OutlineTopComponent.java line 153: > 151: SwingUtilities.invokeLater(new Runnable() { > 152: public void run() { > 153: final InputGraphProvider provider = LookupHistory.getLast(InputGraphProvider.class); Same suggestion as above. ------------- Marked as reviewed by rcastanedalo (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/10196 From thartmann at openjdk.org Thu Sep 8 12:02:41 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 8 Sep 2022 12:02:41 GMT Subject: RFR: 8292675: Add identity transformation for removing redundant AndV/OrV nodes In-Reply-To: References: Message-ID: <2JiESg4ACY8as7JSVt5hw17nxO7yjQmJXuH0lM3PbdI=.2352df9b-256b-41e3-8046-6f969605199e@github.com> On Thu, 8 Sep 2022 11:38:49 GMT, Bhavana Kilambi wrote: >> src/hotspot/share/opto/vectornode.cpp line 1907: >> >>> 1905: // (OperationV src1 (OperationV src1 src2 m1) m1) => OperationV(src1 src2 m1) >>> 1906: } else if (n->is_predicated_vector() && n2->is_predicated_vector() && >>> 1907: n->in(3) == n2->in(3) && n->in(1) == n2->in(1)) { >> >> I think this should be merged into line 1902. Why did you omit the `(OperationV src2 (OperationV src1 src2 m1) m1) => OperationV(src1 src2 m1)` case? > > Hi, thank you for your review comments. I will make the suggested changes and upload a new patch. > Regarding omitting this condition - `(OperationV src2 (OperationV src1 src2 m1) m1)` => This would result in a different result as compared to `(OperationV src1 src2 m1)`. So the inner `OperationV` would result in `src1` being copied for unmasked lanes and the outer `OperationV` would result in `src2` being copied to the final result for unmasked lanes and thus the results for both the operations is different. So for masked lanes, the result is `OperationV(src1, src2)` but it is not so for the unmasked lanes. So we need to have the first arguments of both `OperationV` to be same so that the unmasked lanes would then have the same value in the end and that can then be optimized to a single `OperationV` node. Right, I missed that. Would probably not hurt adding a corresponding comment. Thanks! 
------------- PR: https://git.openjdk.org/jdk/pull/10163 From thartmann at openjdk.org Thu Sep 8 12:23:49 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 8 Sep 2022 12:23:49 GMT Subject: RFR: 8290917: x86: Memory-operand arithmetic instructions have too low costs [v2] In-Reply-To: References: Message-ID: On Tue, 9 Aug 2022 13:18:05 GMT, Quan Anh Mai wrote: >> The pattern `AddI (LoadI mem) imm` should be matched by a load followed by an add with constant, instead, it is currently matched as a constant load followed by an add with memory. The reason is that the cost of `addI_rReg_mem` is too low, this patch fixes this by increasing the cost of this fused instruction. >> >> Testing: Manually run the test case in the JBS and look at the compiled code. >> >> I also do some small clean-ups in x86_64.ad: >> >> - For some reasons, `incl(Address)` is less efficient than `addl(Address, int)` as the former results in 3 uops in the fused domain as opposed to 2 in cases of the latter (according to [uops.info](uops.info)). As a result, I propose to remove the corresponding rules. >> - The `mulHiL` rules have unnecessary constraints on the input registers, these can be removed. The `no_rax_RegL` operand as a consequence can also be removed. >> - The rules involving long division by a constant can be removed because it has been covered by the optimiser during idealisation. >> - The pattern `SubI src imm` and the likes never match because they are converted to `AddI src -imm` by the optimiser. As a result, these rules can be removed >> - The rules involving shifting the argument by 1 are covered by and exactly the same as the corresponding rules of shifting by an immediate. As a result, they can be removed. >> - Some rules involving and-ing with a bit mask have unnecessary constraints on the target register. >> >> Please kindly review, thank you very much. 
> > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > add benchmark Intel folks (@jatin-bhateja, @sviswa7?) should have a look at this as well. ------------- PR: https://git.openjdk.org/jdk/pull/9791 From tholenstein at openjdk.org Thu Sep 8 12:32:44 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 8 Sep 2022 12:32:44 GMT Subject: RFR: JDK-8291805: IGV: Improve Zooming [v9] In-Reply-To: References: Message-ID: > # Overview > > The zooming is improved in the following ways: > > 1) Added a minimum (10%) and maximum (400%) zoom level. If you have a sensitive mouse wheel, it can be annoying to zoom in or out too much (until the graph is invisibly small or the nodes are larger than the window) > > 2) Zooming with a trackpad was not very smooth because IGV did panning and zooming at the same time - Now panning is disabled when CMD/Ctrl key is pressed for zooming > > 3) When only a few nodes were selected, zooming was no longer mouse centred. Instead, the center of the zooming was in the upper left corner. Now the zooming is centred to the middle of the scene when all selected nodes fit in the screen. > > 4) Added a shortcut (Ctrl - 0) to reset the zoom level to 100%. > > 5) Updated the Zoom icons to be vector graphics (.svg) > > # Implementation > > 1) New functions `getZoomMinFactor()` and `getZoomMinFactor()` assure that we do not zoom in or out our infinitely. `getZoomMinFactor()` assures that we do not zoom out further if zoom level is <100% and all visible nodes already fit on the screen. > > 2) We introduced a new `MouseCenteredZoomAction.java` for zooming with the mouse/trackpad. `MouseCenteredZoomAction` performs panning when the modifier key is pressed (Ctrl/CMD) and zooming otherwise. The functions `zoomIn ` and `zoomOut` now do animated zooming using `CustomZoomAnimator`. `CustomZoomAnimator` uses the mouse location as the centre of the zoom animation. 
> > 3) The `JScrollPane` now has a `JPanel centeringPanel` with `GridBagLayout()` that contains the `viewComponent`. This assures that the `viewComponent` is always centred when no scrollbars are visible. This makes the `Widget topLeft, bottomRight` obsolete as we can now add a white border of `BORDER_SIZE` to the `DiagramScene` instead. > > 4) `ZoomResetAction.java` resets the zoom level to 100%. The shortcut is `Ctrl - 0` and the action is available in the menu: `View` -> `Reset Zoom`. It was not added to the icon menu bar in the `EditorTopComponent` because of space issue. > > 5) new self created icons with vector graphics: `zoomIn.svg`, `zoomOut.svg` and `zoomReset.svg` Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: animate Zoom to center ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10026/files - new: https://git.openjdk.org/jdk/pull/10026/files/db520ec1..cd2f3a0e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10026&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10026&range=07-08 Stats: 14 lines in 2 files changed: 6 ins; 5 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/10026.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10026/head:pull/10026 PR: https://git.openjdk.org/jdk/pull/10026 From tholenstein at openjdk.org Thu Sep 8 12:40:33 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 8 Sep 2022 12:40:33 GMT Subject: RFR: JDK-8293480: IGV: Update Bytecode and ControlFlow Component immediately when opening a new graph [v2] In-Reply-To: References: Message-ID: > The `BytecodeViewTopComponent` and `ControlFlowTopComponent` represent information depending on what graph is open in `EditorTopComponent`. Previously, `BytecodeViewTopComponent` and `ControlFlowTopComponent` did not update its content immediately when a new graph from a different group is opened in `EditorTopComponent`. 
They also did not update when switching between two tabs of open graphs.
>
> We forgot to `fire()` a `diagramChangedEvent` in the constructor of `EditorTopComponent`. We also need to fire when `BytecodeViewTopComponent` and `ControlFlowTopComponent` are initially opened.
> Update

Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision:

  use multi-line lambdas

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10196/files
  - new: https://git.openjdk.org/jdk/pull/10196/files/3d06f784..ac0d42af

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10196&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10196&range=00-01

  Stats: 39 lines in 3 files changed: 3 ins; 9 del; 27 mod
  Patch: https://git.openjdk.org/jdk/pull/10196.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10196/head:pull/10196

PR: https://git.openjdk.org/jdk/pull/10196

From tholenstein at openjdk.org  Thu Sep  8 12:40:36 2022
From: tholenstein at openjdk.org (Tobias Holenstein)
Date: Thu, 8 Sep 2022 12:40:36 GMT
Subject: RFR: JDK-8293480: IGV: Update Bytecode and ControlFlow Component immediately when opening a new graph [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 8 Sep 2022 11:45:45 GMT, Roberto Castañeda Lozano wrote:

>> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   use multi-line lambdas
>
> src/utils/IdealGraphVisualizer/Bytecodes/src/main/java/com/sun/hotspot/igv/bytecodes/BytecodeViewTopComponent.java line 176:
>
>> 174:             SwingUtilities.invokeLater(new Runnable() {
>> 175:                 public void run() {
>> 176:                     final InputGraphProvider provider = LookupHistory.getLast(InputGraphProvider.class);
>
> Suggestion: use a multi-line lambda for conciseness, like so:
>
> SwingUtilities.invokeLater(() -> {
>     final InputGraphProvider provider = LookupHistory.getLast(InputGraphProvider.class);

done

>
src/utils/IdealGraphVisualizer/ControlFlow/src/main/java/com/sun/hotspot/igv/controlflow/ControlFlowTopComponent.java line 142: > >> 140: SwingUtilities.invokeLater(new Runnable() { >> 141: public void run() { >> 142: final InputGraphProvider provider = LookupHistory.getLast(InputGraphProvider.class); > > Same suggestion as above. done > src/utils/IdealGraphVisualizer/Coordinator/src/main/java/com/sun/hotspot/igv/coordinator/OutlineTopComponent.java line 153: > >> 151: SwingUtilities.invokeLater(new Runnable() { >> 152: public void run() { >> 153: final InputGraphProvider provider = LookupHistory.getLast(InputGraphProvider.class); > > Same suggestion as above. done ------------- PR: https://git.openjdk.org/jdk/pull/10196 From rcastanedalo at openjdk.org Thu Sep 8 12:47:42 2022 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 8 Sep 2022 12:47:42 GMT Subject: RFR: JDK-8290011: IGV: Remove dead code and cleanup In-Reply-To: References: Message-ID: <0NA_dHwRej6PYoU-ejEEJrX7DxdG6AMMh6KC90cU0Yk=.2fb1e525-5561-4cbd-8221-f729378b1755@github.com> On Wed, 7 Sep 2022 11:45:45 GMT, Tobias Holenstein wrote: > Remove dead code from the IGV code base. There are many unused or redundant functions in the code This changeset goes beyond trivial cleanups (removing dead code, trailing whitespace, legacy functionality, etc.), and it would help if you could summarize (and motivate if necessary) the main changes in it. I found that switching among opened graphs from different groups does not update anymore the highlighted graphs in the Outline window, nor the content of the Bytecode and Control Flow windows. Maybe an effect of splitting #10164? A few more comments: - I would also prefer to leave the `toString()` methods in, for ease of debugging. - Why are some tests in `InputGraphTest.java` removed? Were they not run before? 
- I agree with enforcing alphabetic order of imports, but I would personally prefer to import explicitly all individual classes rather than using wildcards (matter of taste though, I do not think we have any style guidelines for tools like IGV). - Please update the copyright headers, at least for files with non-trivial changes. ------------- Changes requested by rcastanedalo (Reviewer). PR: https://git.openjdk.org/jdk/pull/10197 From rcastanedalo at openjdk.org Thu Sep 8 12:59:39 2022 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 8 Sep 2022 12:59:39 GMT Subject: RFR: JDK-8293480: IGV: Update Bytecode and ControlFlow Component immediately when opening a new graph [v2] In-Reply-To: References: Message-ID: On Thu, 8 Sep 2022 12:40:33 GMT, Tobias Holenstein wrote: >> The `BytecodeViewTopComponent` and `ControlFlowTopComponent` represent information depending on what graph is open in `EditorTopComponent`. Previously, `BytecodeViewTopComponent` and `ControlFlowTopComponent` did not update its content immediately when a new graph from a different group is opened in `EditorTopComponent`. They also did not update when switching between two tabs of open graph. >> >> We missed to `fire()` a `diagramChangedEvent` in the constructor of `EditorTopComponent`. We also need to fire when `BytecodeViewTopComponent` and `ControlFlowTopComponent` are initially opened. >> Update > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > use multi-line lambdas Marked as reviewed by rcastanedalo (Reviewer). 
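
As a side note on the "use multi-line lambdas" commit, the sketch below contrasts the two forms discussed in the review. `SwingUtilities.invokeLater` accepts a `Runnable`, so either form can be passed to it; to keep the example free of any GUI dependency, a hypothetical `runTask` helper stands in for it here and simply runs the task immediately.

```java
import java.util.ArrayList;
import java.util.List;

public class LambdaVsAnonymous {
    // Hypothetical stand-in for SwingUtilities.invokeLater(Runnable):
    // it runs the task right away instead of queueing it on the event dispatch thread.
    static void runTask(Runnable task) {
        task.run();
    }

    public static List<String> demo() {
        List<String> log = new ArrayList<>();

        // Before: anonymous inner class implementing Runnable.
        runTask(new Runnable() {
            @Override
            public void run() {
                final String form = "anonymous class";
                log.add(form);
            }
        });

        // After: multi-line lambda with the same body and the same behavior.
        runTask(() -> {
            final String form = "lambda";
            log.add(form);
        });

        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Both calls execute the same statements; the lambda only drops the `new Runnable()` and `public void run()` boilerplate, which is what the review suggestion was after.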
-------------

PR: https://git.openjdk.org/jdk/pull/10196

From rcastanedalo at openjdk.org  Thu Sep  8 13:20:34 2022
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Thu, 8 Sep 2022 13:20:34 GMT
Subject: RFR: JDK-8293364: IGV: Refactor Action in EditorTopComponent and fix minor bugs
In-Reply-To:
References:
Message-ID:

On Mon, 5 Sep 2022 13:38:41 GMT, Tobias Holenstein wrote:

> Refactor the Actions in EditorTopComponent (com.sun.hotspot.igv.view.actions). Move Action specific code from EditorTopComponent to the corresponding Action.
>
> # Refactoring of com.sun.hotspot.igv.view.actions and EditorTopComponent
> - Created a new `ExportGraph` Action and moved corresponding functions `exportToSVG(..)` and `exportToPDF(..)` to new `ExportGraph.java`
> - Moved key bindings for satellite-view (pressing S) from `EditorTopComponent` to `OverviewAction.java`
> - Moved Action specific code from `EditorTopComponent` to the corresponding `XXXAction.java`
>
> # Fixing minor Bugs
> - "Show empty blocks in control-flow graph view" is selected by default but only enabled in CFG view.
> This is distracting for the eye when we are not in CFG:
> (image: cfg_before)
> Now "Show empty blocks in control-flow graph view" is not selected anymore when disabled (greyed out)
> (image: cfg_node_disable)
> But still gets selected by default when enabled
> (image: cfg_now)
>
> - "Extract current set of selected nodes", "Hide selected nodes" and "show all nodes" were always enabled, even when they didn't affect anything.
> (image: selection_before)
> Now "Extract current set of selected nodes", "Hide selected nodes" are disabled (greyed out) when no nodes are selected. And "show all nodes" is disabled (greyed out) when all nodes are already visible.
> (image: selection_now)
>
> - "Reduce the difference selection" got stuck when at the last graphs in the group because it got greyed out.
> (image: reduce_stuck)
> Now "Reduce the difference selection" works as expected:
> (image: reduce_now)

This changeset seems to disable the keyboard shortcuts for `Extract`, `Show all nodes`, and `Hide` right after a graph is opened. Interestingly, after clicking around for a while, the keyboard shortcuts start working again. Please let me know if you need more details to reproduce the problem; hopefully it is reproducible on platforms other than my own (Ubuntu 20.04).

-------------

Changes requested by rcastanedalo (Reviewer).

PR: https://git.openjdk.org/jdk/pull/10170

From jiefu at openjdk.org  Thu Sep  8 15:06:44 2022
From: jiefu at openjdk.org (Jie Fu)
Date: Thu, 8 Sep 2022 15:06:44 GMT
Subject: RFR: 8293491: Avoid unexpected deoptimization in loop exit due to incorrect branch profiling
In-Reply-To:
References:
Message-ID:

On Thu, 8 Sep 2022 10:55:13 GMT, Tobias Hartmann wrote:

>> Hi all,
>>
>> Please review this patch which fixes the unexpected deoptimizations in loop exit due to incorrect branch profiling.
>>
>> # Background
>>
>> While analyzing our big data Apps, we observed unexpected deoptimizations in loop exit due to incorrect branch profiling.
>>
>> Here is a reproducer.
>>
>>     public class UnexpectedLoopExitDeopt {
>>         public static final int N = 20000000;
>>
>>         public static int d1[] = new int[N];
>>         public static int d2[] = new int[N];
>>
>>         public static void main(String[] args) {
>>             System.out.println(test(d1));
>>             System.out.println(test(d2));
>>         }
>>
>>         public static int test(int[] a) {
>>             int sum = 0;
>>             for(int i = 0; i < a.length; i++) {
>>                 sum += a[i];
>>             }
>>             return sum;
>>         }
>>     }
>>
>>
>> The following is the compilation sequence.
>> >> 77 1 3 java.lang.Object:: (1 bytes) >> 83 2 3 java.lang.String::isLatin1 (19 bytes) >> 84 6 3 jdk.internal.util.Preconditions::checkIndex (18 bytes) >> 84 3 3 java.lang.String::charAt (25 bytes) >> 85 4 3 java.lang.StringLatin1::charAt (15 bytes) >> 86 7 3 java.lang.String::coder (15 bytes) >> 86 8 3 java.lang.String::hashCode (60 bytes) >> 87 5 3 java.lang.String::checkIndex (10 bytes) >> 87 9 3 java.lang.String::length (11 bytes) >> 93 10 n 0 java.lang.invoke.MethodHandle::linkToStatic(LLLLLLL)L (native) (static) >> 96 11 n 0 java.lang.invoke.MethodHandle::linkToSpecial(LLLL)L (native) (static) >> 96 12 n 0 java.lang.Object::hashCode (native) >> 97 13 n 0 java.lang.invoke.MethodHandle::invokeBasic(LLLLLL)L (native) >> 98 14 3 java.util.Objects::requireNonNull (14 bytes) >> 98 15 n 0 java.lang.invoke.MethodHandle::linkToSpecial(LLLLLLLL)L (native) (static) >> 98 16 1 java.lang.Enum::ordinal (5 bytes) >> 101 17 n 0 java.lang.invoke.MethodHandle::linkToSpecial(LLLL)V (native) (static) >> 102 18 n 0 java.lang.invoke.MethodHandle::invokeBasic(LL)L (native) >> 212 19 % 3 UnexpectedLoopExitDeopt::test @ 4 (24 bytes) >> 213 20 % 4 UnexpectedLoopExitDeopt::test @ 4 (24 bytes) >> 221 19 % 3 UnexpectedLoopExitDeopt::test @ 4 (24 bytes) made not entrant >> 221 21 4 UnexpectedLoopExitDeopt::test (24 bytes) >> 230 20 % 4 UnexpectedLoopExitDeopt::test @ 4 (24 bytes) made not entrant <--- Unexpected deopt >> 0 >> 242 21 4 UnexpectedLoopExitDeopt::test (24 bytes) made not entrant <--- Unexpected deopt >> 0 >> >> >> The last two deopts (made not entrant) happened in the loop exit which are unexpected. >> >> >> # Reason >> >> The unexpected deopts were caused by the incorrect branch profiling count (0 taken count for loop predicate). >> >> Here is the profiling data for `UnexpectedLoopExitDeopt::test`. >> We can see that for `if_icmpge` @ bci=7, the count for `not taken` is 264957, while 0 for `taken`. 
>> The profile count for zero taken is obvious incorrect since the loop will finally exit (when `i >= a.length`). >> So the taken count should be at least 1 for `if_icmpge` @ bci=7. >> >> 0 iconst_0 >> 1 istore_1 >> 2 iconst_0 >> 3 istore_2 >> >> 4 iload_2 >> 5 fast_aload_0 >> 6 arraylength >> 7 if_icmpge 22 >> 0 bci: 7 BranchData taken(0) displacement(56) >> not taken(264957) >> >> 10 iload_1 >> 11 fast_aload_0 >> 12 iload_2 >> 13 iaload >> 14 iadd >> 15 istore_1 >> 16 iinc #2 1 >> 19 goto 4 >> 32 bci: 19 JumpData taken(266667) displacement(-32) >> >> 22 iload_1 >> 23 ireturn >> >> >> # Fix >> >> The main idea is to detect if the branch taken target is a loop exit. >> If so, set the taken count to be at least 1. >> This is fine because most loops should be finite and would execute the loop exit code at lease once. >> For infinite loops like `while (true) {...}`, the patch won't change the original behaviour since there is no loop exit. >> >> # Testing >> >> tier1~3 on Linux/x64, no regression >> >> Thanks. >> Best regards, >> Jie > > Do these deoptimizations really affect performance of your program or did you just spot them when looking at the logs? > > Such surprising deopts are actually expected with optimistic, profile guided optimizations and happen in many other scenarios as well. They are usually harmless. Also, the profile information is not necessarily incorrect but might just be outdated because we stop profiling once we reach C2. Marking all loop exits as taken seems hacky and might have unexpected side effects. > > Also, wouldn't C2 still insert a `Deoptimization::Reason_unreached` or `Deoptimization::Reason_unstable_if` trap for subsequent instructions after the loop exit for which profiling also suggests that they were never executed? Thanks @TobiHartmann for your review and valuable comments. > Do these deoptimizations really affect performance of your program or did you just spot them when looking at the logs? 
>

I didn't see a performance drop due to this issue in real programs.
The loop exit deopts were discovered while we were trying to optimize some big data processing patterns.

> Such surprising deopts are actually expected with optimistic, profile guided optimizations and happen in many other scenarios as well. They are usually harmless. Also, the profile information is not necessarily incorrect but might just be outdated because we stop profiling once we reach C2. Marking all loop exits as taken seems hacky and might have unexpected side effects.
>

Well, deoptimization leads to heavy runtime overhead involving control transfer, re-interpreting/profiling and re-compiling, which really wastes CPU cycles and memory.
So if some kinds of deopts could be avoided, why not do it?

Do you agree that for most common cases in real programs, the loop exit would be executed at least once?
If so, it seems unreasonable to replace the loop exit with an unstable_if uncommon trap during compilation, right?

> Also, wouldn't C2 still insert a `Deoptimization::Reason_unreached` or `Deoptimization::Reason_unstable_if` trap for subsequent instructions after the loop exit for which profiling also suggests that they were never executed?

It's impossible to avoid all deopts with optimistic, profile guided compilations.
So this patch only aims at eliminating as many unreasonable deopts in the loop exit block as possible, not all unexpected deopts.
And there seems to be no good reason to disable them, except for the loop exit deopts.
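
To make the proposal under discussion concrete, here is a minimal sketch of the idea only: if the taken target of a branch is known to be a loop exit, report a taken count of at least 1, so that a zero count is not read as "never executed". The class and method names are hypothetical and invented for this illustration; this is not the HotSpot code from the pull request, and it says nothing about the side effects raised in the review.

```java
public class LoopExitProfile {
    // Hypothetical adjustment (assumption for illustration): clamp the taken count
    // to at least 1 when the taken target is a loop exit; otherwise leave it alone.
    public static long adjustedTaken(long taken, boolean targetIsLoopExit) {
        return targetIsLoopExit ? Math.max(taken, 1L) : taken;
    }

    // Simplified stand-in for the decision an optimistic compiler might make:
    // a path with a zero count is a candidate for an uncommon trap.
    public static boolean looksNeverTaken(long taken) {
        return taken == 0;
    }

    public static void main(String[] args) {
        long profiledTaken = 0;   // as in the BranchData shown for bci 7 in the thread
        System.out.println(looksNeverTaken(profiledTaken));
        System.out.println(looksNeverTaken(adjustedTaken(profiledTaken, true)));
        System.out.println(looksNeverTaken(adjustedTaken(profiledTaken, false)));
    }
}
```

With the clamp applied only to loop exits, other branches with a zero count are still treated as never taken, which corresponds to the stated goal of targeting the loop exit deopts and nothing else.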
------------- PR: https://git.openjdk.org/jdk/pull/10200 From jiefu at openjdk.org Thu Sep 8 15:10:43 2022 From: jiefu at openjdk.org (Jie Fu) Date: Thu, 8 Sep 2022 15:10:43 GMT Subject: RFR: 8293491: Avoid unexpected deoptimization in loop exit due to incorrect branch profiling In-Reply-To: References: Message-ID: On Thu, 8 Sep 2022 10:32:53 GMT, Tobias Hartmann wrote: >> Hi all, >> >> Please review this patch which fixes the unexpected deoptimizations in loop exit due to incorrect branch profiling. >> >> # Background >> >> While analyzing our big data Apps, we observed unexpected deoptimizations in loop exit due to incorrect branch profiling. >> >> Here is a reproducer. >> >> public class UnexpectedLoopExitDeopt { >> public static final int N = 20000000; >> >> public static int d1[] = new int[N]; >> public static int d2[] = new int[N]; >> >> public static void main(String[] args) { >> System.out.println(test(d1)); >> System.out.println(test(d2)); >> } >> >> public static int test(int[] a) { >> int sum = 0; >> for(int i = 0; i < a.length; i++) { >> sum += a[i]; >> } >> return sum; >> } >> } >> >> >> The following is the compilation sequence. 
>> >> 77 1 3 java.lang.Object:: (1 bytes) >> 83 2 3 java.lang.String::isLatin1 (19 bytes) >> 84 6 3 jdk.internal.util.Preconditions::checkIndex (18 bytes) >> 84 3 3 java.lang.String::charAt (25 bytes) >> 85 4 3 java.lang.StringLatin1::charAt (15 bytes) >> 86 7 3 java.lang.String::coder (15 bytes) >> 86 8 3 java.lang.String::hashCode (60 bytes) >> 87 5 3 java.lang.String::checkIndex (10 bytes) >> 87 9 3 java.lang.String::length (11 bytes) >> 93 10 n 0 java.lang.invoke.MethodHandle::linkToStatic(LLLLLLL)L (native) (static) >> 96 11 n 0 java.lang.invoke.MethodHandle::linkToSpecial(LLLL)L (native) (static) >> 96 12 n 0 java.lang.Object::hashCode (native) >> 97 13 n 0 java.lang.invoke.MethodHandle::invokeBasic(LLLLLL)L (native) >> 98 14 3 java.util.Objects::requireNonNull (14 bytes) >> 98 15 n 0 java.lang.invoke.MethodHandle::linkToSpecial(LLLLLLLL)L (native) (static) >> 98 16 1 java.lang.Enum::ordinal (5 bytes) >> 101 17 n 0 java.lang.invoke.MethodHandle::linkToSpecial(LLLL)V (native) (static) >> 102 18 n 0 java.lang.invoke.MethodHandle::invokeBasic(LL)L (native) >> 212 19 % 3 UnexpectedLoopExitDeopt::test @ 4 (24 bytes) >> 213 20 % 4 UnexpectedLoopExitDeopt::test @ 4 (24 bytes) >> 221 19 % 3 UnexpectedLoopExitDeopt::test @ 4 (24 bytes) made not entrant >> 221 21 4 UnexpectedLoopExitDeopt::test (24 bytes) >> 230 20 % 4 UnexpectedLoopExitDeopt::test @ 4 (24 bytes) made not entrant <--- Unexpected deopt >> 0 >> 242 21 4 UnexpectedLoopExitDeopt::test (24 bytes) made not entrant <--- Unexpected deopt >> 0 >> >> >> The last two deopts (made not entrant) happened in the loop exit which are unexpected. >> >> >> # Reason >> >> The unexpected deopts were caused by the incorrect branch profiling count (0 taken count for loop predicate). >> >> Here is the profiling data for `UnexpectedLoopExitDeopt::test`. >> We can see that for `if_icmpge` @ bci=7, the count for `not taken` is 264957, while 0 for `taken`. 
>> The profile count for zero taken is obvious incorrect since the loop will finally exit (when `i >= a.length`). >> So the taken count should be at least 1 for `if_icmpge` @ bci=7. >> >> 0 iconst_0 >> 1 istore_1 >> 2 iconst_0 >> 3 istore_2 >> >> 4 iload_2 >> 5 fast_aload_0 >> 6 arraylength >> 7 if_icmpge 22 >> 0 bci: 7 BranchData taken(0) displacement(56) >> not taken(264957) >> >> 10 iload_1 >> 11 fast_aload_0 >> 12 iload_2 >> 13 iaload >> 14 iadd >> 15 istore_1 >> 16 iinc #2 1 >> 19 goto 4 >> 32 bci: 19 JumpData taken(266667) displacement(-32) >> >> 22 iload_1 >> 23 ireturn >> >> >> # Fix >> >> The main idea is to detect if the branch taken target is a loop exit. >> If so, set the taken count to be at least 1. >> This is fine because most loops should be finite and would execute the loop exit code at lease once. >> For infinite loops like `while (true) {...}`, the patch won't change the original behaviour since there is no loop exit. >> >> # Testing >> >> tier1~3 on Linux/x64, no regression >> >> Thanks. >> Best regards, >> Jie > > src/hotspot/share/ci/ciMethodBlocks.cpp line 166: > >> 164: if (dest_bci < bci) { >> 165: next_block->set_is_loop_exit(); >> 166: } > > I think this loop detection logic is both wrong and incomplete. For example, javac generates no `goto` for the following loop: > > int i = 0; > do { > > } while (i++ < 10); > > > And in the following case, the block after the `goto` corresponding to the `continue` statement is not the loop exit: > > int i = 0; > label: > while (true) { > i++; > if (i == 1) > continue label; > if (i == 2) > break; > } > > > I just quickly hacked this, there are probably better examples. Nice catch! I missed the `do {...} while (condition)` loop pattern and the `continue` statement. So we should find better way to identify the loop exit block, right? 
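
The two counterexamples from the review can be replayed against a toy model of the heuristic (the instruction after a backward `goto` is treated as the loop exit). Everything below is a hedged sketch: `Insn` and `loopExits` are hypothetical names, and the instruction listings are hand-written simplifications. Only the first listing reuses the bci values from the profile dump quoted above; the other two use illustrative bci values and are not actual javac output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NaiveLoopExitHeuristic {
    // Toy instruction: bytecode index, mnemonic, branch target (-1 if it does not branch).
    static final class Insn {
        final int bci;
        final String op;
        final int target;

        Insn(int bci, String op, int target) {
            this.bci = bci;
            this.op = op;
            this.target = target;
        }
    }

    // Model of the heuristic: the instruction following a backward goto is marked as a loop exit.
    public static List<Integer> loopExits(List<Insn> code) {
        List<Integer> exits = new ArrayList<>();
        for (int i = 0; i + 1 < code.size(); i++) {
            Insn insn = code.get(i);
            if (insn.op.equals("goto") && insn.target >= 0 && insn.target < insn.bci) {
                exits.add(code.get(i + 1).bci);
            }
        }
        return exits;
    }

    // for-loop shape from the thread: conditional exit at the top, backward goto at the bottom.
    public static List<Insn> forLoop() {
        return Arrays.asList(
            new Insn(4, "iload_2", -1),
            new Insn(7, "if_icmpge", 22),
            new Insn(10, "iload_1", -1),
            new Insn(19, "goto", 4),
            new Insn(22, "iload_1", -1));
    }

    // do-while shape: the back edge is a conditional branch, so there is no goto at all.
    public static List<Insn> doWhileLoop() {
        return Arrays.asList(
            new Insn(0, "iconst_0", -1),
            new Insn(2, "iload_1", -1),
            new Insn(8, "if_icmplt", 2),
            new Insn(11, "return", -1));
    }

    // while (true) with continue and break: the continue is a backward goto inside the body.
    public static List<Insn> continueLoop() {
        return Arrays.asList(
            new Insn(2, "iinc", -1),
            new Insn(7, "if_icmpne", 13),
            new Insn(10, "goto", 2),    // continue label
            new Insn(13, "iload_1", -1),
            new Insn(15, "if_icmpne", 21),
            new Insn(18, "goto", 24),   // break
            new Insn(21, "goto", 2),    // back edge of while (true)
            new Insn(24, "return", -1));
    }

    public static void main(String[] args) {
        System.out.println("for:      " + loopExits(forLoop()));
        System.out.println("do-while: " + loopExits(doWhileLoop()));
        System.out.println("continue: " + loopExits(continueLoop()));
    }
}
```

In this model the for-loop exit at bci 22 is found, the do-while loop yields no exit at all, and the `continue` case additionally marks bci 13, which is still inside the loop body. That is the "wrong and incomplete" behavior pointed out in the review.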
------------- PR: https://git.openjdk.org/jdk/pull/10200 From tholenstein at openjdk.org Thu Sep 8 16:10:48 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 8 Sep 2022 16:10:48 GMT Subject: RFR: JDK-8293480: IGV: Update Bytecode and ControlFlow Component immediately when opening a new graph [v3] In-Reply-To: References: Message-ID: > The `BytecodeViewTopComponent` and `ControlFlowTopComponent` represent information depending on what graph is open in `EditorTopComponent`. Previously, `BytecodeViewTopComponent` and `ControlFlowTopComponent` did not update its content immediately when a new graph from a different group is opened in `EditorTopComponent`. They also did not update when switching between two tabs of open graph. > > We missed to `fire()` a `diagramChangedEvent` in the constructor of `EditorTopComponent`. We also need to fire when `BytecodeViewTopComponent` and `ControlFlowTopComponent` are initially opened. > Update Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: update TopComponents on closing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10196/files - new: https://git.openjdk.org/jdk/pull/10196/files/ac0d42af..49dbaa31 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10196&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10196&range=01-02 Stats: 74 lines in 7 files changed: 37 ins; 26 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/10196.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10196/head:pull/10196 PR: https://git.openjdk.org/jdk/pull/10196 From bkilambi at openjdk.org Thu Sep 8 16:28:32 2022 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 8 Sep 2022 16:28:32 GMT Subject: RFR: 8292675: Add identity transformation for removing redundant AndV/OrV nodes [v2] In-Reply-To: References: Message-ID: <3D-DKcq_dDWFZXapiQzKXidtnEWNM4bmTgKuG_Qz8Io=.4d48de6f-f7e8-4d11-b58f-abd28ccf197b@github.com> > Recently we found 
that the rotate left/right benchmarks with vectorapi > emit a redundant "and" instruction on both aarch64 and x86_64 machines > which can be done away with. For example - and(and(a, b), b) generates > two "and" instructions which can be reduced to a single "and" operation- > and(a, b) since "and" (and "or") operations are commutative and > idempotent in nature. This can help improve performance for all those > workloads which have multiple "and"/"or" operations with the same value > by reducing them to fewer "and"/"or" operations accordingly. > > This patch adds the following transformations for vector logical > operations - AndV and OrV : > > > (OpV (OpV a b) b) => (OpV a b) > (OpV (OpV a b) a) => (OpV a b) > (OpV (OpV a b m1) b m1) => (OpV a b m1) > (OpV (OpV a b m1) a m1) => (OpV a b m1) > (OpV a (OpV a b)) => (OpV a b) > (OpV b (OpV a b)) => (OpV a b) > (OpV a (OpV a b m) m) => (OpV a b m) > > where Op = "And", "Or" > > Links for benchmarks tested are given below :- > https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/IntMaxVector.java#L728 > https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/IntMaxVector.java#L764 > https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/LongMaxVector.java#L728 > https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/LongMaxVector.java#L764 > > Before this patch, the disassembly for one these testcases > (IntMaxVector.ROR) for Neon is shown below : > ``` > ldr q16, [x12, #16] > and v16.16b, v16.16b, v20.16b > and v16.16b, v16.16b, v20.16b > add x12, x16, x11 > sub v17.4s, v21.4s, v16.4s > ... > ... 
> > > After this patch, the disassembly for the same testcase above is shown > below : > > ldr q16, [x12, #16] > and v16.16b, v16.16b, v20.16b > add x12, x16, x11 > sub v17.4s, v21.4s, v16.4s > ... > ... > > > The other tests also emit an extra "and" instruction as shown above for > the vector ROR/ROL operations. > > Below are the performance results for the vectorapi rotate tests (tests > given in the links above) with this patch on aarch64 and x86_64 machines > (for int and long types) - > > > Benchmark aarch64 x86_64 > IntMaxVector.ROL 25.57% 26.09% > IntMaxVector.ROR 23.75% 24.15% > LongMaxVector.ROL 28.91% 28.51% > LongMaxVector.ROR 16.51% 29.11% > > > > The percentage indicates the percent gain/improvement in performance > (ops/ms) with this patch over the master build without this patch. The > machine descriptions are given below - > aarch64 - 128-bit aarch64 machine > x86_64 - 256-bit x86 machine Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: Merge two if conditions and some trivial changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10163/files - new: https://git.openjdk.org/jdk/pull/10163/files/04f76be6..5e3a445f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10163&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10163&range=00-01 Stats: 11 lines in 1 file changed: 4 ins; 3 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/10163.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10163/head:pull/10163 PR: https://git.openjdk.org/jdk/pull/10163 From bkilambi at openjdk.org Thu Sep 8 16:28:34 2022 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 8 Sep 2022 16:28:34 GMT Subject: RFR: 8292675: Add identity transformation for removing redundant AndV/OrV nodes [v2] In-Reply-To: <2JiESg4ACY8as7JSVt5hw17nxO7yjQmJXuH0lM3PbdI=.2352df9b-256b-41e3-8046-6f969605199e@github.com> References: 
<2JiESg4ACY8as7JSVt5hw17nxO7yjQmJXuH0lM3PbdI=.2352df9b-256b-41e3-8046-6f969605199e@github.com> Message-ID: On Thu, 8 Sep 2022 11:59:14 GMT, Tobias Hartmann wrote: >> Hi, thank you for your review comments. I will make the suggested changes and upload a new patch. >> Regarding omitting this condition - `(OperationV src2 (OperationV src1 src2 m1) m1)` => This would result in a different result as compared to `(OperationV src1 src2 m1)`. So the inner `OperationV` would result in `src1` being copied for unmasked lanes and the outer `OperationV` would result in `src2` being copied to the final result for unmasked lanes and thus the results for both the operations is different. So for masked lanes, the result is `OperationV(src1, src2)` but it is not so for the unmasked lanes. So we need to have the first arguments of both `OperationV` to be same so that the unmasked lanes would then have the same value in the end and that can then be optimized to a single `OperationV` node. > > Right, I missed that. Would probably not hurt adding a corresponding comment. Thanks! Thanks. I have added a comment for this case. ------------- PR: https://git.openjdk.org/jdk/pull/10163 From vlivanov at openjdk.org Thu Sep 8 18:06:30 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 8 Sep 2022 18:06:30 GMT Subject: RFR: 8293044: C1: Missing access check on non-accessible class Message-ID: C1 erroneously omits some access checks on symbolically referenced classes. Proposed fix relies on code patching to throw proper resolution error when required. Also, to avoid repeated recompilations on platforms which don't support code patching, the nmethod is not marked as non-entrant when corresponding constant pool entry is in error state. 
Testing: hs-tier1 - hs-tier4 ------------- Commit messages: - 8293044: C1: Missing access check on non-accessible class Changes: https://git.openjdk.org/jdk/pull/10222/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10222&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8293044 Stats: 251 lines in 8 files changed: 234 ins; 6 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/10222.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10222/head:pull/10222 PR: https://git.openjdk.org/jdk/pull/10222 From xgong at openjdk.org Fri Sep 9 01:25:43 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 9 Sep 2022 01:25:43 GMT Subject: RFR: 8292898: [vectorapi] Unify vector mask cast operation [v2] In-Reply-To: References: Message-ID: <9G7Uuoe3vIWiS3RRwbfyA-9ipXw5tPX_NJyWtGSxc28=.a27d04c7-7bb0-4c6f-bc82-af020af027be@github.com> On Wed, 7 Sep 2022 09:22:11 GMT, Xiaohong Gong wrote: >> The current implementation of the vector mask cast operation is >> complex that the compiler generates different patterns for different >> scenarios. For architectures that do not support the predicate >> feature, vector mask is represented the same as the normal vector. >> So the vector mask cast is implemented by `VectorCast `node. But this >> is not always needed. When two masks have the same element size (e.g. >> int vs. float), their bits layout are the same. So casting between >> them does not need to emit any instructions. >> >> Currently the compiler generates different patterns based on the >> vector type of the input/output and the platforms. Normally the >> "`VectorMaskCast`" op is only used for cases that doesn't emit any >> instructions, and "`VectorCast`" op is used to implement the necessary >> expand/narrow operations. This can avoid adding some duplicate rules >> in the backend. 
However, this also has drawbacks: >> >> 1) The code is complex, especially when the compiler needs to >> check whether the hardware supports the necessary IRs for the >> vector mask cast. It needs to check different patterns for >> different cases. >> 2) The vector mask cast operation could be implemented with cheaper >> instructions than the vector casting on some architectures. >> >> Instead of generating `VectorCast` or `VectorMaskCast` nodes for different >> cases of vector mask cast operations, this patch unifies the vector >> mask cast implementation with "`VectorMaskCast`" node for all vector types >> and platforms. The missing backend rules are also added for it. >> >> This patch also simplifies the vector mask conversion that happens in >> "`VectorUnbox::Ideal()`". Normally "`VectorUnbox (VectorBox vmask)`" can >> be optimized to "`vmask`" if the unboxing type matches with the boxed >> "`vmask`" type. Otherwise, it needs the type conversion. Currently the >> "`VectorUnbox`" will be transformed to two different patterns to implement >> the conversion: >> >> 1) If the element size is not changed, it is transformed to: >> >> "VectorMaskCast vmask" >> >> 2) Otherwise, it is transformed to: >> >> "VectorLoadMask (VectorStoreMask vmask)" >> >> It first converts the "`vmask`" to a boolean vector with "`VectorStoreMask`", >> and then uses "`VectorLoadMask`" to convert the boolean vector to the >> dst mask vector. Since this patch makes "`VectorMaskCast`" op supported >> for all types on all platforms, it doesn't need the "`VectorLoadMask`" and >> "`VectorStoreMask`" to do the conversion. The existing transformation: >> >> VectorUnbox (VectorBox vmask) => VectorLoadMask (VectorStoreMask vmask) >> >> can be simplified to: >> >> VectorUnbox (VectorBox vmask) => VectorMaskCast vmask > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains one commit: > > 8292898: [vectorapi] Unify vector mask cast operation Hi, could anyone please help to take a look at this PR? Thanks in advance! Hi @jatin-bhateja, @DamonFool , could you please help to take a look at this PR? Thanks a lot! ------------- PR: https://git.openjdk.org/jdk/pull/10192 From fgao at openjdk.org Fri Sep 9 01:31:52 2022 From: fgao at openjdk.org (Fei Gao) Date: Fri, 9 Sep 2022 01:31:52 GMT Subject: RFR: 8275275: AArch64: Fix performance regression after auto-vectorization on NEON [v2] In-Reply-To: References: Message-ID: On Thu, 8 Sep 2022 06:58:07 GMT, Fei Gao wrote: >> For some vector opcodes, there are no corresponding AArch64 NEON >> instructions but supporting them benefits vector API. Some of >> this kind of opcodes are also used by superword for auto- >> vectorization and here is the list: >> >> VectorCastD2I, VectorCastL2F >> MulVL >> AddReductionVI/L/F/D >> MulReductionVI/L/F/D >> AndReductionV, OrReductionV, XorReductionV >> >> >> We did some micro-benchmark performance tests on NEON and found >> that some of listed opcodes hurt the performance of loops after >> auto-vectorization, but others don't. >> >> This patch disables those opcodes for superword, which have >> obvious performance regressions after auto-vectorization on >> NEON. Besides, one jtreg test case, where IR nodes are checked, >> is added in the patch to protect the code against change by >> mistake in the future. >> >> Here is the performance data before and after the patch on NEON. 
>> >> Benchmark length Mode Cnt Before After Units >> AddReductionVD 1024 thrpt 15 450.830 548.001 ops/ms >> AddReductionVF 1024 thrpt 15 514.468 548.013 ops/ms >> MulReductionVD 1024 thrpt 15 405.613 499.531 ops/ms >> MulReductionVF 1024 thrpt 15 451.292 495.061 ops/ms >> >> Note: >> Because superword doesn't vectorize reductions unconnected with >> other vector packs, the benchmark function for Add/Mul >> reduction is like: >> >> // private double[] da, db; >> // private double dresult; >> public void AddReductionVD() { >> double result = 1; >> for (int i = startIndex; i < length; i++) { >> result += (da[i] + db[i]); >> } >> dresult += result; >> } >> >> >> Specially, vector multiply long has been implemented but disabled >> for both vector API and superword. Out of the same reason, the >> patch re-enables MulVL on NEON for Vector API but still disables >> it for superword. The performance uplift on vector API is ~12.8x >> on my local. >> >> Benchmark length Mode Cnt Before After Units >> Long128Vector.MUL 1024 thrpt 10 55.015 760.593 ops/ms >> MulVL(superword) 1024 thrpt 10 907.788 907.805 ops/ms >> >> Note: >> The superword benchmark function is: >> >> // private long[] in1, in2, res; >> public void MulVL() { >> for (int i = 0; i < length; i++) { >> res[i] = in1[i] * in2[i]; >> } >> } >> >> The Vector API benchmark case is from: >> https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/Long128Vector.java#L190 > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: > > - Fix match rules for mla/mls and add a vector API regression testcase > - Merge branch 'master' into fg8275275 > - 8275275: AArch64: Fix performance regression after auto-vectorization on NEON > > For some vector opcodes, there are no corresponding AArch64 NEON > instructions but supporting them benefits vector API. Some of > this kind of opcodes are also used by superword for auto- > vectorization and here is the list: > ``` > VectorCastD2I, VectorCastL2F > MulVL > AddReductionVI/L/F/D > MulReductionVI/L/F/D > AndReductionV, OrReductionV, XorReductionV > ``` > > We did some micro-benchmark performance tests on NEON and found > that some of listed opcodes hurt the performance of loops after > auto-vectorization, but others don't. > > This patch disables those opcodes for superword, which have > obvious performance regressions after auto-vectorization on > NEON. Besides, one jtreg test case, where IR nodes are checked, > is added in the patch to protect the code against change by > mistake in the future. > > Here is the performance data before and after the patch on NEON. > > Benchmark length Mode Cnt Before After Units > AddReductionVD 1024 thrpt 15 450.830 548.001 ops/ms > AddReductionVF 1024 thrpt 15 514.468 548.013 ops/ms > MulReductionVD 1024 thrpt 15 405.613 499.531 ops/ms > MulReductionVF 1024 thrpt 15 451.292 495.061 ops/ms > > Note: > Because superword doesn't vectorize reductions unconnected with > other vector packs, the benchmark function for Add/Mul > reduction is like: > ``` > // private double[] da, db; > // private double dresult; > public void AddReductionVD() { > double result = 1; > for (int i = startIndex; i < length; i++) { > result += (da[i] + db[i]); > } > dresult += result; > } > ``` > > Specially, vector multiply long has been implemented but disabled > for both vector API and superword. 
Out of the same reason, the > patch re-enables MulVL on NEON for Vector API but still disables > it for superword. The performance uplift on vector API is ~12.8x > on my local. > > Benchmark length Mode Cnt Before After Units > Long128Vector.MUL 1024 thrpt 10 55.015 760.593 ops/ms > MulVL(superword) 1024 thrpt 10 907.788 907.805 ops/ms > > Note: > The superword benchmark function is: > ``` > // private long[] in1, in2, res; > public void MulVL() { > for (int i = 0; i < length; i++) { > res[i] = in1[i] * in2[i]; > } > } > > The Vector API benchmark case is from: > https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/Long128Vector.java#L190 > > ``` > > Change-Id: Ie9133e4010f98b26f97969c02fbf992b11e7edbb The patch involves aarch64 only, so I suppose the GHA failure is not caused by this PR. ------------- PR: https://git.openjdk.org/jdk/pull/10175 From dlong at openjdk.org Fri Sep 9 01:33:02 2022 From: dlong at openjdk.org (Dean Long) Date: Fri, 9 Sep 2022 01:33:02 GMT Subject: RFR: 8293287 add ReplayReduce flag [v4] In-Reply-To: References: Message-ID: > Add an experimental flag to help developers "reduce" a replay file. > > As a first step, I plan to simulate reduced inlining. This will output multiple "compile" lines as if the first level of inlining never happened: > A --> B --> C > A --> D --> E > becomes > B --> C > D --> E > Developers can repeat iteratively until the replay crash no longer reproduces. 
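As a rough illustration of the reduction step described above, the sketch below models each inline chain as a list of method names and drops the first level, so that `A --> B --> C` and `A --> D --> E` become `B --> C` and `D --> E`. This is not the HotSpot implementation: the class `ReplayReduceSketch` and the method `dropFirstLevel` are hypothetical names made up for this example, and the real flag works on the `compile` lines of a replay file inside the VM.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ReplayReduceSketch {
    // Hypothetical model: each chain lists method names from the compiled
    // root down to the innermost inlinee, e.g. [A, B, C] for A --> B --> C.
    public static List<List<String>> dropFirstLevel(List<List<String>> chains) {
        // LinkedHashSet keeps the original order and removes duplicate chains.
        Set<List<String>> reduced = new LinkedHashSet<>();
        for (List<String> chain : chains) {
            // A chain with only a root has nothing left after the reduction.
            if (chain.size() > 1) {
                reduced.add(new ArrayList<>(chain.subList(1, chain.size())));
            }
        }
        return new ArrayList<>(reduced);
    }

    public static void main(String[] args) {
        List<List<String>> chains = List.of(
            List.of("A", "B", "C"),
            List.of("A", "D", "E"));
        // Each former first-level callee becomes a compilation root of its own.
        System.out.println(dropFirstLevel(chains));
    }
}
```

Applying `dropFirstLevel` repeatedly mirrors the iterative workflow in the description: stop at the last level where the crash still reproduces.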
Dean Long has updated the pull request incrementally with one additional commit since the last revision: fix typo Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10134/files - new: https://git.openjdk.org/jdk/pull/10134/files/4ba53b45..d0340273 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10134&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10134&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10134.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10134/head:pull/10134 PR: https://git.openjdk.org/jdk/pull/10134 From xgong at openjdk.org Fri Sep 9 01:33:55 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 9 Sep 2022 01:33:55 GMT Subject: RFR: 8290169: adlc: Improve child constraints for vector unary operations [v2] In-Reply-To: References: Message-ID: On Thu, 18 Aug 2022 03:27:55 GMT, Hao Sun wrote: >> As demonstrated in [1], the child constraint generated for *predicated >> vector unary operation* is the super set of that generated for the >> *unpredicated* version. As a result, there exists a risk for predicated >> vector unary operations to match the unpredicated rules by accident. >> >> In this patch, we resolve this issue by generating one extra check >> "rChild == NULL" ONLY for vector unary operations. In this way, the >> child constraints for predicated/unpredicated vector unary operations >> are exclusive now. >> >> Following the example in [1], the dfa state generated for AbsVI is shown >> below.
>> >> >> void State::_sub_Op_AbsVI(const Node *n){ >> if( STATE__VALID_CHILD(_kids[0], VREG) && STATE__VALID_CHILD(_kids[1], PREGGOV) && >> ( UseSVE > 0 ) ) >> { >> unsigned int c = _kids[0]->_cost[VREG]+_kids[1]->_cost[PREGGOV] + SVE_COST; >> DFA_PRODUCTION(VREG, vabsI_masked_rule, c) >> } >> if( STATE__VALID_CHILD(_kids[0], VREG) && _kids[1] == NULL && <---- 1 >> ( UseSVE > 0) ) >> { >> unsigned int c = _kids[0]->_cost[VREG] + SVE_COST; >> if (STATE__NOT_YET_VALID(VREG) || _cost[VREG] > c) { >> DFA_PRODUCTION(VREG, vabsI_rule, c) >> } >> } >> ... >> >> >> We can see that the constraint at line 1 cannot be matched for >> predicated AbsVI node now. >> >> The main updates are made in adlc/dfa part. Ideally, we should only >> add the extra check for affected platforms, i.e. AVX-512 and SVE. But we >> didn't do that because it would be better not to introduce any >> architecture dependent implementation here. >> >> Besides, workarounds in both aarch64_vector.ad and x86.ad are removed. 1) >> Many "is_predicated_vector()" checks can be removed in aarch64_vector.ad >> file. 2) Default instruction cost is used for the rules involved in x86.ad >> file. >> >> [1]. https://github.com/shqking/jdk/commit/f7d9621e2 > > Hao Sun has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge branch 'master' into 8290169-adlc > > Resolve the conflicts. > - 8290169: adlc: Improve child constraints for vector unary operations > > As demonstrated in [1], the child constraint generated for *predicated > vector unary operation* is the super set of that generated for the > *unpredicated* version. As a result, there exists a risk for predicated > vector unary operations to match the unpredicated rules by accident. > > In this patch, we resolve this issue by generating one extra check > "rChild == NULL" ONLY for vector unary operations.
In this way, the > child constraints for predicated/unpredicated vector unary operations > are exclusive now. > > Following the example in [1], the dfa state generated for AbsVI is shown > below. > > ``` > void State::_sub_Op_AbsVI(const Node *n){ > if( STATE__VALID_CHILD(_kids[0], VREG) && STATE__VALID_CHILD(_kids[1], PREGGOV) && > ( UseSVE > 0 ) ) > { > unsigned int c = _kids[0]->_cost[VREG]+_kids[1]->_cost[PREGGOV] + SVE_COST; > DFA_PRODUCTION(VREG, vabsI_masked_rule, c) > } > if( STATE__VALID_CHILD(_kids[0], VREG) && _kids[1] == NULL && <---- 1 > ( UseSVE > 0) ) > { > unsigned int c = _kids[0]->_cost[VREG] + SVE_COST; > if (STATE__NOT_YET_VALID(VREG) || _cost[VREG] > c) { > DFA_PRODUCTION(VREG, vabsI_rule, c) > } > } > ... > ``` > > We can see that the constraint at line 1 cannot be matched for > predicated AbsVI node now. > > The main updates are made in adlc/dfa part. Ideally, we should only > add the extra check for affected platforms, i.e. AVX-512 and SVE. But we > didn't do that because it would be better not to introduce any > architecture dependent implementation here. > > Besides, workarounds in both aarch64_sve.ad and x86.ad are removed. 1) > Many "is_predicated_vector()" checks can be removed in aarch64_sve.ad > file. 2) Default instruction cost is used for involving rules in x86.ad > file. > > [1]. https://github.com/shqking/jdk/commit/50ec9b19 AArch64 part looks good to me. Thanks! ------------- Marked as reviewed by xgong (Committer). PR: https://git.openjdk.org/jdk/pull/9534 From dlong at openjdk.org Fri Sep 9 01:35:27 2022 From: dlong at openjdk.org (Dean Long) Date: Fri, 9 Sep 2022 01:35:27 GMT Subject: RFR: 8293287 add ReplayReduce flag [v3] In-Reply-To: References: Message-ID: <_7XcIch4uVJEa9WB_lUm3Yc3gJPQEAmQ6FNMwTVLHKo=.2b5aa570-199f-4cf8-8272-51bd4f7326f7@github.com> On Thu, 8 Sep 2022 07:46:23 GMT, Tobias Hartmann wrote: > I'm wondering if this functionality should really be part of the VM. 
Wouldn't a simple script that regex-parses the compile statement of the replay file, iteratively removes inlines and runs replay compilation to check if the issue still reproduces, be more powerful and easier to maintain? It could be combined with also removing class loading statements. Writing a script was my first attempt, but getting the parsing right would have taken more time than I wanted to invest, so I did what was quickest and easiest. I found it useful, so I thought checking it in might be useful to others, considering the code changes are small. One problem with an external script is that replay won't normally create a new replay file if -XX:+ReplayCompiles is on. So, some kind of JVM change is required. ------------- PR: https://git.openjdk.org/jdk/pull/10134 From xgong at openjdk.org Fri Sep 9 01:55:25 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 9 Sep 2022 01:55:25 GMT Subject: RFR: 8292898: [vectorapi] Unify vector mask cast operation [v2] In-Reply-To: References: Message-ID: On Wed, 7 Sep 2022 09:22:11 GMT, Xiaohong Gong wrote: >> The current implementation of the vector mask cast operation is >> complex that the compiler generates different patterns for different >> scenarios. For architectures that do not support the predicate >> feature, vector mask is represented the same as the normal vector. >> So the vector mask cast is implemented by `VectorCast `node. But this >> is not always needed. When two masks have the same element size (e.g. >> int vs. float), their bits layout are the same. So casting between >> them does not need to emit any instructions. >> >> Currently the compiler generates different patterns based on the >> vector type of the input/output and the platforms. Normally the >> "`VectorMaskCast`" op is only used for cases that doesn't emit any >> instructions, and "`VectorCast`" op is used to implement the necessary >> expand/narrow operations. This can avoid adding some duplicate rules >> in the backend. 
However, this also has drawbacks: >> >> 1) The code is complex, especially when the compiler needs to >> check whether the hardware supports the necessary IRs for the >> vector mask cast. It needs to check different patterns for >> different cases. >> 2) The vector mask cast operation could be implemented with cheaper >> instructions than the vector casting on some architectures. >> >> Instead of generating `VectorCast` or `VectorMaskCast` nodes for different >> cases of vector mask cast operations, this patch unifies the vector >> mask cast implementation with "`VectorMaskCast`" node for all vector types >> and platforms. The missing backend rules are also added for it. >> >> This patch also simplifies the vector mask conversion that happens in >> "`VectorUnbox::Ideal()`". Normally "`VectorUnbox (VectorBox vmask)`" can >> be optimized to "`vmask`" if the unboxing type matches with the boxed >> "`vmask`" type. Otherwise, it needs the type conversion. Currently the >> "`VectorUnbox`" will be transformed to two different patterns to implement >> the conversion: >> >> 1) If the element size is not changed, it is transformed to: >> >> "VectorMaskCast vmask" >> >> 2) Otherwise, it is transformed to: >> >> "VectorLoadMask (VectorStoreMask vmask)" >> >> It first converts the "`vmask`" to a boolean vector with "`VectorStoreMask`", >> and then uses "`VectorLoadMask`" to convert the boolean vector to the >> dst mask vector. Since this patch makes "`VectorMaskCast`" op supported >> for all types on all platforms, it doesn't need the "`VectorLoadMask`" and >> "`VectorStoreMask`" to do the conversion. The existing transformation: >> >> VectorUnbox (VectorBox vmask) => VectorLoadMask (VectorStoreMask vmask) >> >> can be simplified to: >> >> VectorUnbox (VectorBox vmask) => VectorMaskCast vmask > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains one commit: > > 8292898: [vectorapi] Unify vector mask cast operation Hi @sviswa7, could you please help to take a look at the x86 codegen part? Thanks so much! ------------- PR: https://git.openjdk.org/jdk/pull/10192 From pli at openjdk.org Fri Sep 9 03:33:45 2022 From: pli at openjdk.org (Pengfei Li) Date: Fri, 9 Sep 2022 03:33:45 GMT Subject: RFR: 8291669: [REDO] Fix array range check hoisting for some scaled loop iv [v3] In-Reply-To: References: <5XU7GsP99-GVCxCJi7bVvvKbW_YG3XQGuIVm-LclQOw=.9b48b73d-4172-4f8e-a82d-03bad545c2fc@github.com> Message-ID: On Thu, 8 Sep 2022 11:49:58 GMT, Tobias Hartmann wrote: >> Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Update p_short_scale compuation > > src/hotspot/share/opto/loopTransform.cpp line 2776: > >> 2774: // This logic is shared by int and long. For int, the result may overflow >> 2775: // as we use jlong to compute so do the check here. Long result may also >> 2776: // overflow but that's fine because result wraps. > > But doesn't this mean that we bail out for integer overflows while not bailing out for long overflows? Yes, it does. If this inconsistency doesn't look good, I could also try adding long overflow checks just like what we have in utility function `bool add_overflows(T x, T y)`. ------------- PR: https://git.openjdk.org/jdk/pull/9851 From thartmann at openjdk.org Fri Sep 9 05:50:43 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 9 Sep 2022 05:50:43 GMT Subject: RFR: 8291669: [REDO] Fix array range check hoisting for some scaled loop iv [v3] In-Reply-To: References: <5XU7GsP99-GVCxCJi7bVvvKbW_YG3XQGuIVm-LclQOw=.9b48b73d-4172-4f8e-a82d-03bad545c2fc@github.com> Message-ID: On Fri, 9 Sep 2022 03:31:31 GMT, Pengfei Li wrote: >> src/hotspot/share/opto/loopTransform.cpp line 2776: >> >>> 2774: // This logic is shared by int and long. 
For int, the result may overflow >>> 2775: // as we use jlong to compute so do the check here. Long result may also >>> 2776: // overflow but that's fine because result wraps. >> >> But doesn't this mean that we bail out for integer overflows while not bailing out for long overflows? > > Yes, it does. If this inconsistency doesn't look good, I could also try adding long overflow checks just like what we have in utility function `bool add_overflows(T x, T y)`. I'm just wondering if there's a good reason for bailing out for integer overflows and if the same applies to long overflows. @rwestrel, you added that check with JDK-8278296, do you remember why? ------------- PR: https://git.openjdk.org/jdk/pull/9851 From thartmann at openjdk.org Fri Sep 9 08:02:43 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 9 Sep 2022 08:02:43 GMT Subject: RFR: 8275275: AArch64: Fix performance regression after auto-vectorization on NEON [v2] In-Reply-To: References: Message-ID: On Thu, 8 Sep 2022 06:58:07 GMT, Fei Gao wrote: >> For some vector opcodes, there are no corresponding AArch64 NEON >> instructions but supporting them benefits vector API. Some of >> this kind of opcodes are also used by superword for auto- >> vectorization and here is the list: >> >> VectorCastD2I, VectorCastL2F >> MulVL >> AddReductionVI/L/F/D >> MulReductionVI/L/F/D >> AndReductionV, OrReductionV, XorReductionV >> >> >> We did some micro-benchmark performance tests on NEON and found >> that some of listed opcodes hurt the performance of loops after >> auto-vectorization, but others don't. >> >> This patch disables those opcodes for superword, which have >> obvious performance regressions after auto-vectorization on >> NEON. Besides, one jtreg test case, where IR nodes are checked, >> is added in the patch to protect the code against change by >> mistake in the future. >> >> Here is the performance data before and after the patch on NEON. 
>> >> Benchmark length Mode Cnt Before After Units >> AddReductionVD 1024 thrpt 15 450.830 548.001 ops/ms >> AddReductionVF 1024 thrpt 15 514.468 548.013 ops/ms >> MulReductionVD 1024 thrpt 15 405.613 499.531 ops/ms >> MulReductionVF 1024 thrpt 15 451.292 495.061 ops/ms >> >> Note: >> Because superword doesn't vectorize reductions unconnected with >> other vector packs, the benchmark function for Add/Mul >> reduction is like: >> >> // private double[] da, db; >> // private double dresult; >> public void AddReductionVD() { >> double result = 1; >> for (int i = startIndex; i < length; i++) { >> result += (da[i] + db[i]); >> } >> dresult += result; >> } >> >> >> Specially, vector multiply long has been implemented but disabled >> for both vector API and superword. Out of the same reason, the >> patch re-enables MulVL on NEON for Vector API but still disables >> it for superword. The performance uplift on vector API is ~12.8x >> on my local. >> >> Benchmark length Mode Cnt Before After Units >> Long128Vector.MUL 1024 thrpt 10 55.015 760.593 ops/ms >> MulVL(superword) 1024 thrpt 10 907.788 907.805 ops/ms >> >> Note: >> The superword benchmark function is: >> >> // private long[] in1, in2, res; >> public void MulVL() { >> for (int i = 0; i < length; i++) { >> res[i] = in1[i] * in2[i]; >> } >> } >> >> The Vector API benchmark case is from: >> https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/Long128Vector.java#L190 > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: > > - Fix match rules for mla/mls and add a vector API regression testcase > - Merge branch 'master' into fg8275275 > - 8275275: AArch64: Fix performance regression after auto-vectorization on NEON > > For some vector opcodes, there are no corresponding AArch64 NEON > instructions but supporting them benefits vector API. Some of > this kind of opcodes are also used by superword for auto- > vectorization and here is the list: > ``` > VectorCastD2I, VectorCastL2F > MulVL > AddReductionVI/L/F/D > MulReductionVI/L/F/D > AndReductionV, OrReductionV, XorReductionV > ``` > > We did some micro-benchmark performance tests on NEON and found > that some of listed opcodes hurt the performance of loops after > auto-vectorization, but others don't. > > This patch disables those opcodes for superword, which have > obvious performance regressions after auto-vectorization on > NEON. Besides, one jtreg test case, where IR nodes are checked, > is added in the patch to protect the code against change by > mistake in the future. > > Here is the performance data before and after the patch on NEON. > > Benchmark length Mode Cnt Before After Units > AddReductionVD 1024 thrpt 15 450.830 548.001 ops/ms > AddReductionVF 1024 thrpt 15 514.468 548.013 ops/ms > MulReductionVD 1024 thrpt 15 405.613 499.531 ops/ms > MulReductionVF 1024 thrpt 15 451.292 495.061 ops/ms > > Note: > Because superword doesn't vectorize reductions unconnected with > other vector packs, the benchmark function for Add/Mul > reduction is like: > ``` > // private double[] da, db; > // private double dresult; > public void AddReductionVD() { > double result = 1; > for (int i = startIndex; i < length; i++) { > result += (da[i] + db[i]); > } > dresult += result; > } > ``` > > Specially, vector multiply long has been implemented but disabled > for both vector API and superword. 
Out of the same reason, the > patch re-enables MulVL on NEON for Vector API but still disables > it for superword. The performance uplift on vector API is ~12.8x > on my local. > > Benchmark length Mode Cnt Before After Units > Long128Vector.MUL 1024 thrpt 10 55.015 760.593 ops/ms > MulVL(superword) 1024 thrpt 10 907.788 907.805 ops/ms > > Note: > The superword benchmark function is: > ``` > // private long[] in1, in2, res; > public void MulVL() { > for (int i = 0; i < length; i++) { > res[i] = in1[i] * in2[i]; > } > } > > The Vector API benchmark case is from: > https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/Long128Vector.java#L190 > > ``` > > Change-Id: Ie9133e4010f98b26f97969c02fbf992b11e7edbb I tested this in our CI. All tests passed. ------------- PR: https://git.openjdk.org/jdk/pull/10175 From thartmann at openjdk.org Fri Sep 9 08:03:47 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 9 Sep 2022 08:03:47 GMT Subject: RFR: 8292675: Add identity transformation for removing redundant AndV/OrV nodes [v2] In-Reply-To: <3D-DKcq_dDWFZXapiQzKXidtnEWNM4bmTgKuG_Qz8Io=.4d48de6f-f7e8-4d11-b58f-abd28ccf197b@github.com> References: <3D-DKcq_dDWFZXapiQzKXidtnEWNM4bmTgKuG_Qz8Io=.4d48de6f-f7e8-4d11-b58f-abd28ccf197b@github.com> Message-ID: On Thu, 8 Sep 2022 16:28:32 GMT, Bhavana Kilambi wrote: >> Recently we found that the rotate left/right benchmarks with vectorapi >> emit a redundant "and" instruction on both aarch64 and x86_64 machines >> which can be done away with. For example - and(and(a, b), b) generates >> two "and" instructions which can be reduced to a single "and" operation- >> and(a, b) since "and" (and "or") operations are commutative and >> idempotent in nature. This can help improve performance for all those >> workloads which have multiple "and"/"or" operations with the same value >> by reducing them to fewer "and"/"or" operations accordingly. 
>> >> This patch adds the following transformations for vector logical >> operations - AndV and OrV : >> >> >> (OpV (OpV a b) b) => (OpV a b) >> (OpV (OpV a b) a) => (OpV a b) >> (OpV (OpV a b m1) b m1) => (OpV a b m1) >> (OpV (OpV a b m1) a m1) => (OpV a b m1) >> (OpV a (OpV a b)) => (OpV a b) >> (OpV b (OpV a b)) => (OpV a b) >> (OpV a (OpV a b m) m) => (OpV a b m) >> >> where Op = "And", "Or" >> >> Links for benchmarks tested are given below :- >> https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/IntMaxVector.java#L728 >> https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/IntMaxVector.java#L764 >> https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/LongMaxVector.java#L728 >> https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/LongMaxVector.java#L764 >> >> Before this patch, the disassembly for one these testcases >> (IntMaxVector.ROR) for Neon is shown below : >> ``` >> ldr q16, [x12, #16] >> and v16.16b, v16.16b, v20.16b >> and v16.16b, v16.16b, v20.16b >> add x12, x16, x11 >> sub v17.4s, v21.4s, v16.4s >> ... >> ... >> >> >> After this patch, the disassembly for the same testcase above is shown >> below : >> >> ldr q16, [x12, #16] >> and v16.16b, v16.16b, v20.16b >> add x12, x16, x11 >> sub v17.4s, v21.4s, v16.4s >> ... >> ... >> >> >> The other tests also emit an extra "and" instruction as shown above for >> the vector ROR/ROL operations. 
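The identities listed above, and the masked case that was ruled out earlier in this thread, can be checked on a single lane with a scalar model. The sketch below is only an illustration under one assumption taken from the review discussion: in a masked operation, unmasked lanes copy the first input. The class `MaskedIdentitySketch` and its helper methods are hypothetical names invented for this example and are not part of C2.

```java
import java.util.function.IntBinaryOperator;

public class MaskedIdentitySketch {
    // Scalar model of one lane of a masked vector op (assumption of this
    // sketch): active lanes compute op(a, b), inactive lanes pass through a.
    public static int masked(IntBinaryOperator op, int a, int b, boolean m) {
        return m ? op.applyAsInt(a, b) : a;
    }

    // Checks (OpV (OpV a b m) b m) == (OpV a b m): same first input throughout.
    public static boolean sameFirstInputHolds(IntBinaryOperator op, int a, int b, boolean m) {
        return masked(op, masked(op, a, b, m), b, m) == masked(op, a, b, m);
    }

    // Checks (OpV b (OpV a b m) m) == (OpV a b m): the outer first input is b.
    public static boolean swappedFirstInputHolds(IntBinaryOperator op, int a, int b, boolean m) {
        return masked(op, b, masked(op, a, b, m), m) == masked(op, a, b, m);
    }

    public static void main(String[] args) {
        IntBinaryOperator and = (x, y) -> x & y;
        IntBinaryOperator or = (x, y) -> x | y;
        int a = 0b1100;
        int b = 0b1010;
        for (boolean m : new boolean[] {true, false}) {
            System.out.println("mask=" + m
                + " and: same=" + sameFirstInputHolds(and, a, b, m)
                + " swapped=" + swappedFirstInputHolds(and, a, b, m)
                + " | or: same=" + sameFirstInputHolds(or, a, b, m)
                + " swapped=" + swappedFirstInputHolds(or, a, b, m));
        }
    }
}
```

With the mask set, both forms agree with the single operation because "and" and "or" are idempotent. With the mask clear, the swapped form yields `b` while the single operation yields `a`, which is why the transformation requires the first arguments of both nodes to be the same.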
>> >> Below are the performance results for the vectorapi rotate tests (tests >> given in the links above) with this patch on aarch64 and x86_64 machines >> (for int and long types) - >> >> >> Benchmark aarch64 x86_64 >> IntMaxVector.ROL 25.57% 26.09% >> IntMaxVector.ROR 23.75% 24.15% >> LongMaxVector.ROL 28.91% 28.51% >> LongMaxVector.ROR 16.51% 29.11% >> >> >> >> The percentage indicates the percent gain/improvement in performance >> (ops/ms) with this patch over the master build without this patch. The >> machine descriptions are given below - >> aarch64 - 128-bit aarch64 machine >> x86_64 - 256-bit x86 machine > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Merge two if conditions and some trivial changes Looks good to me. Testing in our CI passed. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.org/jdk/pull/10163 From thartmann at openjdk.org Fri Sep 9 09:34:41 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 9 Sep 2022 09:34:41 GMT Subject: RFR: 8293287 add ReplayReduce flag [v4] In-Reply-To: References: Message-ID: On Fri, 9 Sep 2022 01:33:02 GMT, Dean Long wrote: >> Add an experimental flag to help developers "reduce" a replay file. >> >> As a first step, I plan to simulate reduced inlining. This will output multiple "compile" lines as if the first level of inlining never happened: >> A --> B --> C >> A --> D --> E >> becomes >> B --> C >> D --> E >> Developers can repeat iteratively until the replay crash no longer reproduces. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > fix typo > > Co-authored-by: Tobias Hartmann Okay, I'm fine with adding this functionality, if it's been helpful to you. ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/10134 From thartmann at openjdk.org Fri Sep 9 09:54:03 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 9 Sep 2022 09:54:03 GMT Subject: RFR: 8293044: C1: Missing access check on non-accessible class In-Reply-To: References: Message-ID: On Thu, 8 Sep 2022 17:12:36 GMT, Vladimir Ivanov wrote: > C1 erroneously omits some access checks on symbolically referenced classes. > > Proposed fix relies on code patching to throw proper resolution error when required. > > Also, to avoid repeated recompilations on platforms which don't support code > patching, the nmethod is not marked as non-entrant when corresponding constant > pool entry is in error state. > > Testing: hs-tier1 - hs-tier4 Looks good to me otherwise. src/hotspot/share/ci/ciStreams.cpp line 194: > 192: return CURRENT_ENV->get_klass_by_index(cpool, get_klass_index(), will_link, _holder); > 193: } > 194: // ciBytecodeStream::get_klass Suggestion: } // ciBytecodeStream::get_klass test/hotspot/jtreg/compiler/c1/KlassAccessCheckTest.java line 31: > 29: * @compile KlassAccessCheck.jasm > 30: * @run main/othervm -Xbatch -XX:TieredStopAtLevel=1 > 31: * -XX:+PrintCompilation -XX:CompileCommand=dontinline,KlassAccessCheck.test* I think the CompileCommand misses the package name. It should be `-XX:CompileCommand=dontinline,compiler.c1.KlassAccessCheck.test*` Also, you may want to remove the `-XX:+PrintCompilation`. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.org/jdk/pull/10222 From thartmann at openjdk.org Fri Sep 9 09:57:51 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 9 Sep 2022 09:57:51 GMT Subject: RFR: 8292587: AArch64: Support SVE fabd instruction [v2] In-Reply-To: References: Message-ID: On Thu, 8 Sep 2022 02:50:52 GMT, Hao Sun wrote: >> Scalar and NEON fabd instructions were initially supported in >> JDK-8256318. In this patch, we support SVE fabd instruction [1] and add >> one Jtreg test case as well. 
>> >> With this patch, two instructions `fsub + fabs` would be combined into >> one single `fabd` instruction. >> >> >> fsub z16.s, z16.s, z17.s >> fabs z16.s, p7/m, z16.s >> >> --> >> >> fabd z16.s, p7/m, z16.s, z17.s >> >> >> In the initial evaluation of JMH case, i.e. >> FloatingScalarVectorAbsDiff.java, we found the performance uplift done >> by this optimization was easily hidden by the heavy memory load/store >> instructions. To avoid that, we updated the JMH case a bit, adding one >> more group of subtraction and Math.abs operations in the loop body. >> >> Here shows the data with the new JMH case on one 256-bit SVE machine. We >> can observe about 39% and 35% improvements for the two functions >> respectively. >> >> >> Benchmark Before After Units >> FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble 260.468 160.965 ns/op >> FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat 133.963 87.292 ns/op >> >> >> Jtreg testing: tier1~3 passed on one NEON-only machine and one 256-bit SVE machine. >> >> [1] https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/FABD--Floating-point-absolute-difference--predicated-- > > Hao Sun has updated the pull request incrementally with one additional commit since the last revision: > > Update the loop limit in VectorAbsDiffTest.java > > As pointed out by Faye Gao, the test results are not fully verified due > to incorrect loop limits. > > Updated it. > > Reran the test and no regression. I tested this in our CI. All tests passed. ------------- PR: https://git.openjdk.org/jdk/pull/10011 From thartmann at openjdk.org Fri Sep 9 10:12:53 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 9 Sep 2022 10:12:53 GMT Subject: RFR: 8292761: x86: Clone nodes to match complex rules [v2] In-Reply-To: References: Message-ID: On Tue, 23 Aug 2022 11:07:32 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch tries to clone a node if it can be matched as a part of a BMI and lea pattern. 
This may reduce the live range of a local or remove that local completely. >> >> Please take a look and have some reviews. Thanks a lot. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision: > > - Merge branch 'master' into cloneSimpleNodes > - fix > - Merge branch 'master' into cloneSimpleNodes > - shorten > - improve checks > - lea patterns > - refactor > - lea patterns > - first commit Please include the benchmark in the patch. Could you show the generated code before/after? Thanks! ------------- PR: https://git.openjdk.org/jdk/pull/9977 From thartmann at openjdk.org Fri Sep 9 10:50:42 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 9 Sep 2022 10:50:42 GMT Subject: RFR: 8289422: Fix and re-enable vector conditional move [v4] In-Reply-To: References: <6uthI29shZjAeLK-eV3Kxqao06qoa9U9zQ5g_oDLmkI=.3e171aae-2003-46c9-88ac-9a63fecc5d96@github.com> Message-ID: On Tue, 6 Sep 2022 02:47:38 GMT, Fei Gao wrote: >> // float[] a, float[] b, float[] c; >> for (int i = 0; i < a.length; i++) { >> c[i] = (a[i] > b[i]) ? a[i] : b[i]; >> } >> >> >> After [JDK-8139340](https://bugs.openjdk.org/browse/JDK-8139340) and [JDK-8192846](https://bugs.openjdk.org/browse/JDK-8192846), we hope to vectorize the case >> above by enabling -XX:+UseCMoveUnconditionally and -XX:+UseVectorCmov. >> But the transformation here[1] is going to optimize the BoolNode >> with constant input to a constant and break the design logic of >> cmove vector node[2]. We can't prevent all GVN transformation to >> the BoolNode before matcher, so the patch keeps the condition input >> as a constant while creating a cmove vector node, and then >> restructures it into a binary tree before matching. 
>> >> When the input order of original cmp node is different from the >> input order of original cmove node, like: >> >> // float[] a, float[] b, float[] c; >> for (int i = 0; i < a.length; i++) { >> c[i] = (a[i] < b[i]) ? a[i] : b[i]; >> } >> >> the patch negates the mask of the BoolNode before creating the >> cmove vector node in SuperWord::output(). >> >> We can also use VectorNode::implemented() to consult if vector >> conditional move is supported in the backend. So, the patch cleans >> the related code in SuperWord::implemented(). >> >> With the patch, the performance uplift is: >> (The micro-benchmark functions are included in the file >> test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java) >> >> AArch64: >> Benchmark (length) Mode Cnt uplift(ns/op) >> cmoveD 523 avgt 15 68.89% >> cmoveF 523 avgt 15 72.40% >> >> X86: >> Benchmark (length) Mode Cnt uplift(ns/op) >> cmoveD 523 avgt 15 73.12% >> cmoveF 523 avgt 15 85.45% >> >> [1]https://github.com/openjdk/jdk/blob/779b4e1d1959bc15a27492b7e2b951678e39cca8/src/hotspot/share/opto/subnode.cpp#L1310 >> [2]https://github.com/openjdk/jdk/blob/779b4e1d1959bc15a27492b7e2b951678e39cca8/src/hotspot/share/opto/matcher.cpp#L2365 > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains six commits: > > - Rebase the patch to the latest JDK and add some testcase for NE and EQ > > Change-Id: Ifb02b5efc2a09e6e0b4fc1c8346698597464f448 > - Merge branch 'master' into fg8289422 > > Change-Id: I09677cb07f6b2717aa768a830663ca455806b900 > - Merge branch 'master' into fg8289422 > > Change-Id: I870c7bbc73d12bac16756226125edc1a229ba412 > - Enable the test only on aarch64 platform because X86 supports vector cmove only on some 256-bits AVXs > > Change-Id: I64dd49380fe3d303ef6be21460df3be31c1458f8 > - Merge branch 'master' into fg8289422 > > Change-Id: I7936552df6ac12949ed8b550576f4e3520596423 > - 8289422: Fix and re-enable vector conditional move > > ``` > // float[] a, float[] b, float[] c; > for (int i = 0; i < a.length; i++) { > c[i] = (a[i] > b[i]) ? a[i] : b[i]; > } > ``` > > After JDK-8139340 and JDK-8192846, we hope to vectorize the case > above by enabling -XX:+UseCMoveUnconditionally and -XX:+UseVectorCmov. > But the transformation here[1] is going to optimize the BoolNode > with constant input to a constant and break the design logic of > cmove vector node[2]. We can't prevent all GVN transformation to > the BoolNode before matcher, so the patch keeps the condition input > as a constant while creating a cmove vector node, and then > restructures it into a binary tree before matching. > > When the input order of original cmp node is different from the > input order of original cmove node, like: > ``` > // float[] a, float[] b, float[] c; > for (int i = 0; i < a.length; i++) { > c[i] = (a[i] < b[i]) ? a[i] : b[i]; > } > ``` > the patch negates the mask of the BoolNode before creating the > cmove vector node in SuperWord::output(). > > We can also use VectorNode::implemented() to consult if vector > conditional move is supported in the backend. So, the patch cleans > the related code in SuperWord::implemented(). 
>
> With the patch, the performance uplift is:
> (The micro-benchmark functions are included in the file
> test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java)
>
> AArch64:
>     Benchmark (length)  Mode  Cnt  uplift(ns/op)
>     cmoveD         523  avgt   15         68.89%
>     cmoveF         523  avgt   15         72.40%
>
> X86:
>     Benchmark (length)  Mode  Cnt  uplift(ns/op)
>     cmoveD         523  avgt   15         73.12%
>     cmoveF         523  avgt   15         85.45%
>
> [1]https://github.com/openjdk/jdk/blob/779b4e1d1959bc15a27492b7e2b951678e39cca8/src/hotspot/share/opto/subnode.cpp#L1310
> [2]https://github.com/openjdk/jdk/blob/779b4e1d1959bc15a27492b7e2b951678e39cca8/src/hotspot/share/opto/matcher.cpp#L2365
>
> Change-Id: If046dd745024deb0e602bf7efc2a07c22b89c690

Thanks, I can see failures with the following tests when running with `-XX:+UseCMoveUnconditionally -XX:+UseVectorCmov`:
- `compiler/c2/TestCondAddDeadBranch.java`
- `compiler/loopopts/TestCastFFAtPhi.java`

    Error mixing types: vectory[4]:{double_top} and double_top
    # A fatal error has been detected by the Java Runtime Environment:
    #
    # Internal Error (workspace/open/src/hotspot/share/opto/type.cpp:1179), pid=3589333, tid=3589359
    # Error: ShouldNotReachHere()
    #
    # JRE version: Java(TM) SE Runtime Environment (20.0) (fastdebug build 20-internal-2022-09-09-0957028.tobias.hartmann.jdk2)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 20-internal-2022-09-09-0957028.tobias.hartmann.jdk2, compiled mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
    # Problematic frame:
    # V [libjvm.so+0x1a95869] Type::typerr(Type const*) const+0x79

    Current CompileTask:
    C2: 130 10 b TestCastFFAtPhi::init (35 bytes)

    Stack: [0x00007ff917726000,0x00007ff917827000], sp=0x00007ff917821540, free space=1005k
    Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
    V [libjvm.so+0x1a95869] Type::typerr(Type const*) const+0x79 (type.cpp:1179)
    V [libjvm.so+0x1a97f2b] TypeVect::xmeet(Type const*) const+0x1eb (type.cpp:2451)
    V [libjvm.so+0x1a9d203] Type::meet_helper(Type const*, bool) const+0x73 (type.cpp:879)
    V [libjvm.so+0x1a9d41a] Type::filter_helper(Type const*, bool) const+0x1a (type.hpp:188)
    V [libjvm.so+0x1793690] PhaseIterGVN::transform_old(Node*)+0x230 (phaseX.cpp:1294)
    V [libjvm.so+0x178b30e] PhaseIterGVN::optimize()+0x6e (phaseX.cpp:1203)
    V [libjvm.so+0xafeefa] PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x6da (loopnode.hpp:1169)
    V [libjvm.so+0xafb253] Compile::Optimize()+0xe53 (compile.cpp:2171)
    V [libjvm.so+0xafd50d] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x15ad (compile.cpp:823)
    V [libjvm.so+0x90e2e5] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x675 (c2compiler.cpp:113)
    V [libjvm.so+0xb0ba5c] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xb1c (compileBroker.cpp:2243)
    V [libjvm.so+0xb0c828] CompileBroker::compiler_thread_loop()+0x5a8 (compileBroker.cpp:1917)
    V [libjvm.so+0x106c1dc] JavaThread::thread_main_inner()+0x22c (javaThread.cpp:700)
    V [libjvm.so+0x1a6dd10] Thread::call_run()+0x100 (thread.cpp:224)
    V [libjvm.so+0x1708f13] thread_native_entry(Thread*)+0x103 (os_linux.cpp:710)

They also happen without this patch. Should we file a separate bug or are these supposed to be fixed by this change?

-------------

PR: https://git.openjdk.org/jdk/pull/9652

From duke at openjdk.org Fri Sep 9 11:18:50 2022
From: duke at openjdk.org (Quan Anh Mai)
Date: Fri, 9 Sep 2022 11:18:50 GMT
Subject: RFR: 8292761: x86: Clone nodes to match complex rules [v3]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> This patch tries to clone a node if it can be matched as a part of a BMI and lea pattern. This may reduce the live range of a local or remove that local completely.
>
> Please take a look and have some reviews. Thanks a lot.

Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 11 additional commits since the last revision: - add benchmark - Merge branch 'master' into cloneSimpleNodes - Merge branch 'master' into cloneSimpleNodes - fix - Merge branch 'master' into cloneSimpleNodes - shorten - improve checks - lea patterns - refactor - lea patterns - ... and 1 more: https://git.openjdk.org/jdk/compare/41741e28...0beae979 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9977/files - new: https://git.openjdk.org/jdk/pull/9977/files/7df6ed7e..0beae979 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9977&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9977&range=01-02 Stats: 47789 lines in 1029 files changed: 18161 ins; 20540 del; 9088 mod Patch: https://git.openjdk.org/jdk/pull/9977.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9977/head:pull/9977 PR: https://git.openjdk.org/jdk/pull/9977 From duke at openjdk.org Fri Sep 9 11:18:55 2022 From: duke at openjdk.org (Quan Anh Mai) Date: Fri, 9 Sep 2022 11:18:55 GMT Subject: RFR: 8292761: x86: Clone nodes to match complex rules [v2] In-Reply-To: References: Message-ID: On Fri, 9 Sep 2022 10:09:17 GMT, Tobias Hartmann wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision: >> >> - Merge branch 'master' into cloneSimpleNodes >> - fix >> - Merge branch 'master' into cloneSimpleNodes >> - shorten >> - improve checks >> - lea patterns >> - refactor >> - lea patterns >> - first commit > > Please include the benchmark in the patch. Could you show the generated code before/after? Thanks! 
@TobiHartmann Thanks a lot for your review. I have added the benchmark to the patch; the generated code is as follows:

CloneNodes::testAndn:

Baseline:

    movl $0xffffffff, %r8d
    xorl 0x20(%rbx,%r11,4), %r8d
    movl %r8d, %r9d
    andl %edi, %r9d
    andl %ecx, %r8d

Patched:

    movl 0x10(%rdi,%r11,4), %r10d
    andnl %ebx, %r10d, %r9d
    andnl %ecx, %r10d, %r8d

CloneNodes::testLea:

Baseline:

    movl 0x14(%rdi,%r11,4), %r8d
    shll $0x2, %r8d
    movl %r8d, %r9d
    addl $0x7, %r9d
    addl $0x3, %r8d

Patched:

    movl 0x10(%rcx,%r11,4), %r10d
    leal 0x3(,%r10,4), %r9d
    leal 0x7(,%r10,4), %r10d

-------------

PR: https://git.openjdk.org/jdk/pull/9977

From roland at openjdk.org Fri Sep 9 11:47:59 2022
From: roland at openjdk.org (Roland Westrelin)
Date: Fri, 9 Sep 2022 11:47:59 GMT
Subject: RFR: 8291599: Assertion in PhaseIdealLoop::skeleton_predicate_has_opaque after JDK-8289127 [v2]
In-Reply-To:
References:
Message-ID:

> In TestPhiInSkeletonPredicateExpression.test1():
>
> - Loop predication adds predicates for the null check of array and the
>   array range check. It also adds skeleton predicates in case of
>   subsequent unrolling.
>
> - One of the skeleton predicates has the following shape:
>
>   (Opaque4 (Bool (CmpUL (AddL (AddL (ConvI2L (LoadI (Phi ...))) (ConvI2L (CastII (AddI (OpaqueLoopInit OpaqueLoopStride))))) -1) ...)))
>
> - Split thru phi pushes the null check through the dominating
>   region. The skeleton predicate subgraph is transformed to:
>
>   (Opaque4 (Bool (CmpUL (Phi ...) ...)))
>
> - Logic that processes skeleton predicates can no longer find the
>   OpaqueLoopInit and OpaqueLoopStride nodes because they are now
>   behind a phi. That causes the assert to fire.
>
> The fix I propose is to catch cases where part of a skeleton predicate
> expression (a subgraph with an OpaqueLoopInit or OpaqueLoopStride node)
> is being split during split if and to clone the entire skeleton
> predicate subgraph then.
> > There's a already logic for that currently but it only triggers if > PhaseIdealLoop::split_up() tries to split an OpaqueLoopInit or > OpaqueLoopStride. In the case here, the OpaqueLoopInit and > OpaqueLoopStride nodes have control above the region at which split if > occurs. So they are never split by PhaseIdealLoop::split_up(). The > AddL nodes in subgraph are. Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: - Update test/hotspot/jtreg/compiler/loopopts/TestPhiInSkeletonPredicateExpression.java Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopTransform.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10022/files - new: https://git.openjdk.org/jdk/pull/10022/files/680a0203..a54f0ee7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10022&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10022&range=00-01 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/10022.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10022/head:pull/10022 PR: https://git.openjdk.org/jdk/pull/10022 From ngasson at openjdk.org Fri Sep 9 11:59:29 2022 From: ngasson at openjdk.org (Nick Gasson) Date: Fri, 9 Sep 2022 11:59:29 GMT Subject: RFR: 8292675: Add identity transformation for removing redundant AndV/OrV nodes [v2] In-Reply-To: <3D-DKcq_dDWFZXapiQzKXidtnEWNM4bmTgKuG_Qz8Io=.4d48de6f-f7e8-4d11-b58f-abd28ccf197b@github.com> References: <3D-DKcq_dDWFZXapiQzKXidtnEWNM4bmTgKuG_Qz8Io=.4d48de6f-f7e8-4d11-b58f-abd28ccf197b@github.com> Message-ID: On Thu, 8 Sep 2022 16:28:32 GMT, Bhavana Kilambi wrote: >> Recently we found that the rotate left/right benchmarks with vectorapi >> emit a redundant "and" instruction on both aarch64 and x86_64 machines >> which can be done away with. 
For example - and(and(a, b), b) generates >> two "and" instructions which can be reduced to a single "and" operation- >> and(a, b) since "and" (and "or") operations are commutative and >> idempotent in nature. This can help improve performance for all those >> workloads which have multiple "and"/"or" operations with the same value >> by reducing them to fewer "and"/"or" operations accordingly. >> >> This patch adds the following transformations for vector logical >> operations - AndV and OrV : >> >> >> (OpV (OpV a b) b) => (OpV a b) >> (OpV (OpV a b) a) => (OpV a b) >> (OpV (OpV a b m1) b m1) => (OpV a b m1) >> (OpV (OpV a b m1) a m1) => (OpV a b m1) >> (OpV a (OpV a b)) => (OpV a b) >> (OpV b (OpV a b)) => (OpV a b) >> (OpV a (OpV a b m) m) => (OpV a b m) >> >> where Op = "And", "Or" >> >> Links for benchmarks tested are given below :- >> https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/IntMaxVector.java#L728 >> https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/IntMaxVector.java#L764 >> https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/LongMaxVector.java#L728 >> https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/LongMaxVector.java#L764 >> >> Before this patch, the disassembly for one these testcases >> (IntMaxVector.ROR) for Neon is shown below : >> ``` >> ldr q16, [x12, #16] >> and v16.16b, v16.16b, v20.16b >> and v16.16b, v16.16b, v20.16b >> add x12, x16, x11 >> sub v17.4s, v21.4s, v16.4s >> ... >> ... 
>> >> >> After this patch, the disassembly for the same testcase above is shown >> below : >> >> ldr q16, [x12, #16] >> and v16.16b, v16.16b, v20.16b >> add x12, x16, x11 >> sub v17.4s, v21.4s, v16.4s >> ... >> ... >> >> >> The other tests also emit an extra "and" instruction as shown above for >> the vector ROR/ROL operations. >> >> Below are the performance results for the vectorapi rotate tests (tests >> given in the links above) with this patch on aarch64 and x86_64 machines >> (for int and long types) - >> >> >> Benchmark aarch64 x86_64 >> IntMaxVector.ROL 25.57% 26.09% >> IntMaxVector.ROR 23.75% 24.15% >> LongMaxVector.ROL 28.91% 28.51% >> LongMaxVector.ROR 16.51% 29.11% >> >> >> >> The percentage indicates the percent gain/improvement in performance >> (ops/ms) with this patch over the master build without this patch. The >> machine descriptions are given below - >> aarch64 - 128-bit aarch64 machine >> x86_64 - 256-bit x86 machine > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Merge two if conditions and some trivial changes Marked as reviewed by ngasson (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10163 From roland at openjdk.org Fri Sep 9 12:05:03 2022 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 9 Sep 2022 12:05:03 GMT Subject: RFR: 8291599: Assertion in PhaseIdealLoop::skeleton_predicate_has_opaque after JDK-8289127 [v3] In-Reply-To: References: Message-ID: > In TestPhiInSkeletonPredicateExpression.test1(): > > - Loop predication adds predicates for the null check of array and the > array range check. It also adds skeleton predicates in case of > subsequent unrolling. > > - One of the skeleton predicate has the following shape: > > (Opaque4 (Bool (CmpUL (AddL (AddL (ConvI2L (LoadI (Phi ...))) (ConvI2L (CastII (AddI (OpaqueLoopInit OpaqueLoopStride))))) -1) ...))) > > - Split thru phi pushes the null check through the dominating > region. 
The skeleton predicate subgraph is transformed to: > > (Opaque4 (Bool (CmpUL (Phi ...) ...))) > > - Logic that processes skeleton predicate can no longer find the > OpaqueLoopInit and OpaqueLoopStride nodes because they are now > behind a phi. That causes the assert to fire. > > The fix I propose is to catch cases where part of a skeleton predicate > expression (a subgraph with a OpaqueLoopInit or OpaqueLoopStride node) > is being split during split if and to clone the entire skeleton > predicate subgraph then. > > There's a already logic for that currently but it only triggers if > PhaseIdealLoop::split_up() tries to split an OpaqueLoopInit or > OpaqueLoopStride. In the case here, the OpaqueLoopInit and > OpaqueLoopStride nodes have control above the region at which split if > occurs. So they are never split by PhaseIdealLoop::split_up(). The > AddL nodes in subgraph are. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: - Christian's review - Merge branch 'master' into JDK-8291599 - Update test/hotspot/jtreg/compiler/loopopts/TestPhiInSkeletonPredicateExpression.java Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopTransform.cpp Co-authored-by: Christian Hagedorn - fix - test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10022/files - new: https://git.openjdk.org/jdk/pull/10022/files/a54f0ee7..db4bbedb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10022&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10022&range=01-02 Stats: 50079 lines in 1080 files changed: 19346 ins; 21470 del; 9263 mod Patch: https://git.openjdk.org/jdk/pull/10022.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10022/head:pull/10022 PR: https://git.openjdk.org/jdk/pull/10022 From roland at openjdk.org Fri Sep 9 12:05:04 2022 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 9 Sep 2022 12:05:04 GMT Subject: RFR: 8291599: Assertion in PhaseIdealLoop::skeleton_predicate_has_opaque after JDK-8289127 [v3] In-Reply-To: References: Message-ID: On Thu, 8 Sep 2022 06:46:28 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: >> >> - Christian's review >> - Merge branch 'master' into JDK-8291599 >> - Update test/hotspot/jtreg/compiler/loopopts/TestPhiInSkeletonPredicateExpression.java >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/loopTransform.cpp >> >> Co-authored-by: Christian Hagedorn >> - fix >> - test > > test/hotspot/jtreg/compiler/loopopts/TestPhiInSkeletonPredicateExpression.java line 28: > >> 26: * bug 8291599 >> 27: * @summary Assertion in PhaseIdealLoop::skeleton_predicate_has_opaque after JDK-8289127 >> 28: * @run main/othervm -XX:-TieredCompilation -XX:-BackgroundCompilation -XX:-UseOnStackReplacement -XX:LoopMaxUnroll=0 TestPhiInSkeletonPredicateExpression > > Since `LoopMaxUnroll` is a C2 flag, we should also add a `@requires vm.compiler2.enabled`. Thanks for the review. I made the changes that you suggested. ------------- PR: https://git.openjdk.org/jdk/pull/10022 From roland at openjdk.org Fri Sep 9 12:07:07 2022 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 9 Sep 2022 12:07:07 GMT Subject: RFR: 8292301: [REDO v2] C2 crash when allocating array of size too large [v3] In-Reply-To: References: Message-ID: > On top of the redo, this fixed 2 bugs: > > 8288184: the problem here is that the ValidLengthTest input of an > AllocateArrayNode becomes a constant. The CatchNode would then change > types if it was reprocessed but it's not. Custom logic is needed to > enqueue the CatchNode when the ValidLengthTest input of an > AllocateArrayNode changes. The CastII out of the AllocateArrayNode > becomes top but the fallthrough path doesn't die. This happens with > igvn in the case of the bug but could also happen with ccp. I fixed > both in this patch. > > 8291665: the code pattern for this is 2 AllocateArrayNodes out of loop > with a shared ValidLengthTest input in a loop. 
When the loop is cloned > that causes Phis to be added between the AllocateArrayNodes and the > BoolNode of the ValidLengthTest inputs. Split if runs next and it > doesn't expect the Phi at the ValidLengthTest inputs. The fix here is > to clone the Bool/Cmp subgraph down on loop cloning. There's logic for > that when the use of the bool is an If for instance so I simply added > a special case to run that logic for an AllocateArrayNode use as > well. Note that the test case I added fails reliably on 11 but not > with the current jdk developement branch. AFAICT, the bug is there but > something unrelated changed and a slightly different graph is built > for the test case that prevents split if. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/phaseX.cpp Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10038/files - new: https://git.openjdk.org/jdk/pull/10038/files/bbf9851d..ed9a377d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10038&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10038&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10038.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10038/head:pull/10038 PR: https://git.openjdk.org/jdk/pull/10038 From roland at openjdk.org Fri Sep 9 12:33:15 2022 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 9 Sep 2022 12:33:15 GMT Subject: RFR: 8292301: [REDO v2] C2 crash when allocating array of size too large [v4] In-Reply-To: References: Message-ID: > On top of the redo, this fixed 2 bugs: > > 8288184: the problem here is that the ValidLengthTest input of an > AllocateArrayNode becomes a constant. The CatchNode would then change > types if it was reprocessed but it's not. Custom logic is needed to > enqueue the CatchNode when the ValidLengthTest input of an > AllocateArrayNode changes. 
The CastII out of the AllocateArrayNode > becomes top but the fallthrough path doesn't die. This happens with > igvn in the case of the bug but could also happen with ccp. I fixed > both in this patch. > > 8291665: the code pattern for this is 2 AllocateArrayNodes out of loop > with a shared ValidLengthTest input in a loop. When the loop is cloned > that causes Phis to be added between the AllocateArrayNodes and the > BoolNode of the ValidLengthTest inputs. Split if runs next and it > doesn't expect the Phi at the ValidLengthTest inputs. The fix here is > to clone the Bool/Cmp subgraph down on loop cloning. There's logic for > that when the use of the bool is an If for instance so I simply added > a special case to run that logic for an AllocateArrayNode use as > well. Note that the test case I added fails reliably on 11 but not > with the current jdk developement branch. AFAICT, the bug is there but > something unrelated changed and a slightly different graph is built > for the test case that prevents split if. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: - comments - Merge branch 'master' into JDK-8292301 - Update src/hotspot/share/opto/phaseX.cpp Co-authored-by: Tobias Hartmann - undo needless change - dos->unix test file - move tests - test fix - fix - test - test for 8288184 - ... 
and 2 more: https://git.openjdk.org/jdk/compare/8e22f2bb...9d92011a ------------- Changes: https://git.openjdk.org/jdk/pull/10038/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10038&range=03 Stats: 456 lines in 15 files changed: 376 ins; 53 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/10038.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10038/head:pull/10038 PR: https://git.openjdk.org/jdk/pull/10038 From roland at openjdk.org Fri Sep 9 12:33:16 2022 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 9 Sep 2022 12:33:16 GMT Subject: RFR: 8292301: [REDO v2] C2 crash when allocating array of size too large [v2] In-Reply-To: References: Message-ID: On Thu, 8 Sep 2022 07:16:26 GMT, Tobias Hartmann wrote: > Looks good otherwise. Thanks for reviewing. > src/hotspot/share/opto/loopopts.cpp line 2040: > >> 2038: // loop to determine which way the loop exited. >> 2039: // Loop predicate If node connects to Bool node through Opaque1 node. >> 2040: if (use->is_If() || use->is_CMove() || C->is_predicate_opaq(use) || use->Opcode() == Op_Opaque4 || > > Please add a comment describing the new case. Done in updated change. > src/hotspot/share/opto/loopopts.cpp line 2410: > >> 2408: while (split_if_set->size()) { >> 2409: Node *iff = split_if_set->pop(); >> 2410: uint input = iff->Opcode() == Op_AllocateArray ? AllocateNode::ValidLengthTest : 1; > > Suggestion: > > uint input = (iff->Opcode() == Op_AllocateArray) ? AllocateNode::ValidLengthTest : 1; I left that one out as I think this pattern is common enough that it shouldn't be ambiguous. > src/hotspot/share/opto/phaseX.cpp line 1643: > >> 1641: } >> 1642: } >> 1643: if (use_op == Op_AllocateArray && n == use->in(AllocateNode::ValidLengthTest)) { > > Please add a comment. Done in an updated change. 
-------------

PR: https://git.openjdk.org/jdk/pull/10038

From redestad at openjdk.org Fri Sep 9 12:46:43 2022
From: redestad at openjdk.org (Claes Redestad)
Date: Fri, 9 Sep 2022 12:46:43 GMT
Subject: RFR: 8293491: Avoid unexpected deoptimization in loop exit due to incorrect branch profiling
In-Reply-To:
References:
Message-ID: <_TZMiZmcSUxois_5DEsXGH-njmDbTKgCU4O3_7-wLW0=.2e79cf6e-362e-44ad-8d58-4d0324c92934@github.com>

On Wed, 7 Sep 2022 15:06:18 GMT, Jie Fu wrote:

> Hi all,
>
> Please review this patch which fixes the unexpected deoptimizations in loop exit due to incorrect branch profiling.
>
> # Background
>
> While analyzing our big data Apps, we observed unexpected deoptimizations in loop exit due to incorrect branch profiling.
>
> Here is a reproducer.
>
>     public class UnexpectedLoopExitDeopt {
>         public static final int N = 20000000;
>
>         public static int d1[] = new int[N];
>         public static int d2[] = new int[N];
>
>         public static void main(String[] args) {
>             System.out.println(test(d1));
>             System.out.println(test(d2));
>         }
>
>         public static int test(int[] a) {
>             int sum = 0;
>             for(int i = 0; i < a.length; i++) {
>                 sum += a[i];
>             }
>             return sum;
>         }
>     }
>
>
> The following is the compilation sequence.
> > 77 1 3 java.lang.Object:: (1 bytes) > 83 2 3 java.lang.String::isLatin1 (19 bytes) > 84 6 3 jdk.internal.util.Preconditions::checkIndex (18 bytes) > 84 3 3 java.lang.String::charAt (25 bytes) > 85 4 3 java.lang.StringLatin1::charAt (15 bytes) > 86 7 3 java.lang.String::coder (15 bytes) > 86 8 3 java.lang.String::hashCode (60 bytes) > 87 5 3 java.lang.String::checkIndex (10 bytes) > 87 9 3 java.lang.String::length (11 bytes) > 93 10 n 0 java.lang.invoke.MethodHandle::linkToStatic(LLLLLLL)L (native) (static) > 96 11 n 0 java.lang.invoke.MethodHandle::linkToSpecial(LLLL)L (native) (static) > 96 12 n 0 java.lang.Object::hashCode (native) > 97 13 n 0 java.lang.invoke.MethodHandle::invokeBasic(LLLLLL)L (native) > 98 14 3 java.util.Objects::requireNonNull (14 bytes) > 98 15 n 0 java.lang.invoke.MethodHandle::linkToSpecial(LLLLLLLL)L (native) (static) > 98 16 1 java.lang.Enum::ordinal (5 bytes) > 101 17 n 0 java.lang.invoke.MethodHandle::linkToSpecial(LLLL)V (native) (static) > 102 18 n 0 java.lang.invoke.MethodHandle::invokeBasic(LL)L (native) > 212 19 % 3 UnexpectedLoopExitDeopt::test @ 4 (24 bytes) > 213 20 % 4 UnexpectedLoopExitDeopt::test @ 4 (24 bytes) > 221 19 % 3 UnexpectedLoopExitDeopt::test @ 4 (24 bytes) made not entrant > 221 21 4 UnexpectedLoopExitDeopt::test (24 bytes) > 230 20 % 4 UnexpectedLoopExitDeopt::test @ 4 (24 bytes) made not entrant <--- Unexpected deopt > 0 > 242 21 4 UnexpectedLoopExitDeopt::test (24 bytes) made not entrant <--- Unexpected deopt > 0 > > > The last two deopts (made not entrant) happened in the loop exit which are unexpected. > > > # Reason > > The unexpected deopts were caused by the incorrect branch profiling count (0 taken count for loop predicate). > > Here is the profiling data for `UnexpectedLoopExitDeopt::test`. > We can see that for `if_icmpge` @ bci=7, the count for `not taken` is 264957, while 0 for `taken`. > The profile count for zero taken is obvious incorrect since the loop will finally exit (when `i >= a.length`). 
> So the taken count should be at least 1 for `if_icmpge` @ bci=7. > > 0 iconst_0 > 1 istore_1 > 2 iconst_0 > 3 istore_2 > > 4 iload_2 > 5 fast_aload_0 > 6 arraylength > 7 if_icmpge 22 > 0 bci: 7 BranchData taken(0) displacement(56) > not taken(264957) > > 10 iload_1 > 11 fast_aload_0 > 12 iload_2 > 13 iaload > 14 iadd > 15 istore_1 > 16 iinc #2 1 > 19 goto 4 > 32 bci: 19 JumpData taken(266667) displacement(-32) > > 22 iload_1 > 23 ireturn > > > # Fix > > The main idea is to detect if the branch taken target is a loop exit. > If so, set the taken count to be at least 1. > This is fine because most loops should be finite and would execute the loop exit code at least once. > For infinite loops like `while (true) {...}`, the patch won't change the original behaviour since there is no loop exit. > > # Testing > > tier1~3 on Linux/x64, no regression > > Thanks. > Best regards, > Jie Don't you need to take care to only mark the exit branch as taken if the loop has actually been entered -- lest weird things might happen w.r.t. DCE of dead loops? ------------- PR: https://git.openjdk.org/jdk/pull/10200 From roland at openjdk.org Fri Sep 9 12:53:43 2022 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 9 Sep 2022 12:53:43 GMT Subject: RFR: 8291669: [REDO] Fix array range check hoisting for some scaled loop iv [v3] In-Reply-To: References: <5XU7GsP99-GVCxCJi7bVvvKbW_YG3XQGuIVm-LclQOw=.9b48b73d-4172-4f8e-a82d-03bad545c2fc@github.com> Message-ID: On Fri, 9 Sep 2022 05:47:12 GMT, Tobias Hartmann wrote: >> Yes, it does. If this inconsistency doesn't look good, I could also try adding long overflow checks just like what we have in utility function `bool add_overflows(T x, T y)`. > > I'm just wondering if there's a good reason for bailing out for integer overflows and if the same applies to long overflows. @rwestrel, you added that check with JDK-8278296, do you remember why? The: if (scale == min_signed_integer(exp_bt)) { ?
(It's from JDK-8259609) The problem I think is for the expression `-min_jint * i`: scale here is min_jint initially, stored in a long. It's then multiplied by -1. -min_jint = min_jint when stored in an int but not in a long. When scale is later transformed from a long to an int, some code finds that -(long)min_jint can't be stored in an int. ------------- PR: https://git.openjdk.org/jdk/pull/9851 From ngasson at openjdk.org Fri Sep 9 13:01:44 2022 From: ngasson at openjdk.org (Nick Gasson) Date: Fri, 9 Sep 2022 13:01:44 GMT Subject: RFR: 8292587: AArch64: Support SVE fabd instruction [v2] In-Reply-To: References: Message-ID: On Thu, 8 Sep 2022 02:50:52 GMT, Hao Sun wrote: >> Scalar and NEON fabd instructions were initially supported in >> JDK-8256318. In this patch, we support SVE fabd instruction [1] and add >> one Jtreg test case as well. >> >> With this patch, two instructions `fsub + fabs` would be combined into >> one single `fabd` instruction. >> >> >> fsub z16.s, z16.s, z17.s >> fabs z16.s, p7/m, z16.s >> >> --> >> >> fabd z16.s, p7/m, z16.s, z17.s >> >> >> In the initial evaluation of JMH case, i.e. >> FloatingScalarVectorAbsDiff.java, we found the performance uplift done >> by this optimization was easily hidden by the heavy memory load/store >> instructions. To avoid that, we updated the JMH case a bit, adding one >> more group of subtraction and Math.abs operations in the loop body. >> >> Here shows the data with the new JMH case on one 256-bit SVE machine. We >> can observe about 39% and 35% improvements for the two functions >> respectively. >> >> >> Benchmark Before After Units >> FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble 260.468 160.965 ns/op >> FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat 133.963 87.292 ns/op >> >> >> Jtreg testing: tier1~3 passed on one NEON-only machine and one 256-bit SVE machine.
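For readers unfamiliar with the `fsub + fabs` pattern in the SVE fabd message above, the following is a minimal sketch of the kind of Java loop that computes an absolute difference per element. The class and method names are hypothetical and this is not the JMH benchmark from the patch; whether a given JIT build actually emits `fabd` for it depends on the platform and flags.

```java
// Illustrative only: an element-wise absolute difference loop.
// Names are hypothetical; this is not FloatingScalarVectorAbsDiff.java.
final class AbsDiffSketch {
    // out[i] = |a[i] - b[i]|: a subtraction followed by Math.abs,
    // i.e. the subtract-then-abs shape that the patch combines.
    static void absDiff(float[] a, float[] b, float[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.abs(a[i] - b[i]);
        }
    }

    public static void main(String[] args) {
        float[] a = {1.0f, 5.0f, -2.0f};
        float[] b = {4.0f, 2.0f, -2.0f};
        float[] out = new float[a.length];
        absDiff(a, b, out);
        System.out.println(java.util.Arrays.toString(out));
    }
}
```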
>> >> [1] https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/FABD--Floating-point-absolute-difference--predicated-- > > Hao Sun has updated the pull request incrementally with one additional commit since the last revision: > > Update the loop limit in VectorAbsDiffTest.java > > As pointed out by Faye Gao, the test results are not fully verified due > to incorrect loop limits. > > Updated it. > > Reran the test and no regression. Marked as reviewed by ngasson (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10011 From tholenstein at openjdk.org Fri Sep 9 13:09:49 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 9 Sep 2022 13:09:49 GMT Subject: RFR: JDK-8293477: IGV: Upgrade to Netbeans Platform 15 In-Reply-To: References: Message-ID: <7B_RvjjzxcC4-2s3IwG0iR-FB-XDGI22gSR_LKE4idY=.b7b38909-fb44-4ad3-8747-8382bf7f0333@github.com> On Thu, 8 Sep 2022 11:07:58 GMT, Roberto Castañeda Lozano wrote: >> Upgrade IGV and dependencies to the newest Netbeans Platform 15 which was released on September 2022 (officially support running on JDK 11 and JDK 17).
>> >> ## Testing >> >> Tested the following use cases manually on macOS and JDK 17: >> >> - build with maven 3.8.1 >> - import graphs via network (localhost) >> - Save all groups to XML >> - Save selected groups to XML >> - Remove selected graphs >> - Remove selected groups >> - Remove all groups >> - Open XML graph file >> - Expand groups in Outline >> - Open a graphs in from same and different group in Outline >> - "Open clone" in the Outline >> - "Open Difference to current graph" for graphs in same and different group in Outline >> - Opening a new graph : Updates the Bytecode and Control Flow window >> - Show next / previous graph in current group buttons >> - Expand / Reduce the difference selection buttons >> - Changing of the difference selection by modifying the slider >> - Extract set of selected nodes and check if they are centered >> - Hiding of selected nodes >> - Showing all nodes again >> - Zooming in / out >> - Different views: Sea of nodes / clustered seas of nodes / CFG >> - Satellite view: button and by pressing the S key >> - Enable / Disable "Show neighbouring nodes of fully visible nodes semi-transparent" >> - Undo / Redo >> - Selection mode: button and by holding Ctrl + mouse-drag >> - Searching a node: Selects the node and centres it. Makes the node visible if it is hidden >> - Searching a block: Selects all nodes in the block and centres it. Makes the all the nodes in the block visible >> - Selecting node(s): adjusts colours in slider. Show property in Properties window >> - Hovering a node: highlights node and shows property box >> - Hovering a connection: highlights connection and corresponding nodes >> - apply filters >> - select nodes corresponding to a bytecode >> - select nodes corresponding to a basic block in the control flow > > Tested using JDK 11 and Maven 3.8.4 on both Linux (Ubuntu 20.04) and Windows 10, did not find any regression. 
Besides the listed use cases, I also tested PDF graph exporting (affected by OpenPDF version update). Thanks @robcasloz and @chhagedorn for the reviews! ------------- PR: https://git.openjdk.org/jdk/pull/10195 From tholenstein at openjdk.org Fri Sep 9 13:10:59 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 9 Sep 2022 13:10:59 GMT Subject: RFR: JDK-8293480: IGV: Update Bytecode and ControlFlow Component immediately when opening a new graph [v4] In-Reply-To: References: Message-ID: > The `BytecodeViewTopComponent` and `ControlFlowTopComponent` represent information depending on what graph is open in `EditorTopComponent`. Previously, `BytecodeViewTopComponent` and `ControlFlowTopComponent` did not update its content immediately when a new graph from a different group is opened in `EditorTopComponent`. They also did not update when switching between two tabs of open graph. > > We missed to `fire()` a `diagramChangedEvent` in the constructor of `EditorTopComponent`. We also need to fire when `BytecodeViewTopComponent` and `ControlFlowTopComponent` are initially opened. 
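The synchronization problem described above (dependent views not updating until an event is fired) can be sketched with a plain observer pattern. This is a hypothetical model, not IGV's code: `DiagramChangedEvent`, `DependentView`, and `openEditor` are illustrative stand-ins, and the real components use the NetBeans platform rather than this hand-rolled event class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model of "fire on construction, sync on open"; not IGV code.
final class ViewSyncSketch {
    /** Event source that remembers the last graph it announced. */
    static final class DiagramChangedEvent {
        private final List<Consumer<String>> listeners = new ArrayList<>();
        private String current; // null until an editor has fired

        void addListener(Consumer<String> listener) { listeners.add(listener); }
        String current() { return current; }

        void fire(String graph) {
            current = graph;
            for (Consumer<String> listener : listeners) {
                listener.accept(graph);
            }
        }
    }

    /** Stand-in for a window that displays data derived from the open graph. */
    static final class DependentView {
        String shown;

        DependentView(DiagramChangedEvent event) {
            event.addListener(graph -> shown = graph);
            shown = event.current(); // initial sync when the view is first opened
        }
    }

    /** Stand-in for constructing an editor: announce the graph immediately. */
    static void openEditor(DiagramChangedEvent event, String graph) {
        event.fire(graph);
    }

    public static void main(String[] args) {
        DiagramChangedEvent event = new DiagramChangedEvent();
        DependentView early = new DependentView(event);
        openEditor(event, "graph A");
        DependentView late = new DependentView(event); // opened after the editor
        System.out.println(early.shown + " / " + late.shown);
    }
}
```

Without the `fire` call in `openEditor`, the early view would stay empty; without the initial sync in the constructor, the late view would stay empty until the next change.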
> Update Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: Update Bytecode and ControlFlow when a group is removed ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10196/files - new: https://git.openjdk.org/jdk/pull/10196/files/49dbaa31..e9088043 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10196&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10196&range=02-03 Stats: 60 lines in 7 files changed: 27 ins; 29 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/10196.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10196/head:pull/10196 PR: https://git.openjdk.org/jdk/pull/10196 From tholenstein at openjdk.org Fri Sep 9 13:10:59 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 9 Sep 2022 13:10:59 GMT Subject: RFR: JDK-8293480: IGV: Update Bytecode and ControlFlow Component immediately when opening a new graph [v4] In-Reply-To: References: Message-ID: On Thu, 8 Sep 2022 11:53:40 GMT, Roberto Castañeda Lozano wrote: > Thanks for this UI improvement, Tobias, looks good to me! There is one more case where the Bytecode and Control Flow windows get out of sync: after removing all graphs and groups in the Outline, they still show the content of the graph that was last active: > > ![bytecode-and-cfg-leftovers](https://user-images.githubusercontent.com/8792647/189114719-770ba617-e94c-4492-a5ab-81047b8a0b98.png) > > This problem existed before the changeset, so it might be addressed here or in a separate issue, whatever you think makes more sense. Thanks for spotting that @robcasloz! I fixed it now.
As a by-product of my fix, the `OutlineTopComponent` now also removes the graph selection when the corresponding `EditorTopComponent` was closed ------------- PR: https://git.openjdk.org/jdk/pull/10196 From tholenstein at openjdk.org Fri Sep 9 13:11:52 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 9 Sep 2022 13:11:52 GMT Subject: Integrated: JDK-8293477: IGV: Upgrade to Netbeans Platform 15 In-Reply-To: References: Message-ID: <07oawWkFYqvwMgQT6ymnT7eChCE2AsXn-lWkVGwvNkI=.8dd3942c-fcef-4f2a-8b12-fda0c5694396@github.com> On Wed, 7 Sep 2022 09:17:26 GMT, Tobias Holenstein wrote: > Upgrade IGV and dependencies to the newest Netbeans Platform 15 which was released on September 2022 (officially support running on JDK 11 and JDK 17). > > ## Testing > > Tested the following use cases manually on macOS and JDK 17: > > - build with maven 3.8.1 > - import graphs via network (localhost) > - Save all groups to XML > - Save selected groups to XML > - Remove selected graphs > - Remove selected groups > - Remove all groups > - Open XML graph file > - Expand groups in Outline > - Open a graphs in from same and different group in Outline > - "Open clone" in the Outline > - "Open Difference to current graph" for graphs in same and different group in Outline > - Opening a new graph : Updates the Bytecode and Control Flow window > - Show next / previous graph in current group buttons > - Expand / Reduce the difference selection buttons > - Changing of the difference selection by modifying the slider > - Extract set of selected nodes and check if they are centered > - Hiding of selected nodes > - Showing all nodes again > - Zooming in / out > - Different views: Sea of nodes / clustered seas of nodes / CFG > - Satellite view: button and by pressing the S key > - Enable / Disable "Show neighbouring nodes of fully visible nodes semi-transparent" > - Undo / Redo > - Selection mode: button and by holding Ctrl + mouse-drag > - Searching a node: Selects the node and 
centres it. Makes the node visible if it is hidden > - Searching a block: Selects all nodes in the block and centres it. Makes the all the nodes in the block visible > - Selecting node(s): adjusts colours in slider. Show property in Properties window > - Hovering a node: highlights node and shows property box > - Hovering a connection: highlights connection and corresponding nodes > - apply filters > - select nodes corresponding to a bytecode > - select nodes corresponding to a basic block in the control flow This pull request has now been integrated. Changeset: 7169ee5c Author: Tobias Holenstein URL: https://git.openjdk.org/jdk/commit/7169ee5c73c130aacce73cbd3f88377ec07c8311 Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod 8293477: IGV: Upgrade to Netbeans Platform 15 Reviewed-by: chagedorn, rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/10195 From lujaniuk at openjdk.org Fri Sep 9 13:41:48 2022 From: lujaniuk at openjdk.org (Ludvig Janiuk) Date: Fri, 9 Sep 2022 13:41:48 GMT Subject: RFR: JDK-8291805: IGV: Improve Zooming [v9] In-Reply-To: References: Message-ID: On Thu, 8 Sep 2022 12:32:44 GMT, Tobias Holenstein wrote: >> # Overview >> >> The zooming is improved in the following ways: >> >> 1) Added a minimum (10%) and maximum (400%) zoom level. If you have a sensitive mouse wheel, it can be annoying to zoom in or out too much (until the graph is invisibly small or the nodes are larger than the window) >> >> 2) Zooming with a trackpad was not very smooth because IGV did panning and zooming at the same time - Now panning is disabled when CMD/Ctrl key is pressed for zooming >> >> 3) When only a few nodes were selected, zooming was no longer mouse centred. Instead, the center of the zooming was in the upper left corner. Now the zooming is centred to the middle of the scene when all selected nodes fit in the screen. >> >> 4) Added a shortcut (Ctrl - 0) to reset the zoom level to 100%. 
>> >> 5) Updated the Zoom icons to be vector graphics (.svg) >> >> # Implementation >> >> 1) New functions `getZoomMinFactor()` and `getZoomMinFactor()` assure that we do not zoom in or out our infinitely. `getZoomMinFactor()` assures that we do not zoom out further if zoom level is <100% and all visible nodes already fit on the screen. >> >> 2) We introduced a new `MouseCenteredZoomAction.java` for zooming with the mouse/trackpad. `MouseCenteredZoomAction` performs panning when the modifier key is pressed (Ctrl/CMD) and zooming otherwise. The functions `zoomIn ` and `zoomOut` now do animated zooming using `CustomZoomAnimator`. `CustomZoomAnimator` uses the mouse location as the centre of the zoom animation. >> >> 3) The `JScrollPane` now has a `JPanel centeringPanel` with `GridBagLayout()` that contains the `viewComponent`. This assures that the `viewComponent` is always centred when no scrollbars are visible. This makes the `Widget topLeft, bottomRight` obsolete as we can now add a white border of `BORDER_SIZE` to the `DiagramScene` instead. >> >> 4) `ZoomResetAction.java` resets the zoom level to 100%. The shortcut is `Ctrl - 0` and the action is available in the menu: `View` -> `Reset Zoom`. It was not added to the icon menu bar in the `EditorTopComponent` because of space issue. >> >> 5) new self created icons with vector graphics: `zoomIn.svg`, `zoomOut.svg` and `zoomReset.svg` > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > animate Zoom to center Hi uh, I'm not a regular in this codebase, so anything I say should be taken with the caveat that I might not know what I'm doing. With that said, I tested your PR, and zooming still seems pretty broken to me. For one, I don't notice much of a difference except for the animation. Secondly, I think a big part of what makes the zooming feel erratic is there's this "frame" around the scene that forces the zooming to take strange turns. 
If you'd like, maybe we can talk over zoom ( ;) ) about the zooming behavior? ------------- PR: https://git.openjdk.org/jdk/pull/10026 From thartmann at openjdk.org Fri Sep 9 13:50:44 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 9 Sep 2022 13:50:44 GMT Subject: RFR: 8292301: [REDO v2] C2 crash when allocating array of size too large [v4] In-Reply-To: References: Message-ID: On Fri, 9 Sep 2022 12:33:15 GMT, Roland Westrelin wrote: >> On top of the redo, this fixed 2 bugs: >> >> 8288184: the problem here is that the ValidLengthTest input of an >> AllocateArrayNode becomes a constant. The CatchNode would then change >> types if it was reprocessed but it's not. Custom logic is needed to >> enqueue the CatchNode when the ValidLengthTest input of an >> AllocateArrayNode changes. The CastII out of the AllocateArrayNode >> becomes top but the fallthrough path doesn't die. This happens with >> igvn in the case of the bug but could also happen with ccp. I fixed >> both in this patch. >> >> 8291665: the code pattern for this is 2 AllocateArrayNodes out of loop >> with a shared ValidLengthTest input in a loop. When the loop is cloned >> that causes Phis to be added between the AllocateArrayNodes and the >> BoolNode of the ValidLengthTest inputs. Split if runs next and it >> doesn't expect the Phi at the ValidLengthTest inputs. The fix here is >> to clone the Bool/Cmp subgraph down on loop cloning. There's logic for >> that when the use of the bool is an If for instance so I simply added >> a special case to run that logic for an AllocateArrayNode use as >> well. Note that the test case I added fails reliably on 11 but not >> with the current jdk development branch. AFAICT, the bug is there but >> something unrelated changed and a slightly different graph is built >> for the test case that prevents split if. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 12 commits: > > - comments > - Merge branch 'master' into JDK-8292301 > - Update src/hotspot/share/opto/phaseX.cpp > > Co-authored-by: Tobias Hartmann > - undo needless change > - dos->unix test file > - move tests > - test fix > - fix > - test > - test for 8288184 > - ... and 2 more: https://git.openjdk.org/jdk/compare/8e22f2bb...9d92011a Thanks for adding these comments. Looks good! ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.org/jdk/pull/10038 From bkilambi at openjdk.org Fri Sep 9 14:28:49 2022 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 9 Sep 2022 14:28:49 GMT Subject: Integrated: 8292675: Add identity transformation for removing redundant AndV/OrV nodes In-Reply-To: References: Message-ID: On Mon, 5 Sep 2022 10:21:11 GMT, Bhavana Kilambi wrote: > Recently we found that the rotate left/right benchmarks with vectorapi > emit a redundant "and" instruction on both aarch64 and x86_64 machines > which can be done away with. For example - and(and(a, b), b) generates > two "and" instructions which can be reduced to a single "and" operation- > and(a, b) since "and" (and "or") operations are commutative and > idempotent in nature. This can help improve performance for all those > workloads which have multiple "and"/"or" operations with the same value > by reducing them to fewer "and"/"or" operations accordingly. 
> > This patch adds the following transformations for vector logical > operations - AndV and OrV : > > > (OpV (OpV a b) b) => (OpV a b) > (OpV (OpV a b) a) => (OpV a b) > (OpV (OpV a b m1) b m1) => (OpV a b m1) > (OpV (OpV a b m1) a m1) => (OpV a b m1) > (OpV a (OpV a b)) => (OpV a b) > (OpV b (OpV a b)) => (OpV a b) > (OpV a (OpV a b m) m) => (OpV a b m) > > where Op = "And", "Or" > > Links for benchmarks tested are given below :- > https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/IntMaxVector.java#L728 > https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/IntMaxVector.java#L764 > https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/LongMaxVector.java#L728 > https://github.com/openjdk/panama-vector/blob/2aade73adeabdf6a924136b17fd96ccc95c1d160/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/LongMaxVector.java#L764 > > Before this patch, the disassembly for one of these testcases > (IntMaxVector.ROR) for Neon is shown below : > > ldr q16, [x12, #16] > and v16.16b, v16.16b, v20.16b > and v16.16b, v16.16b, v20.16b > add x12, x16, x11 > sub v17.4s, v21.4s, v16.4s > ... > ... > > > After this patch, the disassembly for the same testcase above is shown > below : > > ldr q16, [x12, #16] > and v16.16b, v16.16b, v20.16b > add x12, x16, x11 > sub v17.4s, v21.4s, v16.4s > ... > ... > > > The other tests also emit an extra "and" instruction as shown above for > the vector ROR/ROL operations.
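The identities listed above can be checked on scalar values, treating each `int` as a single vector lane. This is only a sketch to make the algebra concrete, not the C2 `Identity` implementation from the patch. The masked helpers assume that a lane whose mask bit is unset keeps the first operand (the behaviour of masked lanewise operations in the Vector API); the class and method names are hypothetical.

```java
import java.util.Random;

// Scalar model of the AndV/OrV identities; one int stands in for one lane.
final class AndOrIdentitySketch {
    // Assumption: an unset mask leaves the first operand unchanged.
    static int maskedAnd(int a, int b, boolean m) { return m ? (a & b) : a; }
    static int maskedOr(int a, int b, boolean m)  { return m ? (a | b) : a; }

    /** Returns true if every listed rewrite preserves the lane value. */
    static boolean identitiesHold(int a, int b, boolean m) {
        boolean and =
               ((a & b) & b) == (a & b)
            && ((a & b) & a) == (a & b)
            && (a & (a & b)) == (a & b)
            && (b & (a & b)) == (a & b)
            && maskedAnd(maskedAnd(a, b, m), b, m) == maskedAnd(a, b, m)
            && maskedAnd(maskedAnd(a, b, m), a, m) == maskedAnd(a, b, m)
            && maskedAnd(a, maskedAnd(a, b, m), m) == maskedAnd(a, b, m);
        boolean or =
               ((a | b) | b) == (a | b)
            && ((a | b) | a) == (a | b)
            && (a | (a | b)) == (a | b)
            && (b | (a | b)) == (a | b)
            && maskedOr(maskedOr(a, b, m), b, m) == maskedOr(a, b, m)
            && maskedOr(maskedOr(a, b, m), a, m) == maskedOr(a, b, m)
            && maskedOr(a, maskedOr(a, b, m), m) == maskedOr(a, b, m);
        return and && or;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int i = 0; i < 100_000; i++) {
            int a = rnd.nextInt();
            int b = rnd.nextInt();
            if (!identitiesHold(a, b, true) || !identitiesHold(a, b, false)) {
                throw new AssertionError("identity failed for " + a + ", " + b);
            }
        }
        System.out.println("all identities hold");
    }
}
```

The rewrites are safe because AND and OR are commutative, associative, and idempotent, so applying the same operand a second time cannot change the result.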
> > Below are the performance results for the vectorapi rotate tests (tests > given in the links above) with this patch on aarch64 and x86_64 machines > (for int and long types) - > > > Benchmark aarch64 x86_64 > IntMaxVector.ROL 25.57% 26.09% > IntMaxVector.ROR 23.75% 24.15% > LongMaxVector.ROL 28.91% 28.51% > LongMaxVector.ROR 16.51% 29.11% > > > > The percentage indicates the percent gain/improvement in performance > (ops/ms) with this patch over the master build without this patch. The > machine descriptions are given below - > aarch64 - 128-bit aarch64 machine > x86_64 - 256-bit x86 machine This pull request has now been integrated. Changeset: 00befddd Author: Bhavana Kilambi Committer: Nick Gasson URL: https://git.openjdk.org/jdk/commit/00befddd7ce97d324250807824469daaa9434eef Stats: 288 lines in 2 files changed: 286 ins; 0 del; 2 mod 8292675: Add identity transformation for removing redundant AndV/OrV nodes Reviewed-by: thartmann, ngasson ------------- PR: https://git.openjdk.org/jdk/pull/10163 From vlivanov at openjdk.org Fri Sep 9 17:21:12 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 9 Sep 2022 17:21:12 GMT Subject: RFR: 8293044: C1: Missing access check on non-accessible class [v2] In-Reply-To: References: Message-ID: > C1 erroneously omits some access checks on symbolically referenced classes. > > Proposed fix relies on code patching to throw proper resolution error when required. > > Also, to avoid repeated recompilations on platforms which don't support code > patching, the nmethod is not marked as non-entrant when corresponding constant > pool entry is in error state. 
> > Testing: hs-tier1 - hs-tier4 Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/ci/ciStreams.cpp Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10222/files - new: https://git.openjdk.org/jdk/pull/10222/files/cb9a7bb3..8a3286f7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10222&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10222&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10222.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10222/head:pull/10222 PR: https://git.openjdk.org/jdk/pull/10222 From xliu at openjdk.org Fri Sep 9 17:28:58 2022 From: xliu at openjdk.org (Xin Liu) Date: Fri, 9 Sep 2022 17:28:58 GMT Subject: RFR: 8292301: [REDO v2] C2 crash when allocating array of size too large [v4] In-Reply-To: References: Message-ID: On Fri, 9 Sep 2022 12:33:15 GMT, Roland Westrelin wrote: >> On top of the redo, this fixed 2 bugs: >> >> 8288184: the problem here is that the ValidLengthTest input of an >> AllocateArrayNode becomes a constant. The CatchNode would then change >> types if it was reprocessed but it's not. Custom logic is needed to >> enqueue the CatchNode when the ValidLengthTest input of an >> AllocateArrayNode changes. The CastII out of the AllocateArrayNode >> becomes top but the fallthrough path doesn't die. This happens with >> igvn in the case of the bug but could also happen with ccp. I fixed >> both in this patch. >> >> 8291665: the code pattern for this is 2 AllocateArrayNodes out of loop >> with a shared ValidLengthTest input in a loop. When the loop is cloned >> that causes Phis to be added between the AllocateArrayNodes and the >> BoolNode of the ValidLengthTest inputs. Split if runs next and it >> doesn't expect the Phi at the ValidLengthTest inputs. The fix here is >> to clone the Bool/Cmp subgraph down on loop cloning. 
There's logic for >> that when the use of the bool is an If for instance so I simply added >> a special case to run that logic for an AllocateArrayNode use as >> well. Note that the test case I added fails reliably on 11 but not >> with the current jdk development branch. AFAICT, the bug is there but >> something unrelated changed and a slightly different graph is built >> for the test case that prevents split if. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - comments > - Merge branch 'master' into JDK-8292301 > - Update src/hotspot/share/opto/phaseX.cpp > > Co-authored-by: Tobias Hartmann > - undo needless change > - dos->unix test file > - move tests > - test fix > - fix > - test > - test for 8288184 > - ... and 2 more: https://git.openjdk.org/jdk/compare/8e22f2bb...9d92011a still LGTM. ------------- Changes requested by xliu (Committer). PR: https://git.openjdk.org/jdk/pull/10038 From vlivanov at openjdk.org Fri Sep 9 17:46:01 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 9 Sep 2022 17:46:01 GMT Subject: RFR: 8293044: C1: Missing access check on non-accessible class [v3] In-Reply-To: References: Message-ID: > C1 erroneously omits some access checks on symbolically referenced classes. > > Proposed fix relies on code patching to throw proper resolution error when required. > > Also, to avoid repeated recompilations on platforms which don't support code > patching, the nmethod is not marked as non-entrant when corresponding constant > pool entry is in error state.
> > Testing: hs-tier1 - hs-tier4 Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: update the test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10222/files - new: https://git.openjdk.org/jdk/pull/10222/files/8a3286f7..ad69916a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10222&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10222&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10222.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10222/head:pull/10222 PR: https://git.openjdk.org/jdk/pull/10222 From vlivanov at openjdk.org Fri Sep 9 17:46:01 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 9 Sep 2022 17:46:01 GMT Subject: RFR: 8293044: C1: Missing access check on non-accessible class [v3] In-Reply-To: References: Message-ID: On Fri, 9 Sep 2022 09:39:45 GMT, Tobias Hartmann wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> update the test > > test/hotspot/jtreg/compiler/c1/KlassAccessCheckTest.java line 31: > >> 29: * @compile KlassAccessCheck.jasm >> 30: * @run main/othervm -Xbatch -XX:TieredStopAtLevel=1 >> 31: * -XX:+PrintCompilation -XX:CompileCommand=dontinline,KlassAccessCheck.test* > > I think the CompileCommand misses the package name. It should be `-XX:CompileCommand=dontinline,compiler.c1.KlassAccessCheck.test*` > > Also, you may want to remove the `-XX:+PrintCompilation`. Good catch, I removed both options. ------------- PR: https://git.openjdk.org/jdk/pull/10222 From dlong at openjdk.org Fri Sep 9 18:46:56 2022 From: dlong at openjdk.org (Dean Long) Date: Fri, 9 Sep 2022 18:46:56 GMT Subject: RFR: 8293287 add ReplayReduce flag [v4] In-Reply-To: References: Message-ID: On Fri, 9 Sep 2022 01:33:02 GMT, Dean Long wrote: >> Add an experimental flag to help developers "reduce" a replay file. 
>> >> As a first step, I plan to simulate reduced inlining. This will output multiple "compile" lines as if the first level of inlining never happened: >> A --> B --> C >> A --> D --> E >> becomes >> B --> C >> D --> E >> Developers can repeat iteratively until the replay crash no longer reproduces. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > fix typo > > Co-authored-by: Tobias Hartmann Thanks Tobias and Vladimir. ------------- PR: https://git.openjdk.org/jdk/pull/10134 From dlong at openjdk.org Fri Sep 9 18:49:46 2022 From: dlong at openjdk.org (Dean Long) Date: Fri, 9 Sep 2022 18:49:46 GMT Subject: Integrated: 8293287 add ReplayReduce flag In-Reply-To: References: Message-ID: On Fri, 2 Sep 2022 01:20:26 GMT, Dean Long wrote: > Add an experimental flag to help developers "reduce" a replay file. > > As a first step, I plan to simulate reduced inlining. This will output multiple "compile" lines as if the first level of inlining never happened: > A --> B --> C > A --> D --> E > becomes > B --> C > D --> E > Developers can repeat iteratively until the replay crash no longer reproduces. This pull request has now been integrated. Changeset: dbec22b8 Author: Dean Long URL: https://git.openjdk.org/jdk/commit/dbec22b84b0ffce447b43271e12ed7d0eed6c387 Stats: 135 lines in 9 files changed: 41 ins; 88 del; 6 mod 8293287: add ReplayReduce flag Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/10134 From dlong at openjdk.org Fri Sep 9 20:36:18 2022 From: dlong at openjdk.org (Dean Long) Date: Fri, 9 Sep 2022 20:36:18 GMT Subject: RFR: 8293044: C1: Missing access check on non-accessible class [v3] In-Reply-To: References: Message-ID: On Fri, 9 Sep 2022 17:46:01 GMT, Vladimir Ivanov wrote: >> C1 erroneously omits some access checks on symbolically referenced classes. >> >> Proposed fix relies on code patching to throw proper resolution error when required. 
>> >> Also, to avoid repeated recompilations on platforms which don't support code >> patching, the nmethod is not marked as non-entrant when corresponding constant >> pool entry is in error state. >> >> Testing: hs-tier1 - hs-tier4 > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > update the test Looks good. ------------- Marked as reviewed by dlong (Reviewer). PR: https://git.openjdk.org/jdk/pull/10222 From vlivanov at openjdk.org Fri Sep 9 20:44:39 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 9 Sep 2022 20:44:39 GMT Subject: RFR: 8293044: C1: Missing access check on non-accessible class [v3] In-Reply-To: References: Message-ID: <_DpoPzdpF4irPoiDe7IXJ_qMsQ2cj6yHydFjX2CykY0=.51ec2475-2943-4929-ab5c-12fd1e92ac3d@github.com> On Fri, 9 Sep 2022 17:46:01 GMT, Vladimir Ivanov wrote: >> C1 erroneously omits some access checks on symbolically referenced classes. >> >> Proposed fix relies on code patching to throw proper resolution error when required. >> >> Also, to avoid repeated recompilations on platforms which don't support code >> patching, the nmethod is not marked as non-entrant when corresponding constant >> pool entry is in error state. >> >> Testing: hs-tier1 - hs-tier4 > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > update the test Thanks for the reviews, Tobias and Dean. ------------- PR: https://git.openjdk.org/jdk/pull/10222 From vlivanov at openjdk.org Fri Sep 9 20:52:28 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 9 Sep 2022 20:52:28 GMT Subject: Integrated: 8293044: C1: Missing access check on non-accessible class In-Reply-To: References: Message-ID: <6ZYaOLbKVXwmm1ye2N_2_7sCip6F8lIStJyPgSJC6sQ=.630848f4-693a-4918-8118-75580118f801@github.com> On Thu, 8 Sep 2022 17:12:36 GMT, Vladimir Ivanov wrote: > C1 erroneously omits some access checks on symbolically referenced classes. 
> > Proposed fix relies on code patching to throw proper resolution error when required. > > Also, to avoid repeated recompilations on platforms which don't support code > patching, the nmethod is not marked as non-entrant when corresponding constant > pool entry is in error state. > > Testing: hs-tier1 - hs-tier4 This pull request has now been integrated. Changeset: 005b49bb Author: Vladimir Ivanov URL: https://git.openjdk.org/jdk/commit/005b49bb78a468d4e372e6f5fa48bb0db4fd73c2 Stats: 250 lines in 8 files changed: 233 ins; 6 del; 11 mod 8293044: C1: Missing access check on non-accessible class Reviewed-by: thartmann, dlong ------------- PR: https://git.openjdk.org/jdk/pull/10222 From duke at openjdk.org Sat Sep 10 00:34:37 2022 From: duke at openjdk.org (duke) Date: Sat, 10 Sep 2022 00:34:37 GMT Subject: Withdrawn: 8263377: Store method handle linkers in the 'non-nmethods' heap In-Reply-To: References: Message-ID: On Tue, 17 May 2022 23:19:54 GMT, Yi-Fan Tsai wrote: > 8263377: Store method handle linkers in the 'non-nmethods' heap This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/8760 From jbhateja at openjdk.org Sat Sep 10 17:05:38 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 10 Sep 2022 17:05:38 GMT Subject: RFR: 8288043: Optimize FP to word/sub-word integral type conversion on X86 AVX2 platforms [v4] In-Reply-To: References: Message-ID: > Hi All, > > This patch extends conversion optimizations added with [JDK-8287835](https://bugs.openjdk.org/browse/JDK-8287835) to optimize following floating point to integral conversions for X86 AVX2 targets:- > * D2I , D2S, D2B, F2I , F2S, F2B > > In addition, it also optimizes following wide vector (64 bytes) double to integer and sub-type conversions for AVX512 targets which do not support AVX512DQ feature. > * D2I, D2S, D2B > > Following are the JMH micro performance results with and without patch. 
> > System configuration: 40C 2S Icelake server (Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz) > > BENCHMARK | SIZE | BASELINE (ops/ms) | WITHOPT (ops/ms) | PERF GAIN FACTOR > -- | -- | -- | -- | -- > VectorFPtoIntCastOperations.microDouble128ToByte128 | 1024 | 90.603 | 92.797 | 1.024215534 > VectorFPtoIntCastOperations.microDouble128ToByte256 | 1024 | 81.909 | 82.3 | 1.00477359 > VectorFPtoIntCastOperations.microDouble128ToByte512 | 1024 | 26.181 | 26.244 | 1.002406325 > VectorFPtoIntCastOperations.microDouble128ToInteger128 | 1024 | 90.74 | 2537.958 | 27.96956138 > VectorFPtoIntCastOperations.microDouble128ToInteger256 | 1024 | 81.586 | 2429.599 | 29.7796068 > VectorFPtoIntCastOperations.microDouble128ToInteger512 | 1024 | 19.406 | 19.61 | 1.010512213 > VectorFPtoIntCastOperations.microDouble128ToLong128 | 1024 | 91.723 | 90.754 | 0.989435583 > VectorFPtoIntCastOperations.microDouble128ToShort128 | 1024 | 91.766 | 1984.577 | 21.62649565 > VectorFPtoIntCastOperations.microDouble128ToShort256 | 1024 | 81.949 | 1940.599 | 23.68056962 > VectorFPtoIntCastOperations.microDouble128ToShort512 | 1024 | 16.468 | 16.56 | 1.005586592 > VectorFPtoIntCastOperations.microDouble256ToByte128 | 1024 | 163.331 | 3018.351 | 18.479964 > VectorFPtoIntCastOperations.microDouble256ToByte256 | 1024 | 148.878 | 3082.034 | 20.70174237 > VectorFPtoIntCastOperations.microDouble256ToByte512 | 1024 | 50.108 | 51.629 | 1.030354434 > VectorFPtoIntCastOperations.microDouble256ToInteger128 | 1024 | 159.805 | 4619.421 | 28.90661118 > VectorFPtoIntCastOperations.microDouble256ToInteger256 | 1024 | 143.876 | 4649.642 | 32.31700909 > VectorFPtoIntCastOperations.microDouble256ToInteger512 | 1024 | 38.127 | 38.188 | 1.001599916 > VectorFPtoIntCastOperations.microDouble256ToLong128 | 1024 | 160.322 | 162.442 | 1.013223388 > VectorFPtoIntCastOperations.microDouble256ToLong256 | 1024 | 141.252 | 143.01 | 1.012445841 > VectorFPtoIntCastOperations.microDouble256ToShort128 | 1024 | 157.717 | 3757.471 | 
23.82413437 > VectorFPtoIntCastOperations.microDouble256ToShort256 | 1024 | 143.876 | 3830.971 | 26.62689399 > VectorFPtoIntCastOperations.microDouble256ToShort512 | 1024 | 32.061 | 32.911 | 1.026511962 > VectorFPtoIntCastOperations.microFloat128ToByte128 | 1024 | 146.599 | 4002.967 | 27.30555461 > VectorFPtoIntCastOperations.microFloat128ToByte256 | 1024 | 136.99 | 3938.799 | 28.75245638 > VectorFPtoIntCastOperations.microFloat128ToByte512 | 1024 | 51.561 | 50.284 | 0.975233219 > VectorFPtoIntCastOperations.microFloat128ToInteger128 | 1024 | 5933.565 | 5361.472 | 0.903583596 > VectorFPtoIntCastOperations.microFloat128ToInteger256 | 1024 | 5079.564 | 5062.046 | 0.996551279 > VectorFPtoIntCastOperations.microFloat128ToInteger512 | 1024 | 37.101 | 38.419 | 1.035524649 > VectorFPtoIntCastOperations.microFloat128ToLong128 | 1024 | 145.863 | 145.362 | 0.99656527 > VectorFPtoIntCastOperations.microFloat128ToLong256 | 1024 | 131.159 | 133.154 | 1.015210546 > VectorFPtoIntCastOperations.microFloat128ToShort128 | 1024 | 145.966 | 4150.039 | 28.4315457 > VectorFPtoIntCastOperations.microFloat128ToShort256 | 1024 | 134.703 | 4566.589 | 33.90116775 > VectorFPtoIntCastOperations.microFloat128ToShort512 | 1024 | 31.878 | 30.867 | 0.968285338 > VectorFPtoIntCastOperations.microFloat256ToByte128 | 1024 | 237.841 | 6292.051 | 26.4548627 > VectorFPtoIntCastOperations.microFloat256ToByte256 | 1024 | 222.041 | 6292.748 | 28.34047766 > VectorFPtoIntCastOperations.microFloat256ToByte512 | 1024 | 92.073 | 88.981 | 0.966417951 > VectorFPtoIntCastOperations.microFloat256ToInteger128 | 1024 | 11471.121 | 10269.636 | 0.895260019 > VectorFPtoIntCastOperations.microFloat256ToInteger256 | 1024 | 10729.816 | 10105.92 | 0.941853989 > VectorFPtoIntCastOperations.microFloat256ToInteger512 | 1024 | 68.328 | 70.005 | 1.024543379 > VectorFPtoIntCastOperations.microFloat256ToLong128 | 1024 | 247.101 | 248.571 | 1.005948984 > VectorFPtoIntCastOperations.microFloat256ToLong256 | 1024 | 225.74 | 223.987 | 
0.992234429 > VectorFPtoIntCastOperations.microFloat256ToLong512 | 1024 | 76.39 | 76.187 | 0.997342584 > VectorFPtoIntCastOperations.microFloat256ToShort128 | 1024 | 233.196 | 8202.179 | 35.17289748 > VectorFPtoIntCastOperations.microFloat256ToShort256 | 1024 | 220.75 | 7781.073 | 35.24834881 > VectorFPtoIntCastOperations.microFloat256ToShort512 | 1024 | 58.143 | 55.633 | 0.956830573 > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8288043: Code re-factoring. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9748/files - new: https://git.openjdk.org/jdk/pull/9748/files/5cdfd68f..dce02fa0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9748&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9748&range=02-03 Stats: 182 lines in 3 files changed: 46 ins; 68 del; 68 mod Patch: https://git.openjdk.org/jdk/pull/9748.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9748/head:pull/9748 PR: https://git.openjdk.org/jdk/pull/9748 From jiefu at openjdk.org Mon Sep 12 03:46:39 2022 From: jiefu at openjdk.org (Jie Fu) Date: Mon, 12 Sep 2022 03:46:39 GMT Subject: RFR: 8293491: Avoid unexpected deoptimization in loop exit due to incorrect branch profiling [v2] In-Reply-To: References: Message-ID: > Hi all, > > Please review this patch which fixes the unexpected deoptimizations in loop exit due to incorrect branch profiling. > > # Background > > While analyzing our big data Apps, we observed unexpected deoptimizations in loop exit due to incorrect branch profiling. > > Here is a reproducer. 
>
>     public class UnexpectedLoopExitDeopt {
>         public static final int N = 20000000;
>
>         public static int d1[] = new int[N];
>         public static int d2[] = new int[N];
>
>         public static void main(String[] args) {
>             System.out.println(test(d1));
>             System.out.println(test(d2));
>         }
>
>         public static int test(int[] a) {
>             int sum = 0;
>             for(int i = 0; i < a.length; i++) {
>                 sum += a[i];
>             }
>             return sum;
>         }
>     }
>
>
> The following is the compilation sequence.
>
>     77 1 3 java.lang.Object:: (1 bytes)
>     83 2 3 java.lang.String::isLatin1 (19 bytes)
>     84 6 3 jdk.internal.util.Preconditions::checkIndex (18 bytes)
>     84 3 3 java.lang.String::charAt (25 bytes)
>     85 4 3 java.lang.StringLatin1::charAt (15 bytes)
>     86 7 3 java.lang.String::coder (15 bytes)
>     86 8 3 java.lang.String::hashCode (60 bytes)
>     87 5 3 java.lang.String::checkIndex (10 bytes)
>     87 9 3 java.lang.String::length (11 bytes)
>     93 10 n 0 java.lang.invoke.MethodHandle::linkToStatic(LLLLLLL)L (native) (static)
>     96 11 n 0 java.lang.invoke.MethodHandle::linkToSpecial(LLLL)L (native) (static)
>     96 12 n 0 java.lang.Object::hashCode (native)
>     97 13 n 0 java.lang.invoke.MethodHandle::invokeBasic(LLLLLL)L (native)
>     98 14 3 java.util.Objects::requireNonNull (14 bytes)
>     98 15 n 0 java.lang.invoke.MethodHandle::linkToSpecial(LLLLLLLL)L (native) (static)
>     98 16 1 java.lang.Enum::ordinal (5 bytes)
>     101 17 n 0 java.lang.invoke.MethodHandle::linkToSpecial(LLLL)V (native) (static)
>     102 18 n 0 java.lang.invoke.MethodHandle::invokeBasic(LL)L (native)
>     212 19 % 3 UnexpectedLoopExitDeopt::test @ 4 (24 bytes)
>     213 20 % 4 UnexpectedLoopExitDeopt::test @ 4 (24 bytes)
>     221 19 % 3 UnexpectedLoopExitDeopt::test @ 4 (24 bytes) made not entrant
>     221 21 4 UnexpectedLoopExitDeopt::test (24 bytes)
>     230 20 % 4 UnexpectedLoopExitDeopt::test @ 4 (24 bytes) made not entrant <--- Unexpected deopt
>     0
>     242 21 4 UnexpectedLoopExitDeopt::test (24 bytes) made not entrant <--- Unexpected deopt
>     0
>
>
> The last two deopts (made not entrant)
happened in the loop exit, which is unexpected.
>
>
> # Reason
>
> The unexpected deopts were caused by the incorrect branch profiling count (0 taken count for loop predicate).
>
> Here is the profiling data for `UnexpectedLoopExitDeopt::test`.
> We can see that for `if_icmpge` @ bci=7, the count for `not taken` is 264957, while it is 0 for `taken`.
> A taken count of zero is obviously incorrect, since the loop will eventually exit (when `i >= a.length`).
> So the taken count should be at least 1 for `if_icmpge` @ bci=7.
>
>     0 iconst_0
>     1 istore_1
>     2 iconst_0
>     3 istore_2
>
>     4 iload_2
>     5 fast_aload_0
>     6 arraylength
>     7 if_icmpge 22
>     0 bci: 7 BranchData taken(0) displacement(56)
>     not taken(264957)
>
>     10 iload_1
>     11 fast_aload_0
>     12 iload_2
>     13 iaload
>     14 iadd
>     15 istore_1
>     16 iinc #2 1
>     19 goto 4
>     32 bci: 19 JumpData taken(266667) displacement(-32)
>
>     22 iload_1
>     23 ireturn
>
>
> # Fix
>
> The main idea is to detect whether the taken target of the branch is a loop exit.
> If so, set the taken count to be at least 1.
> This is fine because most loops should be finite and would execute the loop exit code at least once.
> For infinite loops like `while (true) {...}`, the patch won't change the original behaviour since there is no loop exit.
>
> # Testing
>
> tier1~3 on Linux/x64, no regression
>
> Thanks.
> Best regards,
> Jie

Jie Fu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains four additional commits since the last revision: - Address review comments - Merge branch 'master' into JDK-8293491 - Merge branch 'master' into JDK-8293491 - 8293491: Avoid unexpected deoptimization in loop exit due to incorrect branch profiling ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10200/files - new: https://git.openjdk.org/jdk/pull/10200/files/bfb66ad7..e2cd280d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10200&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10200&range=00-01 Stats: 11800 lines in 238 files changed: 6453 ins; 4458 del; 889 mod Patch: https://git.openjdk.org/jdk/pull/10200.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10200/head:pull/10200 PR: https://git.openjdk.org/jdk/pull/10200 From jiefu at openjdk.org Mon Sep 12 03:50:47 2022 From: jiefu at openjdk.org (Jie Fu) Date: Mon, 12 Sep 2022 03:50:47 GMT Subject: RFR: 8293491: Avoid unexpected deoptimization in loop exit due to incorrect branch profiling [v2] In-Reply-To: References: Message-ID: On Thu, 8 Sep 2022 10:55:13 GMT, Tobias Hartmann wrote: >> Jie Fu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Address review comments >> - Merge branch 'master' into JDK-8293491 >> - Merge branch 'master' into JDK-8293491 >> - 8293491: Avoid unexpected deoptimization in loop exit due to incorrect branch profiling > > Do these deoptimizations really affect performance of your program or did you just spot them when looking at the logs? > > Such surprising deopts are actually expected with optimistic, profile guided optimizations and happen in many other scenarios as well. They are usually harmless. Also, the profile information is not necessarily incorrect but might just be outdated because we stop profiling once we reach C2. 
Marking all loop exits as taken seems hacky and might have unexpected side effects. > > Also, wouldn't C2 still insert a `Deoptimization::Reason_unreached` or `Deoptimization::Reason_unstable_if` trap for subsequent instructions after the loop exit for which profiling also suggests that they were never executed? Hi @TobiHartmann and @cl4es , patch had been updated. Any comments? Thanks. ------------- PR: https://git.openjdk.org/jdk/pull/10200 From chagedorn at openjdk.org Mon Sep 12 06:54:45 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 12 Sep 2022 06:54:45 GMT Subject: RFR: 8291599: Assertion in PhaseIdealLoop::skeleton_predicate_has_opaque after JDK-8289127 [v3] In-Reply-To: References: Message-ID: On Fri, 9 Sep 2022 12:05:03 GMT, Roland Westrelin wrote: >> In TestPhiInSkeletonPredicateExpression.test1(): >> >> - Loop predication adds predicates for the null check of array and the >> array range check. It also adds skeleton predicates in case of >> subsequent unrolling. >> >> - One of the skeleton predicate has the following shape: >> >> (Opaque4 (Bool (CmpUL (AddL (AddL (ConvI2L (LoadI (Phi ...))) (ConvI2L (CastII (AddI (OpaqueLoopInit OpaqueLoopStride))))) -1) ...))) >> >> - Split thru phi pushes the null check through the dominating >> region. The skeleton predicate subgraph is transformed to: >> >> (Opaque4 (Bool (CmpUL (Phi ...) ...))) >> >> - Logic that processes skeleton predicate can no longer find the >> OpaqueLoopInit and OpaqueLoopStride nodes because they are now >> behind a phi. That causes the assert to fire. >> >> The fix I propose is to catch cases where part of a skeleton predicate >> expression (a subgraph with a OpaqueLoopInit or OpaqueLoopStride node) >> is being split during split if and to clone the entire skeleton >> predicate subgraph then. >> >> There's a already logic for that currently but it only triggers if >> PhaseIdealLoop::split_up() tries to split an OpaqueLoopInit or >> OpaqueLoopStride. 
In the case here, the OpaqueLoopInit and >> OpaqueLoopStride nodes have control above the region at which split if >> occurs. So they are never split by PhaseIdealLoop::split_up(). The >> AddL nodes in subgraph are. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Christian's review > - Merge branch 'master' into JDK-8291599 > - Update test/hotspot/jtreg/compiler/loopopts/TestPhiInSkeletonPredicateExpression.java > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopTransform.cpp > > Co-authored-by: Christian Hagedorn > - fix > - test That looks good to me, thanks for doing the changes! ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.org/jdk/pull/10022 From roland at openjdk.org Mon Sep 12 07:31:55 2022 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 12 Sep 2022 07:31:55 GMT Subject: RFR: 8291599: Assertion in PhaseIdealLoop::skeleton_predicate_has_opaque after JDK-8289127 [v3] In-Reply-To: References: Message-ID: On Mon, 12 Sep 2022 06:51:03 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - Christian's review >> - Merge branch 'master' into JDK-8291599 >> - Update test/hotspot/jtreg/compiler/loopopts/TestPhiInSkeletonPredicateExpression.java >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/loopTransform.cpp >> >> Co-authored-by: Christian Hagedorn >> - fix >> - test > > That looks good to me, thanks for doing the changes! 
Thanks for the reviews @chhagedorn @TobiHartmann ------------- PR: https://git.openjdk.org/jdk/pull/10022 From roland at openjdk.org Mon Sep 12 07:33:25 2022 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 12 Sep 2022 07:33:25 GMT Subject: Integrated: 8291599: Assertion in PhaseIdealLoop::skeleton_predicate_has_opaque after JDK-8289127 In-Reply-To: References: Message-ID: On Thu, 25 Aug 2022 12:30:02 GMT, Roland Westrelin wrote: > In TestPhiInSkeletonPredicateExpression.test1(): > > - Loop predication adds predicates for the null check of array and the > array range check. It also adds skeleton predicates in case of > subsequent unrolling. > > - One of the skeleton predicate has the following shape: > > (Opaque4 (Bool (CmpUL (AddL (AddL (ConvI2L (LoadI (Phi ...))) (ConvI2L (CastII (AddI (OpaqueLoopInit OpaqueLoopStride))))) -1) ...))) > > - Split thru phi pushes the null check through the dominating > region. The skeleton predicate subgraph is transformed to: > > (Opaque4 (Bool (CmpUL (Phi ...) ...))) > > - Logic that processes skeleton predicate can no longer find the > OpaqueLoopInit and OpaqueLoopStride nodes because they are now > behind a phi. That causes the assert to fire. > > The fix I propose is to catch cases where part of a skeleton predicate > expression (a subgraph with a OpaqueLoopInit or OpaqueLoopStride node) > is being split during split if and to clone the entire skeleton > predicate subgraph then. > > There's a already logic for that currently but it only triggers if > PhaseIdealLoop::split_up() tries to split an OpaqueLoopInit or > OpaqueLoopStride. In the case here, the OpaqueLoopInit and > OpaqueLoopStride nodes have control above the region at which split if > occurs. So they are never split by PhaseIdealLoop::split_up(). The > AddL nodes in subgraph are. This pull request has now been integrated. 
Changeset: 37df5f56
Author: Roland Westrelin
URL: https://git.openjdk.org/jdk/commit/37df5f56259429482cfdbe38e8b6256f1efaf9e8
Stats: 181 lines in 4 files changed: 155 ins; 23 del; 3 mod

8291599: Assertion in PhaseIdealLoop::skeleton_predicate_has_opaque after JDK-8289127

Reviewed-by: chagedorn, thartmann

-------------

PR: https://git.openjdk.org/jdk/pull/10022

From duke at openjdk.org Mon Sep 12 15:36:47 2022
From: duke at openjdk.org (Quan Anh Mai)
Date: Mon, 12 Sep 2022 15:36:47 GMT
Subject: RFR: 8293618: x86: Wrong code generation in class Assembler
Message-ID:

Hi,

This patch fixes some issues in the code generation of the x86 assembler:

- `Assembler::testl` misses `prefix(dst)`
- `Assembler::addw` misses the 0x66 prefix
- `Assembler::emit_operand` needs the length of the instruction from the address operand; this is often forgotten, so the patch makes this parameter explicit to prevent potential issues
- The assembler should not do optimisations that change the actual emitted instructions; these should be moved to `MacroAssembler` instead

AFAICT there is no failure due to these mistakes.

Please take a look and give reviews. Thank you very much.

-------------

Commit messages:
 - fix code generation

Changes: https://git.openjdk.org/jdk/pull/10240/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10240&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8293618
Stats: 540 lines in 5 files changed: 62 ins; 25 del; 453 mod
Patch: https://git.openjdk.org/jdk/pull/10240.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/10240/head:pull/10240

PR: https://git.openjdk.org/jdk/pull/10240

From omikhaltcova at openjdk.org Mon Sep 12 15:55:49 2022
From: omikhaltcova at openjdk.org (Olga Mikhaltsova)
Date: Mon, 12 Sep 2022 15:55:49 GMT
Subject: RFR: 8262901: [macos_aarch64] NativeCallTest expected:<-3.8194101E18> but was:<3.02668882E10>
Message-ID:

This PR is opened as a follow-up to [1] and includes the "must-done" fixes pointed out by @teshull.

This patch for JVMCI includes the following fixes related to the macOS AArch64 calling convention:
1. arguments may consume slots on the stack that are not multiples of 8 bytes [2]
2. natural alignment of stack arguments [2]
3. stack must remain 16-byte aligned [3][4]

Tested with tier1 on macOS AArch64 and Linux AArch64.

[1] https://github.com/openjdk/jdk/pull/6641
[2] https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms
[3] https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=msvc-160#stack
[4] https://docs.microsoft.com/en-us/cpp/build/stack-usage?view=msvc-170

-------------

Commit messages:
 - 8262901: [macos_aarch64] NativeCallTest expected:<-3.8194101E18> but was:<3.02668882E10>

Changes: https://git.openjdk.org/jdk/pull/10238/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10238&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8262901
Stats: 108 lines in 7 files changed: 95 ins; 4 del; 9 mod
Patch: https://git.openjdk.org/jdk/pull/10238.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/10238/head:pull/10238

PR: https://git.openjdk.org/jdk/pull/10238

From epeter at openjdk.org Mon Sep 12 16:09:30 2022
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 12 Sep 2022 16:09:30 GMT
Subject: RFR: 8288180: C2: VectorPhase must ensure that SafePointNode memory input is a MergeMemNode
Message-ID:

**Context:**
The `GraphKit` seems to assume that the memory input of the map (`SafePointNode`) is always a `MergeMemNode`. It requires this so that it can easily access the memory slices.

**Analysis:**
However, the `VectorPhase` also generates some `GraphKit` instances, for example in `PhaseVector::expand_vbox_alloc_node`. But at that point we are not in parsing, and the `SafePointNode` might have a folded memory state (not `MergeMemNode`). The assert in `GraphKit::merged_memory` can thus be triggered.
In this particular failure case, the `SafePointNode` was constructed/initialized with memory as a memory-phi, which was the result of a previous `GraphKit::reset_memory` call, which in turn folded the memory (the `MergeMemNode` had only one input, the memory-phi). This on its own does not necessarily trigger our assert. In many cases, the new `GraphKit` first transforms the memory input and calls `GraphKit::set_all_memory`, which makes sure there is a `MergeMemNode`. But in our failure case, `GraphKit::set_all_memory` is never called before we call `GraphKit::merged_memory`.

**Side-Note:**
The flag (`StressReflectiveCode`) was relevant because it prevented `GraphKit::get_layout_helper` from taking a constant layout helper for `T_LONG`, so it had to create a load instead (which then called `GraphKit::merged_memory`).

**Suggested Solution:**
`VectorPhase` must ensure that the map's memory input is a `MergeMemNode`. We can do this in `clone_jvms`, which is called before we instantiate the `GraphKit`.

I added a regression test, which fails without the fix, and passes with it.
Ran regression tests, passed.
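
The suggested solution can be illustrated with a small toy model in Java. This is only a sketch of the idea, not HotSpot code: `MemState`, `MemPhi`, `MergeMem`, `sliceAt`, and `ensureMergeMem` are hypothetical names, and the real C2 nodes are far richer. It shows a folded memory state (a bare memory phi) being wrapped again so that slice lookups always go through a merge object.

```java
import java.util.HashMap;
import java.util.Map;

final class MergeMemSketch {
    /** Hypothetical stand-in for "some memory state node". */
    interface MemState {}

    /** Hypothetical stand-in for a memory phi, i.e. a folded memory state. */
    static final class MemPhi implements MemState {}

    /** Hypothetical stand-in for a merge of per-alias memory slices over a base memory. */
    static final class MergeMem implements MemState {
        final MemState base;
        final Map<Integer, MemState> slices = new HashMap<>();

        MergeMem(MemState base) {
            this.base = base;
        }

        /** Slices that were never set explicitly fall back to the base memory. */
        MemState sliceAt(int aliasIndex) {
            return slices.getOrDefault(aliasIndex, base);
        }
    }

    /** Returns the state itself if it already is a merge, otherwise wraps it in one. */
    static MergeMem ensureMergeMem(MemState mem) {
        return (mem instanceof MergeMem) ? (MergeMem) mem : new MergeMem(mem);
    }

    public static void main(String[] args) {
        MemState folded = new MemPhi();
        MergeMem merged = ensureMergeMem(folded);
        // Every slice of the freshly wrapped state resolves to the folded memory.
        System.out.println(merged.sliceAt(3) == folded);
        // Wrapping is idempotent: an existing merge is returned unchanged.
        System.out.println(ensureMergeMem(merged) == merged);
    }
}
```

In this model, code that expects a merge can call `ensureMergeMem` once up front and no longer needs to care whether an earlier step folded the memory state.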
------------- Commit messages: - 8288180: C2: VectorPhase must ensure that SafePointNode memory input is a MergeMemNode Changes: https://git.openjdk.org/jdk/pull/10215/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10215&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8288180 Stats: 21 lines in 2 files changed: 21 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10215.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10215/head:pull/10215 PR: https://git.openjdk.org/jdk/pull/10215 From epeter at openjdk.org Mon Sep 12 16:31:38 2022 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 12 Sep 2022 16:31:38 GMT Subject: RFR: 8255670: Improve C2's detection of modified nodes Message-ID: Added `record_modified_node` to: Node::clone Node::add_req Node::add_req_batch Node::ins_req Node::add_prec Node::rm_prec Node::set_prec Added `igvn->_worklist.push(node)` in various places that modified a `node` but did not add it to the igvn worklist. 7 times I had to push `Root`, 5 of these it was because of the creation of a `HaltNode`, which means we have a `root->add_req(halt)`. In one case we have a MergeMemStream node, which gets two MergeMem nodes as input, and streams over them. Unfortunately, it modifies one of the two, which then can trigger our assertion code. I now push this node to the igvn worklist, but a better fix would be to make MergeMemStream leave the MergeMem nodes unmodified. I think that should be possible, filed an RFE [JDK-8293358](https://bugs.openjdk.org/browse/JDK-8293358) What I am NOT doing here, and leave to a later RFE: investigate / implement these assertions for late/incremental inlining. Ran larger regression tests, and 7-9h of fuzzing on 3 platforms. 
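
The invariant behind this change can be illustrated with a small toy model in Java. This is not HotSpot code: `ToyCompile`, `ToyNode`, `recordModified`, and `unprocessedModifications` are hypothetical names, and the model assumes (as a simplification) that a node's "modified" record is cleared when the node is taken off the worklist. It only shows the idea that every edge mutation records the node, and that a verification step can then flag nodes that were modified but never enqueued.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ModifiedNodeSketch {
    /** Hypothetical compilation context holding the worklist and the modified-node record. */
    static final class ToyCompile {
        final Set<ToyNode> modified = new LinkedHashSet<>();
        final Deque<ToyNode> worklist = new ArrayDeque<>();

        void recordModified(ToyNode n) {
            modified.add(n);
        }

        void push(ToyNode n) {
            if (!worklist.contains(n)) {
                worklist.addLast(n);
            }
        }

        /** Processes every queued node; processing clears its "modified" record. */
        void drainWorklist() {
            while (!worklist.isEmpty()) {
                ToyNode n = worklist.removeFirst();
                modified.remove(n);
            }
        }

        /** Nodes that were mutated but never went through the worklist. */
        List<ToyNode> unprocessedModifications() {
            return new ArrayList<>(modified);
        }
    }

    /** Hypothetical graph node whose edge mutation records itself as modified. */
    static final class ToyNode {
        final String name;
        final ToyCompile compile;
        final List<ToyNode> inputs = new ArrayList<>();

        ToyNode(String name, ToyCompile compile) {
            this.name = name;
            this.compile = compile;
        }

        void addReq(ToyNode input) {
            inputs.add(input);
            compile.recordModified(this);
        }

        @Override
        public String toString() {
            return name;
        }
    }

    public static void main(String[] args) {
        ToyCompile c = new ToyCompile();
        ToyNode root = new ToyNode("Root", c);
        ToyNode halt = new ToyNode("Halt", c);

        root.addReq(halt);   // mutates Root, but nobody pushes it
        c.drainWorklist();
        System.out.println(c.unprocessedModifications()); // Root is still recorded

        c.push(root);        // the kind of push this patch adds
        c.drainWorklist();
        System.out.println(c.unprocessedModifications()); // nothing left to report
    }
}
```

The `root.addReq(halt)` case mirrors the situation described above, where creating a `HaltNode` implies a `root->add_req(halt)` and `Root` therefore has to be pushed.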
------------- Commit messages: - MergeMemStream: had to move igvn worklist push further out to catch more cases - improved MergeMemStream comment - adding worklist.push for a recent halt node introduction - Merge branch 'master' into JDK-8255670 - 8255670: Improve C2's detection of modified nodes Changes: https://git.openjdk.org/jdk/pull/9439/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9439&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8255670 Stats: 21 lines in 6 files changed: 19 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/9439.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9439/head:pull/9439 PR: https://git.openjdk.org/jdk/pull/9439 From epeter at openjdk.org Mon Sep 12 16:31:39 2022 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 12 Sep 2022 16:31:39 GMT Subject: RFR: 8255670: Improve C2's detection of modified nodes In-Reply-To: References: Message-ID: On Sat, 9 Jul 2022 08:37:24 GMT, Emanuel Peter wrote: > Added `record_modified_node` to: > > Node::clone > Node::add_req > Node::add_req_batch > Node::ins_req > Node::add_prec > Node::rm_prec > Node::set_prec > > > Added `igvn->_worklist.push(node)` in various places that modified a `node` but did not add it to the igvn worklist. > > 7 times I had to push `Root`, 5 of these it was because of the creation of a `HaltNode`, which means we have a `root->add_req(halt)`. > > In one case we have a MergeMemStream node, which gets two MergeMem nodes as input, and streams over them. > Unfortunately, it modifies one of the two, which then can trigger our assertion code. I now push this node to the igvn worklist, but a better fix would be to make MergeMemStream leave the MergeMem nodes unmodified. I think that should be possible, filed an RFE [JDK-8293358](https://bugs.openjdk.org/browse/JDK-8293358) > > What I am NOT doing here, and leave to a later RFE: investigate / implement these assertions for late/incremental inlining. 
> > Ran larger regression tests, and 7-9h of fuzzing on 3 platforms. I'm back to work. Will spend time on this during the next weeks. ------------- PR: https://git.openjdk.org/jdk/pull/9439 From aph at openjdk.org Mon Sep 12 16:56:44 2022 From: aph at openjdk.org (Andrew Haley) Date: Mon, 12 Sep 2022 16:56:44 GMT Subject: RFR: 8262901: [macos_aarch64] NativeCallTest expected:<-3.8194101E18> but was:<3.02668882E10> In-Reply-To: References: Message-ID: On Mon, 12 Sep 2022 14:35:30 GMT, Olga Mikhaltsova wrote: > This PR is opened as a follow-up for [1] and included the "must-done" fixes pointed by @teshull. > > This patch for JVMCI includes the following fixes related to the macOS AArch64 calling convention: > 1. arguments may consume slots on the stack that are not multiples of 8 bytes [2] > 2. natural alignment of stack arguments [2] > 3. stack must remain 16-byte aligned [3][4] > > Tested with tier1 on macOS AArch64 and Linux AArch64. > > [1] https://github.com/openjdk/jdk/pull/6641 > [2] https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms > [3] https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=msvc-160#stack > [4] https://docs.microsoft.com/en-us/cpp/build/stack-usage?view=msvc-170 src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.hotspot.aarch64/src/jdk/vm/ci/hotspot/aarch64/AArch64HotSpotRegisterConfig.java line 291: > 289: currentStackOffset += Math.max(valueKind.getPlatformKind().getSizeInBytes(), target.wordSize); > 290: } > 291: } So I'm curious: why not subclass `AArch64HotSpotRegisterConfig` here, or maybe even use an interface, rather than the boolean? 
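
To make the question concrete, here is a minimal Java sketch of what an interface-based variant might look like. All names (`StackSlotPolicy`, `WORD_SLOTS`, `NATURAL_ALIGNMENT`, `layout`) are hypothetical and not part of JVMCI. The sketch assumes argument sizes are powers of two, that the generic convention gives each stack argument at least one 8-byte word, that the macOS convention uses natural size and alignment, and that the total area is rounded up to 16 bytes.

```java
import java.util.Arrays;

final class StackLayoutSketch {
    static final int WORD_SIZE = 8;         // AArch64 word size in bytes
    static final int STACK_ALIGNMENT = 16;  // total argument area is rounded up to this

    /** Hypothetical strategy describing how one stack argument is placed. */
    interface StackSlotPolicy {
        /** Offset at which an argument of the given size is stored. */
        int place(int currentOffset, int sizeInBytes);

        /** Number of bytes the argument consumes once placed. */
        int advance(int sizeInBytes);
    }

    /** Assumed generic policy: every argument takes at least one full word. */
    static final StackSlotPolicy WORD_SLOTS = new StackSlotPolicy() {
        @Override public int place(int currentOffset, int sizeInBytes) {
            return currentOffset;
        }
        @Override public int advance(int sizeInBytes) {
            return Math.max(sizeInBytes, WORD_SIZE);
        }
    };

    /** Assumed macOS-style policy: natural size and natural alignment. */
    static final StackSlotPolicy NATURAL_ALIGNMENT = new StackSlotPolicy() {
        @Override public int place(int currentOffset, int sizeInBytes) {
            // Round up to a multiple of the size (size must be a power of two).
            return (currentOffset + sizeInBytes - 1) & -sizeInBytes;
        }
        @Override public int advance(int sizeInBytes) {
            return sizeInBytes;
        }
    };

    /** Returns each argument's offset, followed by the 16-byte aligned total size. */
    static int[] layout(StackSlotPolicy policy, int... sizes) {
        int[] result = new int[sizes.length + 1];
        int offset = 0;
        for (int i = 0; i < sizes.length; i++) {
            offset = policy.place(offset, sizes[i]);
            result[i] = offset;
            offset += policy.advance(sizes[i]);
        }
        result[sizes.length] = (offset + STACK_ALIGNMENT - 1) & -STACK_ALIGNMENT;
        return result;
    }

    public static void main(String[] args) {
        int[] sizes = {4, 1, 8, 2}; // e.g. int, byte, long, short passed on the stack
        System.out.println(Arrays.toString(layout(WORD_SLOTS, sizes)));        // [0, 8, 16, 24, 32]
        System.out.println(Arrays.toString(layout(NATURAL_ALIGNMENT, sizes))); // [0, 4, 8, 16, 32]
    }
}
```

With this shape, the platform difference lives in one small object that a register config could be handed at construction time, instead of a boolean checked inside the layout loop. Whether that is cleaner than a subclass is a matter of taste.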

-------------

PR: https://git.openjdk.org/jdk/pull/10238

From shade at openjdk.org Mon Sep 12 17:15:22 2022
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 12 Sep 2022 17:15:22 GMT
Subject: RFR: 8293654: Improve SharedRuntime handling of continuation helper out-arguments
Message-ID:

(Found this while adapting current mainline to x86_32 port)

After [JDK-8292584](https://bugs.openjdk.org/browse/JDK-8292584), we have `gen_continuation_yield()` that generates the compiled entry, and implicitly uses the defaults for the other ones (interpreter, exception). We should be more explicit about these, and verify that the generators properly initialized all out-parameters. I think we are only using the interpreter/exception entry in `enterContinuation`, but not in `yield`. Notably, `exception_offset` should be `-1` for `nmethod::new_native_nmethod` to treat it as "no exception handlers".

There are many ways to strengthen this; this PR is the one I like. I can do the symmetric change in aarch64, once we agree on the x86_64 version.
Additional testing: - [x] Linux x86_64 fastdebug, `hotspot_loom jdk_loom` - [ ] Linux x86_64 fastdebug, `tier1` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/10241/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10241&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8293654 Stats: 23 lines in 1 file changed: 15 ins; 7 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10241.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10241/head:pull/10241 PR: https://git.openjdk.org/jdk/pull/10241 From xliu at openjdk.org Mon Sep 12 18:52:46 2022 From: xliu at openjdk.org (Xin Liu) Date: Mon, 12 Sep 2022 18:52:46 GMT Subject: RFR: 8292301: [REDO v2] C2 crash when allocating array of size too large [v4] In-Reply-To: References: Message-ID: On Fri, 9 Sep 2022 12:33:15 GMT, Roland Westrelin wrote: >> On top of the redo, this fixed 2 bugs: >> >> 8288184: the problem here is that the ValidLengthTest input of an >> AllocateArrayNode becomes a constant. The CatchNode would then change >> types if it was reprocessed but it's not. Custom logic is needed to >> enqueue the CatchNode when the ValidLengthTest input of an >> AllocateArrayNode changes. The CastII out of the AllocateArrayNode >> becomes top but the fallthrough path doesn't die. This happens with >> igvn in the case of the bug but could also happen with ccp. I fixed >> both in this patch. >> >> 8291665: the code pattern for this is 2 AllocateArrayNodes out of loop >> with a shared ValidLengthTest input in a loop. When the loop is cloned >> that causes Phis to be added between the AllocateArrayNodes and the >> BoolNode of the ValidLengthTest inputs. Split if runs next and it >> doesn't expect the Phi at the ValidLengthTest inputs. The fix here is >> to clone the Bool/Cmp subgraph down on loop cloning. There's logic for >> that when the use of the bool is an If for instance so I simply added >> a special case to run that logic for an AllocateArrayNode use as >> well. 
Note that the test case I added fails reliably on 11 but not
>> with the current jdk development branch. AFAICT, the bug is there but
>> something unrelated changed and a slightly different graph is built
>> for the test case that prevents split if.
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits:
>
> - comments
> - Merge branch 'master' into JDK-8292301
> - Update src/hotspot/share/opto/phaseX.cpp
>
>   Co-authored-by: Tobias Hartmann
> - undo needless change
> - dos->unix test file
> - move tests
> - test fix
> - fix
> - test
> - test for 8288184
> - ... and 2 more: https://git.openjdk.org/jdk/compare/8e22f2bb...9d92011a

LGTM.

-------------

Marked as reviewed by xliu (Committer).

PR: https://git.openjdk.org/jdk/pull/10038

From haosun at openjdk.org Tue Sep 13 00:56:50 2022
From: haosun at openjdk.org (Hao Sun)
Date: Tue, 13 Sep 2022 00:56:50 GMT
Subject: RFR: 8292587: AArch64: Support SVE fabd instruction [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 9 Sep 2022 12:59:29 GMT, Nick Gasson wrote:

>> Hao Sun has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Update the loop limit in VectorAbsDiffTest.java
>>
>> As pointed out by Faye Gao, the test results are not fully verified due
>> to incorrect loop limits.
>>
>> Updated it.
>>
>> Reran the test and no regression.
>
> Marked as reviewed by ngasson (Reviewer).

Thanks for your code reviews, @nick-arm @nsjian and @fg1417 .
Thanks for your testing, @TobiHartmann !

I don't think the GHA test failure is related to this patch. Hence, I suppose we can integrate this PR now.
------------- PR: https://git.openjdk.org/jdk/pull/10011 From fgao at openjdk.org Tue Sep 13 02:14:42 2022 From: fgao at openjdk.org (Fei Gao) Date: Tue, 13 Sep 2022 02:14:42 GMT Subject: RFR: 8275275: AArch64: Fix performance regression after auto-vectorization on NEON [v2] In-Reply-To: References: Message-ID: On Fri, 9 Sep 2022 07:59:02 GMT, Tobias Hartmann wrote: > I tested this in our CI. All tests passed. Thanks for your effort @TobiHartmann . ------------- PR: https://git.openjdk.org/jdk/pull/10175 From yyang at openjdk.org Tue Sep 13 02:15:49 2022 From: yyang at openjdk.org (Yi Yang) Date: Tue, 13 Sep 2022 02:15:49 GMT Subject: RFR: 8288204: GVN Crash: assert() failed: correct memory chain In-Reply-To: References: Message-ID: <_p6xMjVsGJiYT56sRwFOnvvuafyo3ZVSOaI9WnBD8o8=.c65fcb9a-e2be-44e1-b15e-0ec172de57cc@github.com> On Mon, 8 Aug 2022 11:12:08 GMT, Yi Yang wrote: >> Hi can I have a review for this fix? LoadBNode::Ideal crashes after performing GVN right after EA. The bad IR is as follows: >> >> ![image](https://user-images.githubusercontent.com/5010047/183106710-3a518e5e-0b59-4c3c-aba4-8b6fcade3519.png) >> >> The memory input of Load#971 is Phi#1109 and the address input of Load#971 is AddP whose object base is CheckCastPP#335: >> >> The type of Phi#1109 is `byte[int:>=0]:exact+any *` while `byte[int:8]:NotNull:exact+any *,iid=177` is the type of CheckCastPP#335 due to EA, they have different alias index, that's why we hit the assertion at L226: >> >> https://github.com/openjdk/jdk/blob/b17a745d7f55941f02b0bdde83866aa5d32cce07/src/hotspot/share/opto/memnode.cpp#L207-L226 >> (t is `byte[int:>=0]:exact+any *`, t_adr is `byte[int:8]:NotNull:exact+any *,iid=177`). >> >> There is a long story. 
In the beginning, LoadB#971 is generated at array_copy_forward, and GVN transformed it iteratively: >> >> 971 LoadB === 1115 1046 969 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22) >> 971 LoadB === 1115 1109 969 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22) >> 971 LoadB === 1115 1109 969 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22) >> 971 LoadB === 1115 1109 473 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22) >> ... 
>> >> In this case, we get alias index 5 from address input AddP#969, and step it through MergeMem#1046, we found Phi#1109 then, that's why LoadB->in(Mem) is changed from MergeMem#1046 to Phi#1109 (Which finally leads to crash). >> >> 1046 MergeMem === _ 1 160 389 389 1109 1 1 389 1 1 1 1 1 1 1 1 1 1 1 1 1 709 709 709 709 882 888 894 190 190 912 191 [[ 1025 1021 1017 1013 1009 1005 1002 1001 998 996 991 986 981 976 971 966 962 961 960 121 122 123 124 1027 ]] >> >> >> After applying this patch, some related nodes are pushed into the GVN worklist, before stepping through MergeMem#1046, the address input is already changed to AddP#473. i.e., we get alias index 32 from address input AddP#473, and step it through MergeMem#1046, we found StoreB#191 then,LoadB->in(Mem) is changed from MergeMem#1046 to StoreB#191. >> >> 971 LoadB === 1115 1046 969 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22) >> 971 LoadB === 1115 1046 969 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22) >> 971 LoadB === 1115 1046 473 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) 
DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22) >> 971 LoadB === 1115 191 473 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22) >> 971 LoadB === 1115 191 473 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22) >> 971 LoadB === 1115 191 473 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22) >> 971 LoadB === 1115 191 473 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22) >> 971 LoadB === 1115 
191 473 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22) >> 971 LoadB === 468 191 473 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22) >> 971 LoadB === 468 191 473 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22) >> 971 LoadB === 390 191 473 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22) >> ... >> >> The well-formed IR looks like this: >> ![image](https://user-images.githubusercontent.com/5010047/183239456-7096ea66-6fca-4c84-8f46-8c42d10b686a.png) >> >> Thanks for your patience. 
> >> 1 LoadB === 1115 1046 969 [[ 972 ]] @b > > Hi @TobiHartmann , this patch works well with StressIGVN. There is an explicit dependency path > > https://github.com/openjdk/jdk/blob/b17a745d7f55941f02b0bdde83866aa5d32cce07/src/hotspot/share/opto/memnode.cpp#L322-L327 > > i.e. load node delayed its idealization until its memory input is processed. This means, MergeMem#1046 and its related node were always processed before processing load node. That's why we saw load->in(Addr) was changed from 969 to 473. > > 971 LoadB === 1115 1046 969 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22) > 971 LoadB === 1115 1046 969 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22) > 971 LoadB === 1115 1046 473 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22) > 971 LoadB === 1115 191 473 [[ 972 ]] @byte[int:8]:NotNull:exact+any *,iid=177, idx=32; #byte !jvms: String::coder @ bci:0 (line 4540) 
String::getBytes @ bci:1 (line 4453) StringConcatHelper::prepend @ bci:21 (line 354) StringConcatHelper::simpleConcat @ bci:81 (line 425) DirectMethodHandle$Holder::invokeStatic @ bci:11 DelegatingMethodHandle$Holder::reinvoke_L @ bci:14 Invokers$Holder::linkToTargetMethod @ bci:6 Test::test @ bci:121 (line 22) > Comment to keep this open. @kelthuzadx, let me know if you need help with reproducing this. Yes, I'm able to reproduce this, I'm working on other stuff, I will be back at the end of this month ------------- PR: https://git.openjdk.org/jdk/pull/9777 From haosun at openjdk.org Tue Sep 13 02:22:18 2022 From: haosun at openjdk.org (Hao Sun) Date: Tue, 13 Sep 2022 02:22:18 GMT Subject: Integrated: 8292587: AArch64: Support SVE fabd instruction In-Reply-To: References: Message-ID: <8TrcbMIsekRDAKlKCoOp2xnsz2A_9aUKGQUZaw8WKDg=.1a256b07-ef2d-4db8-93b3-3b8fef954d9c@github.com> On Thu, 25 Aug 2022 01:52:41 GMT, Hao Sun wrote: > Scalar and NEON fabd instructions were initially supported in > JDK-8256318. In this patch, we support SVE fabd instruction [1] and add > one Jtreg test case as well. > > With this patch, two instructions `fsub + fabs` would be combined into > one single `fabd` instruction. > > > fsub z16.s, z16.s, z17.s > fabs z16.s, p7/m, z16.s > > --> > > fabd z16.s, p7/m, z16.s, z17.s > > > In the initial evaluation of JMH case, i.e. > FloatingScalarVectorAbsDiff.java, we found the performance uplift done > by this optimization was easily hidden by the heavy memory load/store > instructions. To avoid that, we updated the JMH case a bit, adding one > more group of subtraction and Math.abs operations in the loop body. > > Here shows the data with the new JMH case on one 256-bit SVE machine. We > can observe about 39% and 35% improvements for the two functions > respectively. 
> > > Benchmark Before After Units > FloatingScalarVectorAbsDiff.testVectorAbsDiffDouble 260.468 160.965 ns/op > FloatingScalarVectorAbsDiff.testVectorAbsDiffFloat 133.963 87.292 ns/op > > > Jtreg testing: tier1~3 passed on one NEON-only machine and one 256-bit SVE machine. > > [1] https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/FABD--Floating-point-absolute-difference--predicated-- This pull request has now been integrated. Changeset: cbee0bc9 Author: Hao Sun Committer: Ningsheng Jian URL: https://git.openjdk.org/jdk/commit/cbee0bc9ef50977dd7111e2745aacd2dda70ceb2 Stats: 277 lines in 7 files changed: 234 ins; 0 del; 43 mod 8292587: AArch64: Support SVE fabd instruction Reviewed-by: njian, fgao, ngasson ------------- PR: https://git.openjdk.org/jdk/pull/10011 From fgao at openjdk.org Tue Sep 13 02:34:59 2022 From: fgao at openjdk.org (Fei Gao) Date: Tue, 13 Sep 2022 02:34:59 GMT Subject: RFR: 8289422: Fix and re-enable vector conditional move [v4] In-Reply-To: References: <6uthI29shZjAeLK-eV3Kxqao06qoa9U9zQ5g_oDLmkI=.3e171aae-2003-46c9-88ac-9a63fecc5d96@github.com> Message-ID: On Tue, 6 Sep 2022 02:47:38 GMT, Fei Gao wrote: >> // float[] a, float[] b, float[] c; >> for (int i = 0; i < a.length; i++) { >> c[i] = (a[i] > b[i]) ? a[i] : b[i]; >> } >> >> >> After [JDK-8139340](https://bugs.openjdk.org/browse/JDK-8139340) and [JDK-8192846](https://bugs.openjdk.org/browse/JDK-8192846), we hope to vectorize the case >> above by enabling -XX:+UseCMoveUnconditionally and -XX:+UseVectorCmov. >> But the transformation here[1] is going to optimize the BoolNode >> with constant input to a constant and break the design logic of >> cmove vector node[2]. We can't prevent all GVN transformation to >> the BoolNode before matcher, so the patch keeps the condition input >> as a constant while creating a cmove vector node, and then >> restructures it into a binary tree before matching. 
>> >> When the input order of original cmp node is different from the >> input order of original cmove node, like: >> >> // float[] a, float[] b, float[] c; >> for (int i = 0; i < a.length; i++) { >> c[i] = (a[i] < b[i]) ? a[i] : b[i]; >> } >> >> the patch negates the mask of the BoolNode before creating the >> cmove vector node in SuperWord::output(). >> >> We can also use VectorNode::implemented() to consult if vector >> conditional move is supported in the backend. So, the patch cleans >> the related code in SuperWord::implemented(). >> >> With the patch, the performance uplift is: >> (The micro-benchmark functions are included in the file >> test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java) >> >> AArch64: >> Benchmark (length) Mode Cnt uplift(ns/op) >> cmoveD 523 avgt 15 68.89% >> cmoveF 523 avgt 15 72.40% >> >> X86: >> Benchmark (length) Mode Cnt uplift(ns/op) >> cmoveD 523 avgt 15 73.12% >> cmoveF 523 avgt 15 85.45% >> >> [1]https://github.com/openjdk/jdk/blob/779b4e1d1959bc15a27492b7e2b951678e39cca8/src/hotspot/share/opto/subnode.cpp#L1310 >> [2]https://github.com/openjdk/jdk/blob/779b4e1d1959bc15a27492b7e2b951678e39cca8/src/hotspot/share/opto/matcher.cpp#L2365 > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains six commits: > > - Rebase the patch to the latest JDK and add some testcase for NE and EQ > > Change-Id: Ifb02b5efc2a09e6e0b4fc1c8346698597464f448 > - Merge branch 'master' into fg8289422 > > Change-Id: I09677cb07f6b2717aa768a830663ca455806b900 > - Merge branch 'master' into fg8289422 > > Change-Id: I870c7bbc73d12bac16756226125edc1a229ba412 > - Enable the test only on aarch64 platform because X86 supports vector cmove only on some 256-bits AVXs > > Change-Id: I64dd49380fe3d303ef6be21460df3be31c1458f8 > - Merge branch 'master' into fg8289422 > > Change-Id: I7936552df6ac12949ed8b550576f4e3520596423 > - 8289422: Fix and re-enable vector conditional move > > ``` > // float[] a, float[] b, float[] c; > for (int i = 0; i < a.length; i++) { > c[i] = (a[i] > b[i]) ? a[i] : b[i]; > } > ``` > > After JDK-8139340 and JDK-8192846, we hope to vectorize the case > above by enabling -XX:+UseCMoveUnconditionally and -XX:+UseVectorCmov. > But the transformation here[1] is going to optimize the BoolNode > with constant input to a constant and break the design logic of > cmove vector node[2]. We can't prevent all GVN transformation to > the BoolNode before matcher, so the patch keeps the condition input > as a constant while creating a cmove vector node, and then > restructures it into a binary tree before matching. > > When the input order of original cmp node is different from the > input order of original cmove node, like: > ``` > // float[] a, float[] b, float[] c; > for (int i = 0; i < a.length; i++) { > c[i] = (a[i] < b[i]) ? a[i] : b[i]; > } > ``` > the patch negates the mask of the BoolNode before creating the > cmove vector node in SuperWord::output(). > > We can also use VectorNode::implemented() to consult if vector > conditional move is supported in the backend. So, the patch cleans > the related code in SuperWord::implemented(). 
> > With the patch, the performance uplift is: > (The micro-benchmark functions are included in the file > test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java) > > AArch64: > Benchmark (length) Mode Cnt uplift(ns/op) > cmoveD 523 avgt 15 68.89% > cmoveF 523 avgt 15 72.40% > > X86: > Benchmark (length) Mode Cnt uplift(ns/op) > cmoveD 523 avgt 15 73.12% > cmoveF 523 avgt 15 85.45% > > [1]https://github.com/openjdk/jdk/blob/779b4e1d1959bc15a27492b7e2b951678e39cca8/src/hotspot/share/opto/subnode.cpp#L1310 > [2]https://github.com/openjdk/jdk/blob/779b4e1d1959bc15a27492b7e2b951678e39cca8/src/hotspot/share/opto/matcher.cpp#L2365 > > Change-Id: If046dd745024deb0e602bf7efc2a07c22b89c690 > Thanks, I can see failures with the following tests when running with `-XX:+UseCMoveUnconditionally -XX:+UseVectorCmov`: > > * `compiler/c2/TestCondAddDeadBranch.java` > * `compiler/loopopts/TestCastFFAtPhi.java` > > They also happen without this patch. Should we file a separate bug or are these supposed to be fixed by this change? Thanks for your effort, @TobiHartmann . The backtrace of the failure is different from the problem that the patch tries to fix. It may be caused by another problem in mid-end. So I prefer to fix it in a separate patch and try to make each patch much easier. WDYT? ------------- PR: https://git.openjdk.org/jdk/pull/9652 From pli at openjdk.org Tue Sep 13 02:52:41 2022 From: pli at openjdk.org (Pengfei Li) Date: Tue, 13 Sep 2022 02:52:41 GMT Subject: RFR: 8291669: [REDO] Fix array range check hoisting for some scaled loop iv [v3] In-Reply-To: References: <5XU7GsP99-GVCxCJi7bVvvKbW_YG3XQGuIVm-LclQOw=.9b48b73d-4172-4f8e-a82d-03bad545c2fc@github.com> Message-ID: On Fri, 9 Sep 2022 12:50:02 GMT, Roland Westrelin wrote: >> I'm just wondering if there's a good reason for bailing out for integer overflows and if the same applies to long overflows. @rwestrel, you added that check with JDK-8278296, do you remember why? 
>
> The:
> if (scale == min_signed_integer(exp_bt)) {
> ?
> (It's from JDK-8259609)
> The problem I think is for the expression: -min_jint * i
> scale here is min_jint initially, stored in a long. It's then multiplied by -1. -min_jint = min_jint when stored in an int but not in a long. When scale is later transformed from a long to an int, some code finds that -(long)min_jint can't be stored in an int.

Thanks for the explanation. In my understanding, `min_jint` is also a special point where bailing out is required. I should update the condition of `scale_sum < min_signed_integer(exp_bt)` to `scale_sum <= min_signed_integer(exp_bt)`, right?

-------------

PR: https://git.openjdk.org/jdk/pull/9851

From fgao at openjdk.org Tue Sep 13 03:17:37 2022
From: fgao at openjdk.org (Fei Gao)
Date: Tue, 13 Sep 2022 03:17:37 GMT
Subject: Integrated: 8275275: AArch64: Fix performance regression after auto-vectorization on NEON
In-Reply-To:
References:
Message-ID:

On Tue, 6 Sep 2022 03:13:25 GMT, Fei Gao wrote:

> For some vector opcodes, there are no corresponding AArch64 NEON
> instructions but supporting them benefits vector API. Some of
> this kind of opcodes are also used by superword for auto-
> vectorization and here is the list:
>
> VectorCastD2I, VectorCastL2F
> MulVL
> AddReductionVI/L/F/D
> MulReductionVI/L/F/D
> AndReductionV, OrReductionV, XorReductionV
>
>
> We did some micro-benchmark performance tests on NEON and found
> that some of listed opcodes hurt the performance of loops after
> auto-vectorization, but others don't.
>
> This patch disables those opcodes for superword, which have
> obvious performance regressions after auto-vectorization on
> NEON. Besides, one jtreg test case, where IR nodes are checked,
> is added in the patch to protect the code against change by
> mistake in the future.
>
> Here is the performance data before and after the patch on NEON.
> > Benchmark length Mode Cnt Before After Units > AddReductionVD 1024 thrpt 15 450.830 548.001 ops/ms > AddReductionVF 1024 thrpt 15 514.468 548.013 ops/ms > MulReductionVD 1024 thrpt 15 405.613 499.531 ops/ms > MulReductionVF 1024 thrpt 15 451.292 495.061 ops/ms > > Note: > Because superword doesn't vectorize reductions unconnected with > other vector packs, the benchmark function for Add/Mul > reduction is like: > > // private double[] da, db; > // private double dresult; > public void AddReductionVD() { > double result = 1; > for (int i = startIndex; i < length; i++) { > result += (da[i] + db[i]); > } > dresult += result; > } > > > Specially, vector multiply long has been implemented but disabled > for both vector API and superword. Out of the same reason, the > patch re-enables MulVL on NEON for Vector API but still disables > it for superword. The performance uplift on vector API is ~12.8x > on my local. > > Benchmark length Mode Cnt Before After Units > Long128Vector.MUL 1024 thrpt 10 55.015 760.593 ops/ms > MulVL(superword) 1024 thrpt 10 907.788 907.805 ops/ms > > Note: > The superword benchmark function is: > > // private long[] in1, in2, res; > public void MulVL() { > for (int i = 0; i < length; i++) { > res[i] = in1[i] * in2[i]; > } > } > > The Vector API benchmark case is from: > https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/Long128Vector.java#L190 This pull request has now been integrated. 

Changeset: ec2629c0
Author: Fei Gao
Committer: Pengfei Li
URL: https://git.openjdk.org/jdk/commit/ec2629c052c8e0ae0ca9e2e027ac9854a56a889a
Stats: 472 lines in 5 files changed: 446 ins; 10 del; 16 mod

8275275: AArch64: Fix performance regression after auto-vectorization on NEON

Reviewed-by: aph, xgong

-------------

PR: https://git.openjdk.org/jdk/pull/10175

From dlong at openjdk.org Tue Sep 13 06:38:40 2022
From: dlong at openjdk.org (Dean Long)
Date: Tue, 13 Sep 2022 06:38:40 GMT
Subject: RFR: 8293654: Improve SharedRuntime handling of continuation helper out-arguments
In-Reply-To:
References:
Message-ID:

On Mon, 12 Sep 2022 16:59:45 GMT, Aleksey Shipilev wrote:

> (Found this while adapting current mainline to x86_32 port)
>
> After [JDK-8292584](https://bugs.openjdk.org/browse/JDK-8292584), we have `gen_continuation_yield()` that generates compiled entry, and implicitly uses the defaults for other ones (interpreter, exception). We should be more explicit about these, and verify the generators properly initialized all out-parameters.
>
> I think we are only using interpreter/exception entry in `enterContinuation`, but not in `yield`. Notably, `exception_offset` should be `-1` for `nmethod::new_native_nmethod` to treat it as "no exception handlers".
>
> There are many ways to strengthen this, this PR is the one I like. I can do the symmetric change in aarch64, once we agree on the x86_64 version.
>
> Additional testing:
> - [x] Linux x86_64 fastdebug, `hotspot_loom jdk_loom`
> - [x] Linux x86_64 fastdebug, `tier1`

Marked as reviewed by dlong (Reviewer).

src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 1611:

> 1609: }
> 1610:
> 1611: if (method->is_continuation_enter_intrinsic()) {

I suggest putting these if's inside an #ifdef ASSERT. Or if you don't mind breaking up the asserts, move them up into the if's above.
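
For readers following the review, the pattern being discussed (generators that explicitly set every out-parameter, plus a debug-only check that they did) can be sketched roughly as below. This is a standalone illustration under stated assumptions, not the code in `sharedRuntime_x86_64.cpp`: `EntryOffsets`, `gen_yield_stub`, `all_initialized`, `make_yield_offsets`, `kUnset` and `kNoEntry` are hypothetical names made up for the example, and it uses the standard `<cassert>` macro rather than HotSpot's own `assert(cond, msg)`. The only details taken from the thread are the `-1` value for `exception_offset` and the `#ifdef ASSERT` guard.

```cpp
#include <cassert>

// Hypothetical stand-in for the out-parameters a stub generator fills in.
struct EntryOffsets {
    int compiled_entry;
    int interpreted_entry;
    int exception_offset;
};

// Hypothetical sentinel: "the generator never wrote this field".
constexpr int kUnset = -2;
// Hypothetical marker for "this entry does not exist".
constexpr int kNoEntry = -1;
// Per the thread, -1 is treated as "no exception handlers".
constexpr int kNoExceptionHandlers = -1;

// Hypothetical generator for a yield-like stub. It only has a compiled entry,
// but it sets the other out-parameters explicitly instead of relying on defaults.
inline void gen_yield_stub(EntryOffsets& out) {
    out.compiled_entry = 0;
    out.interpreted_entry = kNoEntry;
    out.exception_offset = kNoExceptionHandlers;
}

// True when every out-parameter was written by the generator.
inline bool all_initialized(const EntryOffsets& o) {
    return o.compiled_entry != kUnset &&
           o.interpreted_entry != kUnset &&
           o.exception_offset != kUnset;
}

inline EntryOffsets make_yield_offsets() {
    EntryOffsets o{kUnset, kUnset, kUnset};
    gen_yield_stub(o);
#ifdef ASSERT
    // Debug-only verification: the whole block disappears when ASSERT is not defined.
    assert(all_initialized(o) && "generator left an out-parameter unset");
    assert(o.exception_offset == kNoExceptionHandlers &&
           "yield stub is expected to have no exception handlers");
#endif
    return o;
}
```

Wrapping the checks in `#ifdef ASSERT` keeps the `if`/assert logic out of product builds entirely, which is the trade-off the review comment points at; moving each assert into the branch that calls the generator would achieve the same without a separate guarded block.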

-------------

PR: https://git.openjdk.org/jdk/pull/10241

From shade at openjdk.org Tue Sep 13 07:15:50 2022
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 13 Sep 2022 07:15:50 GMT
Subject: RFR: 8293654: Improve SharedRuntime handling of continuation helper out-arguments [v2]
In-Reply-To:
References:
Message-ID: <5rNmaj2HHOoAg6W9_6iQ-_IN2hNGROcOLPciNec8qgg=.cb10bc31-cd46-4c8c-b9b3-e7e43c1d9696@github.com>

> (Found this while adapting current mainline to x86_32 port)
>
> After [JDK-8292584](https://bugs.openjdk.org/browse/JDK-8292584), we have `gen_continuation_yield()` that generates compiled entry, and implicitly uses the defaults for other ones (interpreter, exception). We should be more explicit about these, and verify the generators properly initialized all out-parameters.
>
> I think we are only using interpreter/exception entry in `enterContinuation`, but not in `yield`. Notably, `exception_offset` should be `-1` for `nmethod::new_native_nmethod` to treat it as "no exception handlers".
>
> There are many ways to strengthen this, this PR is the one I like. I can do the symmetric change in aarch64, once we agree on the x86_64 version.
> > Additional testing: > - [x] Linux x86_64 fastdebug, `hotspot_loom jdk_loom` > - [x] Linux x86_64 fastdebug, `tier1` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Wrap in ASSERT ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10241/files - new: https://git.openjdk.org/jdk/pull/10241/files/f9101ab7..f187415f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10241&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10241&range=00-01 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10241.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10241/head:pull/10241 PR: https://git.openjdk.org/jdk/pull/10241 From shade at openjdk.org Tue Sep 13 07:15:52 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 13 Sep 2022 07:15:52 GMT Subject: RFR: 8293654: Improve SharedRuntime handling of continuation helper out-arguments [v2] In-Reply-To: References: Message-ID: On Tue, 13 Sep 2022 06:35:33 GMT, Dean Long wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Wrap in ASSERT > > src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 1611: > >> 1609: } >> 1610: >> 1611: if (method->is_continuation_enter_intrinsic()) { > > I suggest putting these if's inside an #ifdef ASSERT. Or if you don't mind breaking up the asserts, move them up into the if's above. Done the `#ifdef ASSERT` thing in new commit. 

-------------

PR: https://git.openjdk.org/jdk/pull/10241

From shade at openjdk.org Tue Sep 13 07:56:49 2022
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 13 Sep 2022 07:56:49 GMT
Subject: RFR: 8293654: Improve SharedRuntime handling of continuation helper out-arguments [v3]
In-Reply-To:
References:
Message-ID:

> (Found this while adapting current mainline to x86_32 port)
>
> After [JDK-8292584](https://bugs.openjdk.org/browse/JDK-8292584), we have `gen_continuation_yield()` that generates compiled entry, and implicitly uses the defaults for other ones (interpreter, exception). We should be more explicit about these, and verify the generators properly initialized all out-parameters.
>
> I think we are only using interpreter/exception entry in `enterContinuation`, but not in `yield`. Notably, `exception_offset` should be `-1` for `nmethod::new_native_nmethod` to treat it as "no exception handlers".
>
> There are many ways to strengthen this, this PR is the one I like. I can do the symmetric change in aarch64, once we agree on the x86_64 version.
> > Additional testing: > - [x] Linux x86_64 fastdebug, `hotspot_loom jdk_loom` > - [x] Linux x86_64 fastdebug, `tier1` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Also handle AArch64 parts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10241/files - new: https://git.openjdk.org/jdk/pull/10241/files/f187415f..e0b244a3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10241&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10241&range=01-02 Stats: 24 lines in 1 file changed: 15 ins; 5 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/10241.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10241/head:pull/10241 PR: https://git.openjdk.org/jdk/pull/10241 From shade at openjdk.org Tue Sep 13 07:56:51 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 13 Sep 2022 07:56:51 GMT Subject: RFR: 8293654: Improve SharedRuntime handling of continuation helper out-arguments [v2] In-Reply-To: <5rNmaj2HHOoAg6W9_6iQ-_IN2hNGROcOLPciNec8qgg=.cb10bc31-cd46-4c8c-b9b3-e7e43c1d9696@github.com> References: <5rNmaj2HHOoAg6W9_6iQ-_IN2hNGROcOLPciNec8qgg=.cb10bc31-cd46-4c8c-b9b3-e7e43c1d9696@github.com> Message-ID: <-o_l_DVshf2x-fmueL5yLya3vmeaKbHe5BT1h9KkDCA=.62a3075e-acb7-4cbe-ae66-14065e5da920@github.com> On Tue, 13 Sep 2022 07:15:50 GMT, Aleksey Shipilev wrote: >> (Found this while adapting current mainline to x86_32 port) >> >> After [JDK-8292584](https://bugs.openjdk.org/browse/JDK-8292584), we have `gen_continuation_yield()` that generates compiled entry, and implicitly uses the defaults for other ones (interpreter, exception). We should be more explicit about these, and verify the generators properly initialized all out-parameters. >> >> I think we are only using interpreter/exception entry in `enterContinuation`, but not in `yield`. Notably, `exception_offset` should be `-1` for `nmethod::new_native_nmethod` to treat it as "no exception handlers". 
>> >> There a many ways to strengthen this, this PR is the one I like. I can do the symmetric change in aarch64, once we are agree on x86_64 version. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug, `hotspot_loom jdk_loom` >> - [x] Linux x86_64 fastdebug, `tier1` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Wrap in ASSERT As promised, also added the symmetric AArch64 part. `hotspot_loom jdk_loom` passes there, running `tier1` now. ------------- PR: https://git.openjdk.org/jdk/pull/10241 From omikhaltcova at openjdk.org Tue Sep 13 10:32:45 2022 From: omikhaltcova at openjdk.org (Olga Mikhaltsova) Date: Tue, 13 Sep 2022 10:32:45 GMT Subject: RFR: 8262901: [macos_aarch64] NativeCallTest expected:<-3.8194101E18> but was:<3.02668882E10> In-Reply-To: References: Message-ID: On Mon, 12 Sep 2022 16:54:38 GMT, Andrew Haley wrote: >> This PR is opened as a follow-up for [1] and included the "must-done" fixes pointed by @teshull. >> >> This patch for JVMCI includes the following fixes related to the macOS AArch64 calling convention: >> 1. arguments may consume slots on the stack that are not multiples of 8 bytes [2] >> 2. natural alignment of stack arguments [2] >> 3. stack must remain 16-byte aligned [3][4] >> >> Tested with tier1 on macOS AArch64 and Linux AArch64. 
>> >> [1] https://github.com/openjdk/jdk/pull/6641 >> [2] https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms >> [3] https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=msvc-160#stack >> [4] https://docs.microsoft.com/en-us/cpp/build/stack-usage?view=msvc-170 > > src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.hotspot.aarch64/src/jdk/vm/ci/hotspot/aarch64/AArch64HotSpotRegisterConfig.java line 291: > >> 289: currentStackOffset += Math.max(valueKind.getPlatformKind().getSizeInBytes(), target.wordSize); >> 290: } >> 291: } > > So I'm curious: why not subclass `AArch64HotSpotRegisterConfig` here, or maybe even use an interface, rather than the boolean? I tried to be closer to the original review https://github.com/openjdk/jdk/pull/6641 that requires only 2 fixes and tried to do only this in order to continue easily. Could you clarify please what boolean you talk about? `private final boolean macOS;` that was pushed into `class AArch64HotSpotRegisterConfig`, right? I'm hesitating a bit because of the highlighted code. ------------- PR: https://git.openjdk.org/jdk/pull/10238 From roland at openjdk.org Tue Sep 13 12:45:49 2022 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 13 Sep 2022 12:45:49 GMT Subject: RFR: 8288180: C2: VectorPhase must ensure that SafePointNode memory input is a MergeMemNode In-Reply-To: References: Message-ID: On Thu, 8 Sep 2022 09:08:50 GMT, Emanuel Peter wrote: > **Context:** > The `GraphKit` seems to assume that the memory input of the map (`SafePointNode`) is always a `MergeMemNode`. It requires this so that it can easily access the memory slices. > > **Analysis:** > However, the `VectorPhase` also generates some `GraphKit` instances, for example in `PhaseVector::expand_vbox_alloc_node`. But at that point we are not in parsing, and the `SafePointNode` might have a folded memory state (not `MergeMemNode`). The assert in `GrahpKit::merged_memory` can thus be triggered. 
> > In this particular failure case, the `SafePointNode` was constructed/initialized with memory as a memory-phi, which was the result of a previous `GraphKit::reset_memory` call, which in turn folded the memory (the `MergeMemNode` had only one input, the memory-phi). This on its own does not necessarily trigger our assert. In many cases, the new `GraphKit` first transforms the memory input and calls `GraphKit::set_all_memory`, which makes sure there is a `MergeMemNode`. But in our failure case, `GraphKit::set_all_memory` is never called before we call `GraphKit::merged_memory`. > > **Side-Note:** > The flag (`StressReflectiveCode`) was relevant because it disabled `GraphKit::get_layout_helper` from taking a constant layout helper for `T_LONG`, and instead it had to create a load (which then called `GraphKit::merged_memory`). > > **Suggested Solution:** > `VectorPhase` must ensure that the map's memory input is `MergeMemNode`. We can do this in `clone_jvms`, which is called before we instanciate the `GraphKit`. > > I added a regression test, which fails without the fix, and passes with it. > Ran regression tests, passed. Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR: https://git.openjdk.org/jdk/pull/10215 From roland at openjdk.org Tue Sep 13 12:47:44 2022 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 13 Sep 2022 12:47:44 GMT Subject: RFR: 8291669: [REDO] Fix array range check hoisting for some scaled loop iv [v3] In-Reply-To: References: <5XU7GsP99-GVCxCJi7bVvvKbW_YG3XQGuIVm-LclQOw=.9b48b73d-4172-4f8e-a82d-03bad545c2fc@github.com> Message-ID: On Tue, 13 Sep 2022 02:50:27 GMT, Pengfei Li wrote: >> The: >> if (scale == min_signed_integer(exp_bt)) { >> ? >> (It's` from JDK-8259609) >> The problem I think is for the expression: -min_jint * i >> scale here is min_jint initially, stored in a long. It's then multiplied by -1. -min_jint = min_jint when stored in an int but not in a long. 
When scale is later transformed from a long to an int, some code finds that -(long)min_jint can't be stored in an int. > > Thanks for explanation. In my understanding, `min_jint` is also a special point where bailing out is required. I should update the condition of `scale_sum < min_signed_integer(exp_bt)` to `scale_sum <= min_signed_integer(exp_bt)`, right? Yes, you must be right. ------------- PR: https://git.openjdk.org/jdk/pull/9851 From roland at openjdk.org Tue Sep 13 12:59:34 2022 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 13 Sep 2022 12:59:34 GMT Subject: RFR: 8255670: Improve C2's detection of modified nodes In-Reply-To: References: Message-ID: On Sat, 9 Jul 2022 08:37:24 GMT, Emanuel Peter wrote: > Added `record_modified_node` to: > > Node::clone > Node::add_req > Node::add_req_batch > Node::ins_req > Node::add_prec > Node::rm_prec > Node::set_prec > > > Added `igvn->_worklist.push(node)` in various places that modified a `node` but did not add it to the igvn worklist. > > 7 times I had to push `Root`, 5 of these it was because of the creation of a `HaltNode`, which means we have a `root->add_req(halt)`. > > In one case we have a MergeMemStream node, which gets two MergeMem nodes as input, and streams over them. > Unfortunately, it modifies one of the two, which then can trigger our assertion code. I now push this node to the igvn worklist, but a better fix would be to make MergeMemStream leave the MergeMem nodes unmodified. I think that should be possible, filed an RFE [JDK-8293358](https://bugs.openjdk.org/browse/JDK-8293358) > > FYI: > What I am NOT doing here, and leave to a future RFE/independent change: investigate / implement these assertions for late/incremental inlining. > > Ran larger regression tests, and 7-9h of fuzzing on 3 platforms. 
src/hotspot/share/opto/loopnode.cpp line 5135: > 5133: set_loop(halt, l); > 5134: C->root()->add_req(halt); > 5135: _igvn._worklist.push(C->root()); Maybe add a method to PhaseIterGVN that does the add_req + push similar to replace_input_of? ------------- PR: https://git.openjdk.org/jdk/pull/9439 From thartmann at openjdk.org Tue Sep 13 13:00:48 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 13 Sep 2022 13:00:48 GMT Subject: RFR: 8288180: C2: VectorPhase must ensure that SafePointNode memory input is a MergeMemNode In-Reply-To: References: Message-ID: <0ETzUsvMOGpZomQ6R0P0GJ9n1N_YTdCLiJuKAzf0ThA=.ecb15957-34a6-4fbf-b02d-1e45b7a1c7f8@github.com> On Thu, 8 Sep 2022 09:08:50 GMT, Emanuel Peter wrote: > **Context:** > The `GraphKit` seems to assume that the memory input of the map (`SafePointNode`) is always a `MergeMemNode`. It requires this so that it can easily access the memory slices. > > **Analysis:** > However, the `VectorPhase` also generates some `GraphKit` instances, for example in `PhaseVector::expand_vbox_alloc_node`. But at that point we are not in parsing, and the `SafePointNode` might have a folded memory state (not `MergeMemNode`). The assert in `GrahpKit::merged_memory` can thus be triggered. > > In this particular failure case, the `SafePointNode` was constructed/initialized with memory as a memory-phi, which was the result of a previous `GraphKit::reset_memory` call, which in turn folded the memory (the `MergeMemNode` had only one input, the memory-phi). This on its own does not necessarily trigger our assert. In many cases, the new `GraphKit` first transforms the memory input and calls `GraphKit::set_all_memory`, which makes sure there is a `MergeMemNode`. But in our failure case, `GraphKit::set_all_memory` is never called before we call `GraphKit::merged_memory`. 
> > **Side-Note:** > The flag (`StressReflectiveCode`) was relevant because it disabled `GraphKit::get_layout_helper` from taking a constant layout helper for `T_LONG`, and instead it had to create a load (which then called `GraphKit::merged_memory`). > > **Suggested Solution:** > `VectorPhase` must ensure that the map's memory input is `MergeMemNode`. We can do this in `clone_jvms`, which is called before we instanciate the `GraphKit`. > > I added a regression test, which fails without the fix, and passes with it. > Ran regression tests, passed. Nice analysis. Looks good to me otherwise. src/hotspot/share/opto/vector.cpp line 168: > 166: Node* mem = map->memory(); > 167: if (!mem->is_MergeMem()) { > 168: // Since we are not in parsing, the SafeNode does not guarantee that the memory Suggestion: // Since we are not in parsing, the SafePointNode does not guarantee that the memory src/hotspot/share/opto/vector.cpp line 170: > 168: // Since we are not in parsing, the SafeNode does not guarantee that the memory > 169: // input is necessarily a MergeMemNode. But we need to ensure that there is that > 170: // MereMemNode, since the GraphKit assumes the memory input of the map to be a Suggestion: // MergeMemNode, since the GraphKit assumes the memory input of the map to be a ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.org/jdk/pull/10215 From epeter at openjdk.org Tue Sep 13 13:12:37 2022 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 13 Sep 2022 13:12:37 GMT Subject: RFR: 8288180: C2: VectorPhase must ensure that SafePointNode memory input is a MergeMemNode [v2] In-Reply-To: References: Message-ID: <8_5DsQxnxHkbfQ_mqynfnXojB76wtt4860p63S_2wT0=.44281fe8-27bb-4a12-8e99-d0a7218cbb7e@github.com> > **Context:** > The `GraphKit` seems to assume that the memory input of the map (`SafePointNode`) is always a `MergeMemNode`. It requires this so that it can easily access the memory slices. 
> > **Analysis:** > However, the `VectorPhase` also generates some `GraphKit` instances, for example in `PhaseVector::expand_vbox_alloc_node`. But at that point we are not in parsing, and the `SafePointNode` might have a folded memory state (not `MergeMemNode`). The assert in `GrahpKit::merged_memory` can thus be triggered. > > In this particular failure case, the `SafePointNode` was constructed/initialized with memory as a memory-phi, which was the result of a previous `GraphKit::reset_memory` call, which in turn folded the memory (the `MergeMemNode` had only one input, the memory-phi). This on its own does not necessarily trigger our assert. In many cases, the new `GraphKit` first transforms the memory input and calls `GraphKit::set_all_memory`, which makes sure there is a `MergeMemNode`. But in our failure case, `GraphKit::set_all_memory` is never called before we call `GraphKit::merged_memory`. > > **Side-Note:** > The flag (`StressReflectiveCode`) was relevant because it disabled `GraphKit::get_layout_helper` from taking a constant layout helper for `T_LONG`, and instead it had to create a load (which then called `GraphKit::merged_memory`). > > **Suggested Solution:** > `VectorPhase` must ensure that the map's memory input is `MergeMemNode`. We can do this in `clone_jvms`, which is called before we instanciate the `GraphKit`. > > I added a regression test, which fails without the fix, and passes with it. > Ran regression tests, passed. 
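As a reading aid for the suggested solution quoted above, here is a minimal sketch of the "make sure the map's memory input is a merge node before a kit is built" idea. It is a toy model only: `MemNode`, `SafePointMap` and `ensure_merge_mem` are hypothetical names invented for illustration and are not HotSpot classes or the code in the PR.

```cpp
#include <cassert>

// Toy stand-in for a memory state node (hypothetical, not a HotSpot class).
struct MemNode {
  bool is_merge_mem;  // true if this node plays the role of a MergeMemNode
  MemNode* base;      // for a merge node: the memory state it wraps
};

// Toy stand-in for the map (SafePointNode) whose memory input is inspected.
struct SafePointMap {
  MemNode* memory;
};

// If the memory input was folded to a non-merge node (for example a memory
// phi), wrap it in the caller-provided merge node so that later code can
// rely on finding a merge node. Returns true if wrapping was needed.
bool ensure_merge_mem(SafePointMap& map, MemNode& wrapper) {
  if (map.memory->is_merge_mem) {
    return false;  // already in the expected shape, nothing to do
  }
  wrapper.is_merge_mem = true;
  wrapper.base = map.memory;  // the old folded state becomes the base
  map.memory = &wrapper;
  return true;
}
```

Calling `ensure_merge_mem` a second time on the same map is a no-op, which mirrors the intent that the check only acts when the memory state has been folded.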
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Updated comments according to Tobias' review suggestions Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10215/files - new: https://git.openjdk.org/jdk/pull/10215/files/5fcc329a..0d58d085 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10215&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10215&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10215.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10215/head:pull/10215 PR: https://git.openjdk.org/jdk/pull/10215 From epeter at openjdk.org Tue Sep 13 13:15:17 2022 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 13 Sep 2022 13:15:17 GMT Subject: RFR: 8288180: C2: VectorPhase must ensure that SafePointNode memory input is a MergeMemNode [v2] In-Reply-To: References: Message-ID: On Tue, 13 Sep 2022 12:41:55 GMT, Roland Westrelin wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply suggestions from code review >> >> Updated comments according to Tobias' review suggestions >> >> Co-authored-by: Tobias Hartmann > > Looks good to me. Thanks @rwestrel and @TobiHartmann for the review and help! ------------- PR: https://git.openjdk.org/jdk/pull/10215 From epeter at openjdk.org Tue Sep 13 13:18:09 2022 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 13 Sep 2022 13:18:09 GMT Subject: Integrated: 8288180: C2: VectorPhase must ensure that SafePointNode memory input is a MergeMemNode In-Reply-To: References: Message-ID: On Thu, 8 Sep 2022 09:08:50 GMT, Emanuel Peter wrote: > **Context:** > The `GraphKit` seems to assume that the memory input of the map (`SafePointNode`) is always a `MergeMemNode`. It requires this so that it can easily access the memory slices. 
> > **Analysis:** > However, the `VectorPhase` also generates some `GraphKit` instances, for example in `PhaseVector::expand_vbox_alloc_node`. But at that point we are not in parsing, and the `SafePointNode` might have a folded memory state (not `MergeMemNode`). The assert in `GrahpKit::merged_memory` can thus be triggered. > > In this particular failure case, the `SafePointNode` was constructed/initialized with memory as a memory-phi, which was the result of a previous `GraphKit::reset_memory` call, which in turn folded the memory (the `MergeMemNode` had only one input, the memory-phi). This on its own does not necessarily trigger our assert. In many cases, the new `GraphKit` first transforms the memory input and calls `GraphKit::set_all_memory`, which makes sure there is a `MergeMemNode`. But in our failure case, `GraphKit::set_all_memory` is never called before we call `GraphKit::merged_memory`. > > **Side-Note:** > The flag (`StressReflectiveCode`) was relevant because it disabled `GraphKit::get_layout_helper` from taking a constant layout helper for `T_LONG`, and instead it had to create a load (which then called `GraphKit::merged_memory`). > > **Suggested Solution:** > `VectorPhase` must ensure that the map's memory input is `MergeMemNode`. We can do this in `clone_jvms`, which is called before we instanciate the `GraphKit`. > > I added a regression test, which fails without the fix, and passes with it. > Ran regression tests, passed. This pull request has now been integrated. 
Changeset: 6f2223fa Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/6f2223faa170a800f76a54a6637c160eadab6232 Stats: 21 lines in 2 files changed: 21 ins; 0 del; 0 mod 8288180: C2: VectorPhase must ensure that SafePointNode memory input is a MergeMemNode Reviewed-by: roland, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/10215 From pli at openjdk.org Tue Sep 13 14:31:55 2022 From: pli at openjdk.org (Pengfei Li) Date: Tue, 13 Sep 2022 14:31:55 GMT Subject: RFR: 8291669: [REDO] Fix array range check hoisting for some scaled loop iv [v4] In-Reply-To: <5XU7GsP99-GVCxCJi7bVvvKbW_YG3XQGuIVm-LclQOw=.9b48b73d-4172-4f8e-a82d-03bad545c2fc@github.com> References: <5XU7GsP99-GVCxCJi7bVvvKbW_YG3XQGuIVm-LclQOw=.9b48b73d-4172-4f8e-a82d-03bad545c2fc@github.com> Message-ID: <63VQjfbVlP0EB_CjYS1LZahu0cpiS4Zjp7-Gbg49LDw=.316e4cba-4fd7-4061-8e9c-9d81e78d73bb@github.com> > This is a REDO of JDK-8289996. In previous patch, we defer some strength > reductions in Ideal functions of `Mul[I|L]Node` to post loop igvn phase > to fix a range check hoisting issue. More about previous patch can be > found in PR #9508, where we have described some details of the issue > we would like to fix. > > Previous patch was backed out due to some jtreg failures found. We have > analyzed those failures one by one and found one of them exposes a real > performance regression. We see that deferring some strength reductions > to post loop igvn phase has too much impact. Some vector multiplication > will not be optimized to vector addition with vector shift after that > change. So in this REDO we propose the range check hoisting fix with a > different approach. > > In this new patch, we add some recursive pattern matches for scaled loop > iv in function `PhaseIdealLoop::is_scaled_iv()`. These include matching > a sum or a difference of two scaled iv expressions. With this, all kinds > of Ideal-transformed scaled iv expressions can still be recognized. 
This > new approach only touches loop transformation code and hence has much > smaller impact. We have verified that this new approach applies to both > int range checks and long range checks. > > Previously attached jtreg case fails on ppc64 because VectorAPI has no > vector intrinsics on ppc64 so there's no long range check to hoist. In > this patch, we limit the test architecture to x64 and AArch64. > > Tested hotspot::hotspot_all_no_apps, jdk::tier1~3 and langtools::tier1. Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: Bail out when scale value is min_jint ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9851/files - new: https://git.openjdk.org/jdk/pull/9851/files/02402795..1cf66670 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9851&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9851&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/9851.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9851/head:pull/9851 PR: https://git.openjdk.org/jdk/pull/9851 From pli at openjdk.org Tue Sep 13 14:31:57 2022 From: pli at openjdk.org (Pengfei Li) Date: Tue, 13 Sep 2022 14:31:57 GMT Subject: RFR: 8291669: [REDO] Fix array range check hoisting for some scaled loop iv [v3] In-Reply-To: References: <5XU7GsP99-GVCxCJi7bVvvKbW_YG3XQGuIVm-LclQOw=.9b48b73d-4172-4f8e-a82d-03bad545c2fc@github.com> Message-ID: On Tue, 13 Sep 2022 12:43:59 GMT, Roland Westrelin wrote: >> Thanks for explanation. In my understanding, `min_jint` is also a special point where bailing out is required. I should update the condition of `scale_sum < min_signed_integer(exp_bt)` to `scale_sum <= min_signed_integer(exp_bt)`, right? > > Yes, you must be right. Thanks, I have updated this. 
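One way to see why `min_jint` is the special point discussed in this exchange: negating it in 64 bits gives +2147483648, which no longer fits in a 32-bit int, while every other 32-bit value negates safely. The sketch below is a standalone illustration under that assumption; `negate_int_scale` is a hypothetical helper, not the code in `PhaseIdealLoop::is_scaled_iv()`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper, not HotSpot code: negate a scale that is logically a
// 32-bit value but is carried in a 64-bit variable. Returns false (bail out)
// when the negated value no longer fits in 32 bits.
bool negate_int_scale(int64_t scale, int32_t* negated_out) {
  const int64_t min_jint = std::numeric_limits<int32_t>::min();
  const int64_t max_jint = std::numeric_limits<int32_t>::max();
  assert(scale >= min_jint && scale <= max_jint);  // precondition: fits in 32 bits

  // Computed in 64 bits, so -min_jint is simply +2147483648 here.
  const int64_t negated = -scale;

  // Only scale == min_jint yields a value above max_jint; in 32-bit
  // two's-complement arithmetic it would wrap back to min_jint.
  if (negated > max_jint) {
    return false;
  }
  *negated_out = static_cast<int32_t>(negated);
  return true;
}
```

With this helper, a scale of 4 negates to -4, `max_jint` negates to `min_jint + 1`, and `min_jint` is the only input that is rejected.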
------------- PR: https://git.openjdk.org/jdk/pull/9851 From kvn at openjdk.org Tue Sep 13 15:51:41 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 13 Sep 2022 15:51:41 GMT Subject: RFR: 8293618: x86: Wrong code generation in class Assembler In-Reply-To: References: Message-ID: <_kM8UYL9zISDQHbZvarO7JcppFHv07wJT-1nUS3YN2o=.24570e77-b958-434a-b683-21cfcddc7777@github.com> On Mon, 12 Sep 2022 15:30:01 GMT, Quan Anh Mai wrote: > Hi, > > This patch fixes some issues in the code generation of x86 assembler: > > - `Assembler::testl` misses `prefix(dst)` > - `Assembler::addw` misses the 0x66 prefix > - `Assembler::emit_operand` needs the length of the instruction from the address operand, this is often forgotten, making this parameter explicit to prevent potential issues > - The assembler should not do optimisations that change the actual emitted instructions, these should be moved to `MacroAssembler` instead > > AFAICT there is no failure due to these mistakes. Please take a look and give reviews. > Thank you very much. Seems reasonable. I will test it. ------------- PR: https://git.openjdk.org/jdk/pull/10240 From duke at openjdk.org Tue Sep 13 16:12:25 2022 From: duke at openjdk.org (Dhamoder Nalla) Date: Tue, 13 Sep 2022 16:12:25 GMT Subject: RFR: 8276545: Fix handling of trap count overflow in Parse::Parse() Message-ID:
The API `trap_count(reason)` returns `(uint)-1 == 0xFFFFFFFF` in case of trap count overflow, while `trap_count_limit()` returns `(jubyte)-1 == 0xFF`. As a result, the overflow check `if (md_count == md->trap_count_limit())` fails, because it compares 0xFFFFFFFF with 0xFF.
uint md_count = md->trap_count(reason);
if (md_count != 0) {
**if (md_count == md->trap_count_limit())** // Trap count is overflown
Trap count value is computed as 0xFFFFFFFF + overflowcount (diff after 0xFF), which is wrong.
md_count += md->overflow_trap_count();
Fix: The overflow check should be either of the below:
if (md_count >= md->trap_count_limit())
or
if (md_count == (uint)-1)
and the total trap count should be computed as
md_count = md->trap_count_limit() + md->overflow_trap_count();
Test: local JTReg test for hotspot_all group. ------------- Commit messages: - Fix handling of trap count overflow in Parse::Parse() Changes: https://git.openjdk.org/jdk/pull/10187/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10187&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8276545 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10187.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10187/head:pull/10187 PR: https://git.openjdk.org/jdk/pull/10187 From avoitylov at openjdk.org Tue Sep 13 16:12:57 2022 From: avoitylov at openjdk.org (Aleksei Voitylov) Date: Tue, 13 Sep 2022 16:12:57 GMT Subject: RFR: 8293695: Implement isInfinite intrinsic for RISC-V Message-ID: <8-TCWhiO_DkKS10Fkl7gztIBGeGJHO45F2d99YVKAvQ=.8e10e678-d320-4945-8760-ad48eefbdd88@github.com> The RISC-V 64 intrinsic for isInfinite follows the logic of the x86 intrinsic (introduced by 8285868). This patch adds a C2 match for IsInfinite nodes. The existing test is modified to run on RISC-V and passes on both release and fastdebug builds. Benchmark results are below:
before:
Benchmark                              Mode  Cnt   Score   Error  Units
DoubleClassCheck.testIsInfiniteBranch  avgt   15  43.547 ± 6.843  ns/op
DoubleClassCheck.testIsInfiniteCMov    avgt   15  16.301 ± 1.386  ns/op
DoubleClassCheck.testIsInfiniteStore   avgt   15  16.230 ± 1.477  ns/op
FloatClassCheck.testIsInfiniteBranch   avgt   15  38.774 ± 3.572  ns/op
FloatClassCheck.testIsInfiniteCMov     avgt   15  15.064 ± 1.310  ns/op
FloatClassCheck.testIsInfiniteStore    avgt   15  14.967 ± 1.298  ns/op
after:
Benchmark                              Mode  Cnt   Score   Error  Units
DoubleClassCheck.testIsInfiniteBranch  avgt   15  39.987 ± 6.179  ns/op
DoubleClassCheck.testIsInfiniteCMov    avgt   15  13.477 ± 1.159  ns/op
DoubleClassCheck.testIsInfiniteStore   avgt   15   9.607 ± 0.834  ns/op
FloatClassCheck.testIsInfiniteBranch   avgt   15  36.265 ± 3.168  ns/op
FloatClassCheck.testIsInfiniteCMov     avgt   15  13.230 ± 1.100  ns/op
FloatClassCheck.testIsInfiniteStore    avgt   15   9.492 ± 0.807  ns/op
According to the 8285868 discussion, intrinsifying the isNaN and isFinite methods using the same approach might not be beneficial. I'm going to investigate it for RISC-V and propose intrinsifying these methods as part of further work in case it's profitable. ------------- Commit messages: - JDK-8293695 implementation Changes: https://git.openjdk.org/jdk/pull/10253/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10253&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8293695 Stats: 30 lines in 3 files changed: 26 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/10253.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10253/head:pull/10253 PR: https://git.openjdk.org/jdk/pull/10253 From xliu at openjdk.org Tue Sep 13 17:38:49 2022 From: xliu at openjdk.org (Xin Liu) Date: Tue, 13 Sep 2022 17:38:49 GMT Subject: Withdrawn: 8287385: Suppress superficial unstable_if traps In-Reply-To: References: Message-ID: On Thu, 21 Jul 2022 19:54:11 GMT, Xin Liu wrote: > An unstable_if trap is **superficial** if it can NOT prune any code. Sometimes, the else-section of the program is empty. Superficial unstable_if traps not only complicate the code shape but also consume codecache. C2 has to generate debuginfo for them. If the condition changes, HotSpot has to destroy the established nmethod and compile it again. Our analysis shows that roughly 20% of unstable_if traps are superficial. > > The algorithm which can identify and suppress superficial unstable_if traps derives from the definition. A non-superficial unstable_if trap must prune some code. The parser skips parsing dead basic blocks (BBs). A trap is superficial if and only if its target BB is not dead! Otherwise, the BB would be skipped, which contradicts the definition.
As a result, we can suppress an unstable_if trap when c2 parse the target BB. This algorithm leaves alone those uncommon_traps do prune code. > > For example, C2 generates an uncommon_trap for the else if cond is very likely true. > > public static int foo(boolean cond, int i) { > Value x = new Value(0); > Value y = new Value(1); > Value z = new Value(i); > > if (cond) { > i++; > } > return x._value + y._value + z._value + i; > } > > > If we suppress this superficial unstable_if, the nmethod reduces from 608 bytes to 520 bytes, or -14.5%. Most of them come from "scopes data/pcs". It's because superficial unstable_if generates a trap like this > > 037 call,static wrapper for: uncommon_trap(reason='unstable_if' action='reinterpret' debug_id='0') > # SuperficialIfTrap::foo @ bci:29 (line 32) L[0]=_ L[1]=rsp + #4 L[2]=#ScObj0 L[3]=#ScObj1 L[4]=#ScObj2 STK[0]=rsp + #0 > # ScObj0 SuperficialIfTrap$Value={ [_value :0]=#0 } > # ScObj1 SuperficialIfTrap$Value={ [_value :0]=#1 } > # ScObj2 SuperficialIfTrap$Value={ [_value :0]=rsp + #4 } > # OopMap {off=60/0x3c} > 03c stop # ShouldNotReachHere > > > Here is the breakdown of nmethod, generated by '-XX:+PrintAssembly' > > <-XX:-OptimizeUnstableIf> > Compiled method (c2) 346 17 4 SuperficialIfTrap::foo (53 bytes) > total in heap [0x00007f50f4970910,0x00007f50f4970b70] = 608 > relocation [0x00007f50f4970a70,0x00007f50f4970a80] = 16 > main code [0x00007f50f4970a80,0x00007f50f4970ad8] = 88 > stub code [0x00007f50f4970ad8,0x00007f50f4970af0] = 24 > oops [0x00007f50f4970af0,0x00007f50f4970b00] = 16 > metadata [0x00007f50f4970b00,0x00007f50f4970b08] = 8 > scopes data [0x00007f50f4970b08,0x00007f50f4970b38] = 48 > scopes pcs [0x00007f50f4970b38,0x00007f50f4970b68] = 48 > dependencies [0x00007f50f4970b68,0x00007f50f4970b70] = 8 > > <-XX:+OptimizeUnstableIf> > Compiled method (c2) 309 17 4 SuperficialIfTrap::foo (53 bytes) > total in heap [0x00007f4090970910,0x00007f4090970b18] = 520 > relocation [0x00007f4090970a70,0x00007f4090970a80] 
= 16 > main code [0x00007f4090970a80,0x00007f4090970ac8] = 72 > stub code [0x00007f4090970ac8,0x00007f4090970ae0] = 24 > oops [0x00007f4090970ae0,0x00007f4090970ae8] = 8 > scopes data [0x00007f4090970ae8,0x00007f4090970af0] = 8 > scopes pcs [0x00007f4090970af0,0x00007f4090970b10] = 32 > dependencies [0x00007f4090970b10,0x00007f4090970b18] = 8 This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/9601 From shade at openjdk.org Tue Sep 13 18:31:43 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 13 Sep 2022 18:31:43 GMT Subject: RFR: 8293654: Improve SharedRuntime handling of continuation helper out-arguments [v3] In-Reply-To: References: Message-ID: On Tue, 13 Sep 2022 07:56:49 GMT, Aleksey Shipilev wrote: >> (Found this while adapting current mainline to x86_32 port) >> >> After [JDK-8292584](https://bugs.openjdk.org/browse/JDK-8292584), we have `gen_continuation_yield()` that generates compiled entry, and implicitly uses the defaults for other ones (interpreter, exception). We should be more explicit about these, and verify the generators properly initialized all out-parameters. >> >> I think we are only using interpreter/exception entry in `enterContinuation`, but not in `yield`. Notably, `exception_offset` should be `-1` for `nmethod::new_native_nmethod` to treat it as "no exception handlers". >> >> There a many ways to strengthen this, this PR is the one I like. I can do the symmetric change in aarch64, once we are agree on x86_64 version. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug, `hotspot_loom jdk_loom` >> - [x] Linux x86_64 fastdebug, `tier1` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Also handle AArch64 parts I think this one is pretty simple, but any other Reviews? 
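For readers following this thread, the idea of verifying that a generator initialized all of its out-arguments can be sketched as below. This is an illustration under stated assumptions, not the code in the PR: `EntryOffsets`, `kPoison`, `gen_explicit`, `gen_forgetful` and `run_and_verify` are hypothetical names, and the offset values are made up. The only detail taken from the thread is that `-1` stands for "no exception handlers".

```cpp
#include <cassert>

// Hypothetical poison value: it must differ from every legitimate offset,
// including -1, which the thread says means "no exception handlers".
const int kPoison = -12345;

// Hypothetical bundle of the out-arguments a stub generator fills in.
struct EntryOffsets {
  int compiled_entry_offset;
  int interpreted_entry_offset;
  int exception_offset;
};

// A generator that sets every out-argument explicitly.
void gen_explicit(EntryOffsets& out) {
  out.compiled_entry_offset = 0;      // illustrative value
  out.interpreted_entry_offset = -1;  // illustrative "none" value
  out.exception_offset = -1;          // explicit "no exception handlers"
}

// A generator that silently relies on defaults for some out-arguments.
void gen_forgetful(EntryOffsets& out) {
  out.compiled_entry_offset = 0;
}

// Poison every field, run the generator, then report whether it overwrote
// all of them. A debug build could assert on the returned value.
bool run_and_verify(void (*gen)(EntryOffsets&), EntryOffsets& out) {
  out.compiled_entry_offset = kPoison;
  out.interpreted_entry_offset = kPoison;
  out.exception_offset = kPoison;
  gen(out);
  return out.compiled_entry_offset != kPoison &&
         out.interpreted_entry_offset != kPoison &&
         out.exception_offset != kPoison;
}
```

The design choice here is to poison first and check afterwards, so a generator that forgets a field is caught regardless of what default the caller would otherwise have supplied.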
------------- PR: https://git.openjdk.org/jdk/pull/10241 From sviswanathan at openjdk.org Tue Sep 13 18:59:43 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 13 Sep 2022 18:59:43 GMT Subject: RFR: 8290169: adlc: Improve child constraints for vector unary operations [v2] In-Reply-To: References: Message-ID: On Thu, 18 Aug 2022 03:27:55 GMT, Hao Sun wrote: >> As demonstrated in [1], the child constrait generated for *predicated >> vector unary operation* is the super set of that generated for the >> *unpredicated* version. As a result, there exists a risk for predicated >> vector unary operaions to match the unpredicated rules by accident. >> >> In this patch, we resolve this issue by generating one extra check >> "rChild == NULL" ONLY for vector unary operations. In this way, the >> child constraints for predicated/unpredicated vector unary operations >> are exclusive now. >> >> Following the example in [1], the dfa state generated for AbsVI is shown >> below. >> >> >> void State::_sub_Op_AbsVI(const Node *n){ >> if( STATE__VALID_CHILD(_kids[0], VREG) && STATE__VALID_CHILD(_kids[1], PREGGOV) && >> ( UseSVE > 0 ) ) >> { >> unsigned int c = _kids[0]->_cost[VREG]+_kids[1]->_cost[PREGGOV] + SVE_COST; >> DFA_PRODUCTION(VREG, vabsI_masked_rule, c) >> } >> if( STATE__VALID_CHILD(_kids[0], VREG) && _kids[1] == NULL && <---- 1 >> ( UseSVE > 0) ) >> { >> unsigned int c = _kids[0]->_cost[VREG] + SVE_COST; >> if (STATE__NOT_YET_VALID(VREG) || _cost[VREG] > c) { >> DFA_PRODUCTION(VREG, vabsI_rule, c) >> } >> } >> ... >> >> >> We can see that the constraint at line 1 cannot be matched for >> predicated AbsVI node now. >> >> The main updates are made in adlc/dfa part. Ideally, we should only >> add the extra check for affected platforms, i.e. AVX-512 and SVE. But we >> didn't do that because it would be better not to introduce any >> architecture dependent implementation here. 
>> >> Besides, workarounds in both ~aarch64_sve.ad~aarch64_vector.ad and x86.ad are removed. 1) >> Many "is_predicated_vector()" checks can be removed in ~aarch64_sve.ad~aarch64_vector.ad >> file. 2) Default instruction cost is used for involving rules in x86.ad >> file. >> >> ~[1]. https://github.com/shqking/jdk/commit/50ec9b19~ >> [1]. https://github.com/shqking/jdk/commit/f7d9621e2 > > Hao Sun has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge branch 'master' into 8290169-adlc > > Resolve the conflicts. > - 8290169: adlc: Improve child constraints for vector unary operations > > As demonstrated in [1], the child constrait generated for *predicated > vector unary operation* is the super set of that generated for the > *unpredicated* version. As a result, there exists a risk for predicated > vector unary operaions to match the unpredicated rules by accident. > > In this patch, we resolve this issue by generating one extra check > "rChild == NULL" ONLY for vector unary operations. In this way, the > child constraints for predicated/unpredicated vector unary operations > are exclusive now. > > Following the example in [1], the dfa state generated for AbsVI is shown > below. > > ``` > void State::_sub_Op_AbsVI(const Node *n){ > if( STATE__VALID_CHILD(_kids[0], VREG) && STATE__VALID_CHILD(_kids[1], PREGGOV) && > ( UseSVE > 0 ) ) > { > unsigned int c = _kids[0]->_cost[VREG]+_kids[1]->_cost[PREGGOV] + SVE_COST; > DFA_PRODUCTION(VREG, vabsI_masked_rule, c) > } > if( STATE__VALID_CHILD(_kids[0], VREG) && _kids[1] == NULL && <---- 1 > ( UseSVE > 0) ) > { > unsigned int c = _kids[0]->_cost[VREG] + SVE_COST; > if (STATE__NOT_YET_VALID(VREG) || _cost[VREG] > c) { > DFA_PRODUCTION(VREG, vabsI_rule, c) > } > } > ... > ``` > > We can see that the constraint at line 1 cannot be matched for > predicated AbsVI node now. > > The main updates are made in adlc/dfa part. 
Ideally, we should only > add the extra check for affected platforms, i.e. AVX-512 and SVE. But we > didn't do that because it would be better not to introduce any > architecture dependent implementation here. > > Besides, workarounds in both aarch64_sve.ad and x86.ad are removed. 1) > Many "is_predicated_vector()" checks can be removed in aarch64_sve.ad > file. 2) Default instruction cost is used for involving rules in x86.ad > file. > > [1]. https://github.com/shqking/jdk/commit/50ec9b19 Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR: https://git.openjdk.org/jdk/pull/9534 From kvn at openjdk.org Tue Sep 13 20:28:42 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 13 Sep 2022 20:28:42 GMT Subject: RFR: 8293654: Improve SharedRuntime handling of continuation helper out-arguments [v3] In-Reply-To: References: Message-ID: On Tue, 13 Sep 2022 07:56:49 GMT, Aleksey Shipilev wrote: >> (Found this while adapting current mainline to x86_32 port) >> >> After [JDK-8292584](https://bugs.openjdk.org/browse/JDK-8292584), we have `gen_continuation_yield()` that generates compiled entry, and implicitly uses the defaults for other ones (interpreter, exception). We should be more explicit about these, and verify the generators properly initialized all out-parameters. >> >> I think we are only using interpreter/exception entry in `enterContinuation`, but not in `yield`. Notably, `exception_offset` should be `-1` for `nmethod::new_native_nmethod` to treat it as "no exception handlers". >> >> There are many ways to strengthen this, this PR is the one I like. I can do the symmetric change in aarch64, once we agree on x86_64 version. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug, `hotspot_loom jdk_loom` >> - [x] Linux x86_64 fastdebug, `tier1` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Also handle AArch64 parts Good.
------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.org/jdk/pull/10241 From sviswanathan at openjdk.org Tue Sep 13 22:46:43 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 13 Sep 2022 22:46:43 GMT Subject: RFR: 8290917: x86: Memory-operand arithmetic instructions have too low costs [v2] In-Reply-To: References: Message-ID: On Tue, 9 Aug 2022 13:18:05 GMT, Quan Anh Mai wrote: >> The pattern `AddI (LoadI mem) imm` should be matched by a load followed by an add with constant, instead, it is currently matched as a constant load followed by an add with memory. The reason is that the cost of `addI_rReg_mem` is too low, this patch fixes this by increasing the cost of this fused instruction. >> >> Testing: Manually run the test case in the JBS and look at the compiled code. >> >> I also do some small clean-ups in x86_64.ad: >> >> - For some reasons, `incl(Address)` is less efficient than `addl(Address, int)` as the former results in 3 uops in the fused domain as opposed to 2 in cases of the latter (according to [uops.info](uops.info)). As a result, I propose to remove the corresponding rules. >> - The `mulHiL` rules have unnecessary constraints on the input registers, these can be removed. The `no_rax_RegL` operand as a consequence can also be removed. >> - The rules involving long division by a constant can be removed because it has been covered by the optimiser during idealisation. >> - The pattern `SubI src imm` and the likes never match because they are converted to `AddI src -imm` by the optimiser. As a result, these rules can be removed >> - The rules involving shifting the argument by 1 are covered by and exactly the same as the corresponding rules of shifting by an immediate. As a result, they can be removed. >> - Some rules involving and-ing with a bit mask have unnecessary constraints on the target register. >> >> Please kindly review, thank you very much. 
> > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > add benchmark I don't think we should replace the inc/dec by add. On my desktop, I see the following: Before: Benchmark Mode Cnt Score Error Units BasicRules.add_mem_con avgt 3 132.268 ± 0.599 ns/op BasicRules.inc_mem avgt 3 169.980 ± 0.617 ns/op After: Benchmark Mode Cnt Score Error Units BasicRules.add_mem_con avgt 3 117.426 ± 0.128 ns/op BasicRules.inc_mem avgt 3 182.907 ± 0.277 ns/op The inc_mem jmh performance is worse after the patch. There is already UseIncDec option which is set appropriately to select whether to generate inc/dec or the add/sub instruction. ------------- PR: https://git.openjdk.org/jdk/pull/9791 From kvn at openjdk.org Wed Sep 14 02:20:28 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 14 Sep 2022 02:20:28 GMT Subject: RFR: 8293618: x86: Wrong code generation in class Assembler In-Reply-To: References: Message-ID: On Mon, 12 Sep 2022 15:30:01 GMT, Quan Anh Mai wrote: > Hi, > > This patch fixes some issues in the code generation of x86 assembler: > > - `Assembler::testl` misses `prefix(dst)` > - `Assembler::addw` misses the 0x66 prefix > - `Assembler::emit_operand` needs the length of the instruction from the address operand, this is often forgotten, making this parameter explicit to prevent potential issues > - The assembler should not do optimisations that change the actual emitted instructions, these should be moved to `MacroAssembler` instead > > AFAICT there is no failure due to these mistakes. Please take a look and give reviews. > Thank you very much. Testing passed. ------------- Marked as reviewed by kvn (Reviewer).
PR: https://git.openjdk.org/jdk/pull/10240 From kvn at openjdk.org Wed Sep 14 02:45:39 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 14 Sep 2022 02:45:39 GMT Subject: RFR: 8255670: Improve C2's detection of modified nodes In-Reply-To: References: Message-ID: On Sat, 9 Jul 2022 08:37:24 GMT, Emanuel Peter wrote: > Added `record_modified_node` to: > > Node::clone > Node::add_req > Node::add_req_batch > Node::ins_req > Node::add_prec > Node::rm_prec > Node::set_prec > > > Added `igvn->_worklist.push(node)` in various places that modified a `node` but did not add it to the igvn worklist. > > 7 times I had to push `Root`, 5 of these it was because of the creation of a `HaltNode`, which means we have a `root->add_req(halt)`. > > In one case we have a MergeMemStream node, which gets two MergeMem nodes as input, and streams over them. > Unfortunately, it modifies one of the two, which then can trigger our assertion code. I now push this node to the igvn worklist, but a better fix would be to make MergeMemStream leave the MergeMem nodes unmodified. I think that should be possible, filed an RFE [JDK-8293358](https://bugs.openjdk.org/browse/JDK-8293358) > > FYI: > What I am NOT doing here, and leave to a future RFE/independent change: investigate / implement these assertions for late/incremental inlining. > > Ran larger regression tests, and 7-9h of fuzzing on 3 platforms. The only GVN transformation we do with Root node is removing TOP inputs in `RootNode::Ideal()`. That is why we are "sloppy" about putting it on worklist when we know that added input is not TOP (new Halt node in this case). 
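Editorial sketch: the bookkeeping discussed above (record every modified node, then check that it was also queued for IGVN) can be pictured with a toy model. Everything below is hypothetical: `ToyNode`, `ToyIGVN` and `count_unqueued_modified` are illustration-only names and are not HotSpot's `Node`, `PhaseIterGVN` or the actual assertion code in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Toy model only: these types are hypothetical, not HotSpot classes.
struct ToyNode {
  int id;
  std::vector<ToyNode*> inputs;
};

class ToyIGVN {
 public:
  // Every mutation goes through a helper that records the modified node,
  // mirroring the idea of calling record_modified_node from add_req.
  void add_req(ToyNode* n, ToyNode* in) {
    n->inputs.push_back(in);
    _modified.insert(n);
  }

  // Queue a node for later re-examination.
  void push(ToyNode* n) { _worklist.insert(n); }

  // Number of nodes that were modified but never put on the worklist.
  std::size_t count_unqueued_modified() const {
    std::size_t missing = 0;
    for (ToyNode* n : _modified) {
      if (_worklist.count(n) == 0) missing++;
    }
    return missing;
  }

 private:
  std::unordered_set<ToyNode*> _modified;
  std::unordered_set<ToyNode*> _worklist;
};
```

In this model, adding a halt node as an input of a root node without a following `push(root)` leaves one unqueued modified node, which is the kind of situation the review comment above is about.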
------------- PR: https://git.openjdk.org/jdk/pull/9439 From duke at openjdk.org Wed Sep 14 02:48:40 2022 From: duke at openjdk.org (Quan Anh Mai) Date: Wed, 14 Sep 2022 02:48:40 GMT Subject: RFR: 8290917: x86: Memory-operand arithmetic instructions have too low costs [v3] In-Reply-To: References: Message-ID: > The pattern `AddI (LoadI mem) imm` should be matched by a load followed by an add with constant, instead, it is currently matched as a constant load followed by an add with memory. The reason is that the cost of `addI_rReg_mem` is too low, this patch fixes this by increasing the cost of this fused instruction. > > Testing: Manually run the test case in the JBS and look at the compiled code. > > I also do some small clean-ups in x86_64.ad: > > - For some reasons, `incl(Address)` is less efficient than `addl(Address, int)` as the former results in 3 uops in the fused domain as opposed to 2 in cases of the latter (according to [uops.info](uops.info)). As a result, I propose to remove the corresponding rules. > - The `mulHiL` rules have unnecessary constraints on the input registers, these can be removed. The `no_rax_RegL` operand as a consequence can also be removed. > - The rules involving long division by a constant can be removed because it has been covered by the optimiser during idealisation. > - The pattern `SubI src imm` and the likes never match because they are converted to `AddI src -imm` by the optimiser. As a result, these rules can be removed > - The rules involving shifting the argument by 1 are covered by and exactly the same as the corresponding rules of shifting by an immediate. As a result, they can be removed. > - Some rules involving and-ing with a bit mask have unnecessary constraints on the target register. > > Please kindly review, thank you very much. 
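Editorial sketch: the cost argument in the description above can be illustrated with a toy comparison of the two candidate coverings of `AddI (LoadI mem) imm`. All names are hypothetical, and the cost numbers used in the usage note are placeholders chosen for illustration; they are not the values in x86_64.ad.

```cpp
#include <cassert>
#include <string>

// A candidate covering of the expression tree with its summed rule cost.
struct Tiling {
  std::string name;
  int cost;
};

// Pick the cheaper covering, as a cost-driven matcher would (ties keep the first).
inline Tiling cheaper(const Tiling& a, const Tiling& b) {
  return (b.cost < a.cost) ? b : a;
}

// Covering 1: load the memory operand into a register, then add the immediate.
inline Tiling load_then_add_imm(int load_cost, int add_imm_cost) {
  return {"load + add imm", load_cost + add_imm_cost};
}

// Covering 2: materialise the constant in a register, then add with a memory operand.
inline Tiling con_then_add_mem(int load_con_cost, int add_mem_cost) {
  return {"load const + add mem", load_con_cost + add_mem_cost};
}
```

With placeholder costs of 125 (load), 100 (add imm), 50 (load constant) and 125 (add mem), covering 2 totals 175 and beats covering 1 at 225. Raising the placeholder add-mem cost to 200 makes covering 2 total 250, so covering 1 is chosen, which is the direction of the fix described above.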
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: revert removing inc_mem ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9791/files - new: https://git.openjdk.org/jdk/pull/9791/files/413feb31..e7c79d4f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9791&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9791&range=01-02 Stats: 58 lines in 1 file changed: 58 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/9791.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9791/head:pull/9791 PR: https://git.openjdk.org/jdk/pull/9791 From duke at openjdk.org Wed Sep 14 02:52:42 2022 From: duke at openjdk.org (Quan Anh Mai) Date: Wed, 14 Sep 2022 02:52:42 GMT Subject: RFR: 8290917: x86: Memory-operand arithmetic instructions have too low costs [v2] In-Reply-To: References: Message-ID: On Tue, 13 Sep 2022 22:44:36 GMT, Sandhya Viswanathan wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> add benchmark > > I don't think we should replace the inc/dec by add. > > On my desktop, I see the following: > Before: > Benchmark Mode Cnt Score Error Units > BasicRules.add_mem_con avgt 3 132.268 ± 0.599 ns/op > BasicRules.inc_mem avgt 3 169.980 ± 0.617 ns/op > > After: > Benchmark Mode Cnt Score Error Units > BasicRules.add_mem_con avgt 3 117.426 ± 0.128 ns/op > BasicRules.inc_mem avgt 3 182.907 ± 0.277 ns/op > > The inc_mem jmh performance is worse after the patch. > > There is already UseIncDec option which is set appropriately to select whether to generate inc/dec or the add/sub instruction. @sviswa7 Thanks a lot for your review, I have reverted that change. I don't understand why, though, it does not seem that the bottleneck is in the predecoder.
------------- PR: https://git.openjdk.org/jdk/pull/9791 From yadongwang at openjdk.org Wed Sep 14 02:54:50 2022 From: yadongwang at openjdk.org (Yadong Wang) Date: Wed, 14 Sep 2022 02:54:50 GMT Subject: RFR: 8293695: Implement isInfinite intrinsic for RISC-V In-Reply-To: <8-TCWhiO_DkKS10Fkl7gztIBGeGJHO45F2d99YVKAvQ=.8e10e678-d320-4945-8760-ad48eefbdd88@github.com> References: <8-TCWhiO_DkKS10Fkl7gztIBGeGJHO45F2d99YVKAvQ=.8e10e678-d320-4945-8760-ad48eefbdd88@github.com> Message-ID: On Tue, 13 Sep 2022 15:48:38 GMT, Aleksei Voitylov wrote: > RISC-V 64 intrinsic for isInfinite follows the logic of x86 intrinsic (introduced by 8285868). This patch adds C2 match for IsInfinite nodes. Existing test is modified to run on RISC-V and passes on both release and fastdebug builds. Benchmark results are below: > > before: > Benchmark Mode Cnt Score Error Units > DoubleClassCheck.testIsInfiniteBranch avgt 15 43.547 ± 6.843 ns/op > DoubleClassCheck.testIsInfiniteCMov avgt 15 16.301 ± 1.386 ns/op > DoubleClassCheck.testIsInfiniteStore avgt 15 16.230 ± 1.477 ns/op > FloatClassCheck.testIsInfiniteBranch avgt 15 38.774 ± 3.572 ns/op > FloatClassCheck.testIsInfiniteCMov avgt 15 15.064 ± 1.310 ns/op > FloatClassCheck.testIsInfiniteStore avgt 15 14.967 ± 1.298 ns/op > > after: > Benchmark Mode Cnt Score Error Units > DoubleClassCheck.testIsInfiniteBranch avgt 15 39.987 ± 6.179 ns/op > DoubleClassCheck.testIsInfiniteCMov avgt 15 13.477 ± 1.159 ns/op > DoubleClassCheck.testIsInfiniteStore avgt 15 9.607 ± 0.834 ns/op > FloatClassCheck.testIsInfiniteBranch avgt 15 36.265 ± 3.168 ns/op > FloatClassCheck.testIsInfiniteCMov avgt 15 13.230 ± 1.100 ns/op > FloatClassCheck.testIsInfiniteStore avgt 15 9.492 ± 0.807 ns/op > > According to 8285868 discussion, isNaN and isFinite methods intrinsification using the same approach might be not beneficial. I'm going to investigate it for RISC-V and propose methods intrinsification as part of further work in case it's profitable.
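Editorial sketch: for readers unfamiliar with what an `isInfinite` check has to decide, the following portable version tests the IEEE 754 bit pattern directly. It only illustrates the semantics; it is an assumption-free statement about the double format, but it is not the instruction sequence the RISC-V patch emits, and `is_infinite_bits` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// A double is +/-infinity exactly when, ignoring the sign bit, the exponent
// field is all ones and the mantissa is zero.
inline bool is_infinite_bits(double d) {
  std::uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);  // well-defined way to read the raw bits
  return (bits & 0x7fffffffffffffffULL) == 0x7ff0000000000000ULL;
}
```

NaN values share the all-ones exponent but have a non-zero mantissa, so they are rejected by the equality test.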
lgtm(not a reviewer) ------------- Marked as reviewed by yadongwang (Author). PR: https://git.openjdk.org/jdk/pull/10253 From jiefu at openjdk.org Wed Sep 14 04:16:00 2022 From: jiefu at openjdk.org (Jie Fu) Date: Wed, 14 Sep 2022 04:16:00 GMT Subject: RFR: 8293774: Improve TraceOptoParse to dump the bytecode name Message-ID: <_l3DxS71zIjGyvbczKqvyBAgrf0VIQgToZW8RbdSGTQ=.df1d7cc2-0cce-41d7-a959-9267b9f80c34@github.com> Hi all, Please review this one-line patch which prints the bytecode name for `TraceOptoParse`. While I was debugging with `TraceOptoParse`, I found it only prints the bci without the bytecode name. I had to map the bci to the bytecode manually again and again. It would be better to also dump the bytecode name. Before: 568 503 4 jdk.internal.org.objectweb.asm.ByteVector::putUTF8 (144 bytes) Merging state at block #0 bci:0 with empty state on path 1 Parsing block #0 at bci [0,11), successors: 1 2 @ bci:0 @ bci:1 Uncommon trap reason='null_check' action='maybe_recompile' debug_id='0' at bci:1 Merging state at block #0 bci:0 with empty state on path 1 Parsing block #0 at bci [0,11), successors: @ bci:0 @ bci:1 @ bci:4 Uncommon trap reason='null_check' action='maybe_recompile' debug_id='0' at bci:4 @ bci:5 @ bci:6 Merging state at block #0 bci:0 with empty state on path 1 After: 571 507 4 jdk.internal.org.objectweb.asm.ByteVector::putUTF8 (144 bytes) Merging state at block #0 bci:0 with empty state on path 1 Parsing block #0 at bci [0,11), successors: 1 2 @ bci:0 aload_1 @ bci:1 invokevirtual Uncommon trap reason='null_check' action='maybe_recompile' debug_id='0' at bci:1 Merging state at block #0 bci:0 with empty state on path 1 Parsing block #0 at bci [0,11), successors: @ bci:0 aload_0 @ bci:1 getfield @ bci:4 arraylength Uncommon trap reason='null_check' action='maybe_recompile' debug_id='0' at bci:4 @ bci:5 aload_0 @ bci:6 invokevirtual Merging state at block #0 bci:0 with empty state on path 1 Testing: - tier1~3 on Linux/x64 in progress, seems fine 
until now Thanks. Best regards, Jie ------------- Commit messages: - 8293774: Improve TraceOptoParse to dump the bytecode name Changes: https://git.openjdk.org/jdk/pull/10262/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10262&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8293774 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10262.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10262/head:pull/10262 PR: https://git.openjdk.org/jdk/pull/10262 From chagedorn at openjdk.org Wed Sep 14 05:37:52 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 14 Sep 2022 05:37:52 GMT Subject: RFR: 8293774: Improve TraceOptoParse to dump the bytecode name In-Reply-To: <_l3DxS71zIjGyvbczKqvyBAgrf0VIQgToZW8RbdSGTQ=.df1d7cc2-0cce-41d7-a959-9267b9f80c34@github.com> References: <_l3DxS71zIjGyvbczKqvyBAgrf0VIQgToZW8RbdSGTQ=.df1d7cc2-0cce-41d7-a959-9267b9f80c34@github.com> Message-ID: <-BXG_nP8-i5Uy2wIOnK6PF2Ik5rhe_dTYaX9Zk4ZsRA=.ba1e25b0-9f95-40d0-9387-8e03b15f2102@github.com> On Wed, 14 Sep 2022 04:06:58 GMT, Jie Fu wrote: > Hi all, > > Please review this one-line patch which prints the bytecode name for `TraceOptoParse`. > > While I was debugging with `TraceOptoParse`, I found it only prints the bci without the bytecode name. > I had to map the bci to the bytecode manually again and again. > It would be better to also dump the bytecode name. 
> > Before: > > 568 503 4 jdk.internal.org.objectweb.asm.ByteVector::putUTF8 (144 bytes) > Merging state at block #0 bci:0 with empty state on path 1 > Parsing block #0 at bci [0,11), successors: 1 2 > @ bci:0 > @ bci:1 > Uncommon trap reason='null_check' action='maybe_recompile' debug_id='0' at bci:1 > Merging state at block #0 bci:0 with empty state on path 1 > Parsing block #0 at bci [0,11), successors: > @ bci:0 > @ bci:1 > @ bci:4 > Uncommon trap reason='null_check' action='maybe_recompile' debug_id='0' at bci:4 > @ bci:5 > @ bci:6 > Merging state at block #0 bci:0 with empty state on path 1 > > > After: > > 571 507 4 jdk.internal.org.objectweb.asm.ByteVector::putUTF8 (144 bytes) > Merging state at block #0 bci:0 with empty state on path 1 > Parsing block #0 at bci [0,11), successors: 1 2 > @ bci:0 aload_1 > @ bci:1 invokevirtual > Uncommon trap reason='null_check' action='maybe_recompile' debug_id='0' at bci:1 > Merging state at block #0 bci:0 with empty state on path 1 > Parsing block #0 at bci [0,11), successors: > @ bci:0 aload_0 > @ bci:1 getfield > @ bci:4 arraylength > Uncommon trap reason='null_check' action='maybe_recompile' debug_id='0' at bci:4 > @ bci:5 aload_0 > @ bci:6 invokevirtual > Merging state at block #0 bci:0 with empty state on path 1 > > > Testing: > - tier1~3 on Linux/x64 in progress, seems fine until now > > Thanks. > Best regards, > Jie Looks good! Do we have any "hello world" kind of test that uses this flag? If not, it might be worth adding such a sanity test to cover it. But that could also be done separately. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/10262 From shade at openjdk.org Wed Sep 14 05:46:41 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 14 Sep 2022 05:46:41 GMT Subject: RFR: 8293654: Improve SharedRuntime handling of continuation helper out-arguments [v3] In-Reply-To: References: Message-ID: <_Yhl6DI5KyfagayvAbmFbQu3PPvls47MRnZc_7PHzgQ=.9850817f-0d38-409e-b01f-688a0acb0d51@github.com> On Tue, 13 Sep 2022 07:56:49 GMT, Aleksey Shipilev wrote: >> (Found this while adapting current mainline to x86_32 port) >> >> After [JDK-8292584](https://bugs.openjdk.org/browse/JDK-8292584), we have `gen_continuation_yield()` that generates compiled entry, and implicitly uses the defaults for other ones (interpreter, exception). We should be more explicit about these, and verify the generators properly initialized all out-parameters. >> >> I think we are only using interpreter/exception entry in `enterContinuation`, but not in `yield`. Notably, `exception_offset` should be `-1` for `nmethod::new_native_nmethod` to treat it as "no exception handlers". >> >> There are many ways to strengthen this, this PR is the one I like. I can do the symmetric change in aarch64, once we agree on x86_64 version. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug, `hotspot_loom jdk_loom` >> - [x] Linux x86_64 fastdebug, `tier1` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Also handle AArch64 parts Thank you!
------------- PR: https://git.openjdk.org/jdk/pull/10241 From shade at openjdk.org Wed Sep 14 05:48:01 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 14 Sep 2022 05:48:01 GMT Subject: Integrated: 8293654: Improve SharedRuntime handling of continuation helper out-arguments In-Reply-To: References: Message-ID: On Mon, 12 Sep 2022 16:59:45 GMT, Aleksey Shipilev wrote: > (Found this while adapting current mainline to x86_32 port) > > After [JDK-8292584](https://bugs.openjdk.org/browse/JDK-8292584), we have `gen_continuation_yield()` that generates compiled entry, and implicitly uses the defaults for other ones (interpreter, exception). We should be more explicit about these, and verify the generators properly initialized all out-parameters. > > I think we are only using interpreter/exception entry in `enterContinuation`, but not in `yield`. Notably, `exception_offset` should be `-1` for `nmethod::new_native_nmethod` to treat it as "no exception handlers". > > There are many ways to strengthen this, this PR is the one I like. I can do the symmetric change in aarch64, once we agree on x86_64 version. > > Additional testing: > - [x] Linux x86_64 fastdebug, `hotspot_loom jdk_loom` > - [x] Linux x86_64 fastdebug, `tier1` This pull request has now been integrated.
Changeset: 2baf2516 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/2baf2516e1d172268ec7c4c066a1b53bb0bf0779 Stats: 49 lines in 2 files changed: 32 ins; 12 del; 5 mod 8293654: Improve SharedRuntime handling of continuation helper out-arguments Reviewed-by: dlong, kvn ------------- PR: https://git.openjdk.org/jdk/pull/10241 From duke at openjdk.org Wed Sep 14 05:52:46 2022 From: duke at openjdk.org (Quan Anh Mai) Date: Wed, 14 Sep 2022 05:52:46 GMT Subject: RFR: 8293618: x86: Wrong code generation in class Assembler In-Reply-To: References: Message-ID: <2x2H_y5oKWa4DfKPHmeMwWPpblnmpYh-r_h5Sw1Ouic=.f767b1b8-4aba-4422-97bf-10c4f8fddbe3@github.com> On Wed, 14 Sep 2022 02:17:17 GMT, Vladimir Kozlov wrote: >> Hi, >> >> This patch fixes some issues in the code generation of x86 assembler: >> >> - `Assembler::testl` misses `prefix(dst)` >> - `Assembler::addw` misses the 0x66 prefix >> - `Assembler::emit_operand` needs the length of the instruction from the address operand, this is often forgotten, making this parameter explicit to prevent potential issues >> - The assembler should not do optimisations that change the actual emitted instructions, these should be moved to `MacroAssembler` instead >> >> AFAICT there is no failure due to these mistakes. Please take a look and give reviews. >> Thank you very much. > > Testing passed. @vnkozlov Thanks a lot for your testing.
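Editorial sketch: the role of the 0x66 prefix mentioned in the quoted description can be shown with a toy encoder for the register-to-register form of `ADD r/m, r` (opcode `0x01 /r`). The function name `encode_add_reg_reg` and its signature are hypothetical and do not correspond to HotSpot's `Assembler` API; only the byte values follow the x86 instruction encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Register numbers 0..7 in the usual x86 order (0 = AX/EAX, 1 = CX/ECX, ...).
// Encodes ADD dst, src for register-direct operands only (no REX, no memory).
inline std::vector<std::uint8_t> encode_add_reg_reg(unsigned dst, unsigned src,
                                                    bool is_16bit) {
  assert(dst < 8 && src < 8);
  std::vector<std::uint8_t> bytes;
  if (is_16bit) {
    bytes.push_back(0x66);  // operand-size override: selects 16-bit operands
  }
  bytes.push_back(0x01);    // ADD r/m, r
  // ModRM byte: mod = 11 (register direct), reg = src, rm = dst.
  bytes.push_back(static_cast<std::uint8_t>(0xC0 | (src << 3) | dst));
  return bytes;
}
```

For example, `add eax, ecx` encodes as `01 C8`, while `add ax, cx` needs the prefix and encodes as `66 01 C8`. Omitting the prefix silently turns the 16-bit add into a 32-bit one, which is why a missing prefix is a code generation bug even if no test currently fails.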
------------- PR: https://git.openjdk.org/jdk/pull/10240 From jiefu at openjdk.org Wed Sep 14 07:06:44 2022 From: jiefu at openjdk.org (Jie Fu) Date: Wed, 14 Sep 2022 07:06:44 GMT Subject: RFR: 8293774: Improve TraceOptoParse to dump the bytecode name In-Reply-To: <-BXG_nP8-i5Uy2wIOnK6PF2Ik5rhe_dTYaX9Zk4ZsRA=.ba1e25b0-9f95-40d0-9387-8e03b15f2102@github.com> References: <_l3DxS71zIjGyvbczKqvyBAgrf0VIQgToZW8RbdSGTQ=.df1d7cc2-0cce-41d7-a959-9267b9f80c34@github.com> <-BXG_nP8-i5Uy2wIOnK6PF2Ik5rhe_dTYaX9Zk4ZsRA=.ba1e25b0-9f95-40d0-9387-8e03b15f2102@github.com> Message-ID: On Wed, 14 Sep 2022 05:35:20 GMT, Christian Hagedorn wrote: > Looks good! > > Do we have any "hello world" kind of test that uses this flag? If not, it might be worth adding such a sanity test to cover it. But that could also be done separately. Thanks @chhagedorn for your review. I didn't find a test running with `TraceOptoParse`. My colleague would like to write a test for it to learn more about OpenJDK. Let's do it in a separate pr. So do you think this pr is trivial? Thanks. ------------- PR: https://git.openjdk.org/jdk/pull/10262 From haosun at openjdk.org Wed Sep 14 07:06:47 2022 From: haosun at openjdk.org (Hao Sun) Date: Wed, 14 Sep 2022 07:06:47 GMT Subject: RFR: 8290169: adlc: Improve child constraints for vector unary operations [v3] In-Reply-To: References: Message-ID: > As demonstrated in [1], the child constraint generated for *predicated > vector unary operation* is the super set of that generated for the > *unpredicated* version. As a result, there exists a risk for predicated > vector unary operations to match the unpredicated rules by accident. > > In this patch, we resolve this issue by generating one extra check > "rChild == NULL" ONLY for vector unary operations. In this way, the > child constraints for predicated/unpredicated vector unary operations > are exclusive now. > > Following the example in [1], the dfa state generated for AbsVI is shown > below.
> > > void State::_sub_Op_AbsVI(const Node *n){ > if( STATE__VALID_CHILD(_kids[0], VREG) && STATE__VALID_CHILD(_kids[1], PREGGOV) && > ( UseSVE > 0 ) ) > { > unsigned int c = _kids[0]->_cost[VREG]+_kids[1]->_cost[PREGGOV] + SVE_COST; > DFA_PRODUCTION(VREG, vabsI_masked_rule, c) > } > if( STATE__VALID_CHILD(_kids[0], VREG) && _kids[1] == NULL && <---- 1 > ( UseSVE > 0) ) > { > unsigned int c = _kids[0]->_cost[VREG] + SVE_COST; > if (STATE__NOT_YET_VALID(VREG) || _cost[VREG] > c) { > DFA_PRODUCTION(VREG, vabsI_rule, c) > } > } > ... > > > We can see that the constraint at line 1 cannot be matched for > predicated AbsVI node now. > > The main updates are made in adlc/dfa part. Ideally, we should only > add the extra check for affected platforms, i.e. AVX-512 and SVE. But we > didn't do that because it would be better not to introduce any > architecture dependent implementation here. > > Besides, workarounds in both ~aarch64_sve.ad~aarch64_vector.ad and x86.ad are removed. 1) > Many "is_predicated_vector()" checks can be removed in ~aarch64_sve.ad~aarch64_vector.ad > file. 2) Default instruction cost is used for involving rules in x86.ad > file. > > ~[1]. https://github.com/shqking/jdk/commit/50ec9b19~ > [1]. https://github.com/shqking/jdk/commit/f7d9621e2 Hao Sun has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: - Remove the "is_predicated_vector()" check introduced in JDK-8292587 - Merge branch 'master' into 8290169-adlc - Merge branch 'master' into 8290169-adlc Resolve the conflicts. - 8290169: adlc: Improve child constraints for vector unary operations As demonstrated in [1], the child constraint generated for *predicated vector unary operation* is the super set of that generated for the *unpredicated* version. As a result, there exists a risk for predicated vector unary operations to match the unpredicated rules by accident.
In this patch, we resolve this issue by generating one extra check "rChild == NULL" ONLY for vector unary operations. In this way, the child constraints for predicated/unpredicated vector unary operations are exclusive now. Following the example in [1], the dfa state generated for AbsVI is shown below. ``` void State::_sub_Op_AbsVI(const Node *n){ if( STATE__VALID_CHILD(_kids[0], VREG) && STATE__VALID_CHILD(_kids[1], PREGGOV) && ( UseSVE > 0 ) ) { unsigned int c = _kids[0]->_cost[VREG]+_kids[1]->_cost[PREGGOV] + SVE_COST; DFA_PRODUCTION(VREG, vabsI_masked_rule, c) } if( STATE__VALID_CHILD(_kids[0], VREG) && _kids[1] == NULL && <---- 1 ( UseSVE > 0) ) { unsigned int c = _kids[0]->_cost[VREG] + SVE_COST; if (STATE__NOT_YET_VALID(VREG) || _cost[VREG] > c) { DFA_PRODUCTION(VREG, vabsI_rule, c) } } ... ``` We can see that the constraint at line 1 cannot be matched for predicated AbsVI node now. The main updates are made in adlc/dfa part. Ideally, we should only add the extra check for affected platforms, i.e. AVX-512 and SVE. But we didn't do that because it would be better not to introduce any architecture dependent implementation here. Besides, workarounds in both aarch64_sve.ad and x86.ad are removed. 1) Many "is_predicated_vector()" checks can be removed in aarch64_sve.ad file. 2) Default instruction cost is used for involving rules in x86.ad file. [1]. 
https://github.com/shqking/jdk/commit/50ec9b19 ------------- Changes: https://git.openjdk.org/jdk/pull/9534/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9534&range=02 Stats: 154 lines in 5 files changed: 28 ins; 81 del; 45 mod Patch: https://git.openjdk.org/jdk/pull/9534.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9534/head:pull/9534 PR: https://git.openjdk.org/jdk/pull/9534 From chagedorn at openjdk.org Wed Sep 14 07:12:27 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 14 Sep 2022 07:12:27 GMT Subject: RFR: 8293774: Improve TraceOptoParse to dump the bytecode name In-Reply-To: <-BXG_nP8-i5Uy2wIOnK6PF2Ik5rhe_dTYaX9Zk4ZsRA=.ba1e25b0-9f95-40d0-9387-8e03b15f2102@github.com> References: <_l3DxS71zIjGyvbczKqvyBAgrf0VIQgToZW8RbdSGTQ=.df1d7cc2-0cce-41d7-a959-9267b9f80c34@github.com> <-BXG_nP8-i5Uy2wIOnK6PF2Ik5rhe_dTYaX9Zk4ZsRA=.ba1e25b0-9f95-40d0-9387-8e03b15f2102@github.com> Message-ID: <-A4s--u2U-q4Jrl2ag7EezsRaJhPUOmywsPxgy1Kdiw=.8f33ae99-2a55-4651-807e-0747bb1e9ea9@github.com> On Wed, 14 Sep 2022 05:35:20 GMT, Christian Hagedorn wrote: >> Hi all, >> >> Please review this one-line patch which prints the bytecode name for `TraceOptoParse`. >> >> While I was debugging with `TraceOptoParse`, I found it only prints the bci without the bytecode name. >> I had to map the bci to the bytecode manually again and again. >> It would be better to also dump the bytecode name. 
>> >> Before: >> >> 568 503 4 jdk.internal.org.objectweb.asm.ByteVector::putUTF8 (144 bytes) >> Merging state at block #0 bci:0 with empty state on path 1 >> Parsing block #0 at bci [0,11), successors: 1 2 >> @ bci:0 >> @ bci:1 >> Uncommon trap reason='null_check' action='maybe_recompile' debug_id='0' at bci:1 >> Merging state at block #0 bci:0 with empty state on path 1 >> Parsing block #0 at bci [0,11), successors: >> @ bci:0 >> @ bci:1 >> @ bci:4 >> Uncommon trap reason='null_check' action='maybe_recompile' debug_id='0' at bci:4 >> @ bci:5 >> @ bci:6 >> Merging state at block #0 bci:0 with empty state on path 1 >> >> >> After: >> >> 571 507 4 jdk.internal.org.objectweb.asm.ByteVector::putUTF8 (144 bytes) >> Merging state at block #0 bci:0 with empty state on path 1 >> Parsing block #0 at bci [0,11), successors: 1 2 >> @ bci:0 aload_1 >> @ bci:1 invokevirtual >> Uncommon trap reason='null_check' action='maybe_recompile' debug_id='0' at bci:1 >> Merging state at block #0 bci:0 with empty state on path 1 >> Parsing block #0 at bci [0,11), successors: >> @ bci:0 aload_0 >> @ bci:1 getfield >> @ bci:4 arraylength >> Uncommon trap reason='null_check' action='maybe_recompile' debug_id='0' at bci:4 >> @ bci:5 aload_0 >> @ bci:6 invokevirtual >> Merging state at block #0 bci:0 with empty state on path 1 >> >> >> Testing: >> - tier1~3 on Linux/x64 in progress, seems fine until now >> >> Thanks. >> Best regards, >> Jie > > Looks good! > > Do we have any "hello world" kind of test that uses this flag? If not, it might be worth adding such a sanity test to cover it. But that could also be done separately. > Thanks @chhagedorn for your review. I didn't find a test running with `TraceOptoParse`. Thanks for quickly checking that. > My colleague would like to write a test for it to learn more about OpenJDK. Let's do it in a separate pr. That sounds good! > So do you think this pr is trivial? Thanks. Yes, it's trivial. 
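Editorial sketch: the effect of the one-line change discussed in this thread can be imitated with a tiny formatter that appends a bytecode mnemonic to the `@ bci:N` trace line. The helper names are hypothetical and the opcode table covers only the opcodes visible in the quoted "After" trace; HotSpot has its own bytecode name lookup, which this sketch does not use or reproduce.

```cpp
#include <cassert>
#include <string>

// Hypothetical, deliberately tiny opcode-to-name table (JVM opcode values).
inline const char* toy_bytecode_name(unsigned char opcode) {
  switch (opcode) {
    case 0x2a: return "aload_0";
    case 0x2b: return "aload_1";
    case 0xb4: return "getfield";
    case 0xb6: return "invokevirtual";
    case 0xbe: return "arraylength";
    default:   return "<unknown>";
  }
}

// Build a trace line in the shape shown in the "After" output.
inline std::string trace_line(int bci, unsigned char opcode) {
  return "@ bci:" + std::to_string(bci) + " " + toy_bytecode_name(opcode);
}
```

With the mnemonic on the same line, a reader no longer has to map each bci back to the method's bytecode by hand, which is the motivation given in the original message.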
------------- PR: https://git.openjdk.org/jdk/pull/10262 From jiefu at openjdk.org Wed Sep 14 07:21:07 2022 From: jiefu at openjdk.org (Jie Fu) Date: Wed, 14 Sep 2022 07:21:07 GMT Subject: RFR: 8293774: Improve TraceOptoParse to dump the bytecode name In-Reply-To: <-A4s--u2U-q4Jrl2ag7EezsRaJhPUOmywsPxgy1Kdiw=.8f33ae99-2a55-4651-807e-0747bb1e9ea9@github.com> References: <_l3DxS71zIjGyvbczKqvyBAgrf0VIQgToZW8RbdSGTQ=.df1d7cc2-0cce-41d7-a959-9267b9f80c34@github.com> <-BXG_nP8-i5Uy2wIOnK6PF2Ik5rhe_dTYaX9Zk4ZsRA=.ba1e25b0-9f95-40d0-9387-8e03b15f2102@github.com> <-A4s--u2U-q4Jrl2ag7EezsRaJhPUOmywsPxgy1Kdiw=.8f33ae99-2a55-4651-807e-0747bb1e9ea9@github.com> Message-ID: On Wed, 14 Sep 2022 07:10:26 GMT, Christian Hagedorn wrote: >> Looks good! >> >> Do we have any "hello world" kind of test that uses this flag? If not, it might be worth adding such a sanity test to cover it. But that could also be done separately. > >> Thanks @chhagedorn for your review. I didn't find a test running with `TraceOptoParse`. > > Thanks for quickly checking that. > >> My colleague would like to write a test for it to learn more about OpenJDK. Let's do it in a separate pr. > > That sounds good! > >> So do you think this pr is trivial? Thanks. > > Yes, it's trivial. Thanks @chhagedorn . ------------- PR: https://git.openjdk.org/jdk/pull/10262 From jiefu at openjdk.org Wed Sep 14 07:21:08 2022 From: jiefu at openjdk.org (Jie Fu) Date: Wed, 14 Sep 2022 07:21:08 GMT Subject: Integrated: 8293774: Improve TraceOptoParse to dump the bytecode name In-Reply-To: <_l3DxS71zIjGyvbczKqvyBAgrf0VIQgToZW8RbdSGTQ=.df1d7cc2-0cce-41d7-a959-9267b9f80c34@github.com> References: <_l3DxS71zIjGyvbczKqvyBAgrf0VIQgToZW8RbdSGTQ=.df1d7cc2-0cce-41d7-a959-9267b9f80c34@github.com> Message-ID: On Wed, 14 Sep 2022 04:06:58 GMT, Jie Fu wrote: > Hi all, > > Please review this one-line patch which prints the bytecode name for `TraceOptoParse`. 
> > While I was debugging with `TraceOptoParse`, I found it only prints the bci without the bytecode name. > I had to map the bci to the bytecode manually again and again. > It would be better to also dump the bytecode name. > > Before: > > 568 503 4 jdk.internal.org.objectweb.asm.ByteVector::putUTF8 (144 bytes) > Merging state at block #0 bci:0 with empty state on path 1 > Parsing block #0 at bci [0,11), successors: 1 2 > @ bci:0 > @ bci:1 > Uncommon trap reason='null_check' action='maybe_recompile' debug_id='0' at bci:1 > Merging state at block #0 bci:0 with empty state on path 1 > Parsing block #0 at bci [0,11), successors: > @ bci:0 > @ bci:1 > @ bci:4 > Uncommon trap reason='null_check' action='maybe_recompile' debug_id='0' at bci:4 > @ bci:5 > @ bci:6 > Merging state at block #0 bci:0 with empty state on path 1 > > > After: > > 571 507 4 jdk.internal.org.objectweb.asm.ByteVector::putUTF8 (144 bytes) > Merging state at block #0 bci:0 with empty state on path 1 > Parsing block #0 at bci [0,11), successors: 1 2 > @ bci:0 aload_1 > @ bci:1 invokevirtual > Uncommon trap reason='null_check' action='maybe_recompile' debug_id='0' at bci:1 > Merging state at block #0 bci:0 with empty state on path 1 > Parsing block #0 at bci [0,11), successors: > @ bci:0 aload_0 > @ bci:1 getfield > @ bci:4 arraylength > Uncommon trap reason='null_check' action='maybe_recompile' debug_id='0' at bci:4 > @ bci:5 aload_0 > @ bci:6 invokevirtual > Merging state at block #0 bci:0 with empty state on path 1 > > > Testing: > - tier1~3 on Linux/x64 in progress, seems fine until now > > Thanks. > Best regards, > Jie This pull request has now been integrated. 
Changeset: 91f9c0d0 Author: Jie Fu URL: https://git.openjdk.org/jdk/commit/91f9c0d0cfd3d328aaec05254925d1b15611cd6e Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8293774: Improve TraceOptoParse to dump the bytecode name Reviewed-by: chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/10262 From haosun at openjdk.org Wed Sep 14 07:38:45 2022 From: haosun at openjdk.org (Hao Sun) Date: Wed, 14 Sep 2022 07:38:45 GMT Subject: RFR: 8290169: adlc: Improve child constraints for vector unary operations [v2] In-Reply-To: References: Message-ID: On Thu, 8 Sep 2022 11:20:57 GMT, Tobias Hartmann wrote: >> Hao Sun has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: >> >> - Merge branch 'master' into 8290169-adlc >> >> Resolve the conflicts. >> - 8290169: adlc: Improve child constraints for vector unary operations >> >> As demonstrated in [1], the child constraint generated for *predicated >> vector unary operation* is the superset of that generated for the >> *unpredicated* version. As a result, there exists a risk for predicated >> vector unary operations to match the unpredicated rules by accident. >> >> In this patch, we resolve this issue by generating one extra check >> "rChild == NULL" ONLY for vector unary operations. In this way, the >> child constraints for predicated/unpredicated vector unary operations >> are exclusive now. >> >> Following the example in [1], the dfa state generated for AbsVI is shown >> below.
>> >> ``` >> void State::_sub_Op_AbsVI(const Node *n){ >> if( STATE__VALID_CHILD(_kids[0], VREG) && STATE__VALID_CHILD(_kids[1], PREGGOV) && >> ( UseSVE > 0 ) ) >> { >> unsigned int c = _kids[0]->_cost[VREG]+_kids[1]->_cost[PREGGOV] + SVE_COST; >> DFA_PRODUCTION(VREG, vabsI_masked_rule, c) >> } >> if( STATE__VALID_CHILD(_kids[0], VREG) && _kids[1] == NULL && <---- 1 >> ( UseSVE > 0) ) >> { >> unsigned int c = _kids[0]->_cost[VREG] + SVE_COST; >> if (STATE__NOT_YET_VALID(VREG) || _cost[VREG] > c) { >> DFA_PRODUCTION(VREG, vabsI_rule, c) >> } >> } >> ... >> ``` >> >> We can see that the constraint at line 1 cannot be matched for >> predicated AbsVI node now. >> >> The main updates are made in adlc/dfa part. Ideally, we should only >> add the extra check for affected platforms, i.e. AVX-512 and SVE. But we >> didn't do that because it would be better not to introduce any >> architecture dependent implementation here. >> >> Besides, workarounds in both aarch64_sve.ad and x86.ad are removed. 1) >> Many "is_predicated_vector()" checks can be removed in aarch64_sve.ad >> file. 2) Default instruction cost is used for involving rules in x86.ad >> file. >> >> [1]. https://github.com/shqking/jdk/commit/50ec9b19 > > Looks reasonable to me but I'm not an expert in that area. @jatin-bhateja, @sviswa7, @iwanowww could you have a look? @TobiHartmann Thanks for your comment. As some shared code, i.e. `share/adlc`, gets changed in this patch, I think it would be better for this patch to pass the Oracle CI tests. I wonder if you could help to launch the test? Thanks. 
------------- PR: https://git.openjdk.org/jdk/pull/9534 From epeter at openjdk.org Wed Sep 14 08:47:06 2022 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 Sep 2022 08:47:06 GMT Subject: RFR: 8255670: Improve C2's detection of modified nodes In-Reply-To: References: Message-ID: On Wed, 14 Sep 2022 02:42:14 GMT, Vladimir Kozlov wrote: > The only GVN transformation we do with Root node is removing TOP inputs in `RootNode::Ideal()`. That is why we are "sloppy" about putting it on worklist when we know that added input is not TOP (new Halt node in this case). @vnkozlov right, not adding Root to the worklist was not a bug in these cases. Being "sloppy" in these cases was ok. However, the question is how we want to handle it now. Because `modified_node` has Root recorded after the `add_req`, so we need to either add Root to the worklist, or take it off the `modified_node`. I discussed it with @TobiHartmann , and we think that adding it to the worklist is probably the clean solution. If any node has a modification, it is possible that IGVN needs to optimize it, or at least that there could be a future changeset that adds such an optimization. Alternatives: remove Root from `modified_node` after `add_req` (Ad-Hoc solution), or completely exclude Root from the assert. @vnkozlov What do you think? 
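For illustration only, the invariant under discussion (a node recorded as modified must also end up on the IGVN worklist) can be modelled outside HotSpot with a small Java toy. This is a sketch under assumptions: `ToyIgvn`, `Node`, `addReq`, `push` and `verify` are hypothetical names and do not mirror the real C++ classes or the actual assertion code in the patch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: hypothetical names, not HotSpot's PhaseIterGVN.
final class ToyIgvn {
    static final class Node {
        final String name;
        final List<Node> inputs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    private final Set<Node> modified = new HashSet<>();
    private final Deque<Node> worklist = new ArrayDeque<>();

    /** Adding an input records the node as modified (the behaviour the thread describes for add_req). */
    void addReq(Node node, Node input) {
        node.inputs.add(input);
        modified.add(node);
    }

    /** Puts a node on the worklist once. */
    void push(Node node) {
        if (!worklist.contains(node)) {
            worklist.add(node);
        }
    }

    /** The check in spirit: every modified node must be on the worklist. */
    boolean verify() {
        return worklist.containsAll(modified);
    }
}
```

In this model, adding a Halt input to Root without pushing Root makes `verify()` return false, which corresponds to the first option above (push Root) being needed unless Root is removed from the modified set or excluded from the check.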
------------- PR: https://git.openjdk.org/jdk/pull/9439 From epeter at openjdk.org Wed Sep 14 08:47:07 2022 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 Sep 2022 08:47:07 GMT Subject: RFR: 8255670: Improve C2's detection of modified nodes In-Reply-To: References: Message-ID: <6H1T4WUt3gAk7UYl3iafEMsIBD5RrDXdGE1RhtEy3gk=.94a4cee7-ebcf-40d6-be2f-48487d3e9f81@github.com> On Tue, 13 Sep 2022 12:56:06 GMT, Roland Westrelin wrote: >> Added `record_modified_node` to: >> >> Node::clone >> Node::add_req >> Node::add_req_batch >> Node::ins_req >> Node::add_prec >> Node::rm_prec >> Node::set_prec >> >> >> Added `igvn->_worklist.push(node)` in various places that modified a `node` but did not add it to the igvn worklist. >> >> 7 times I had to push `Root`, 5 of these it was because of the creation of a `HaltNode`, which means we have a `root->add_req(halt)`. >> >> In one case we have a MergeMemStream node, which gets two MergeMem nodes as input, and streams over them. >> Unfortunately, it modifies one of the two, which then can trigger our assertion code. I now push this node to the igvn worklist, but a better fix would be to make MergeMemStream leave the MergeMem nodes unmodified. I think that should be possible, filed an RFE [JDK-8293358](https://bugs.openjdk.org/browse/JDK-8293358) >> >> FYI: >> What I am NOT doing here, and leave to a future RFE/independent change: investigate / implement these assertions for late/incremental inlining. >> >> Ran larger regression tests, and 7-9h of fuzzing on 3 platforms. > > src/hotspot/share/opto/loopnode.cpp line 5135: > >> 5133: set_loop(halt, l); >> 5134: C->root()->add_req(halt); >> 5135: _igvn._worklist.push(C->root()); > > Maybe add a method to PhaseIterGVN that does the add_req + push similar to replace_input_of? 
@rwestrel Thanks for the suggestion, I will do that ------------- PR: https://git.openjdk.org/jdk/pull/9439 From aph at openjdk.org Wed Sep 14 08:58:44 2022 From: aph at openjdk.org (Andrew Haley) Date: Wed, 14 Sep 2022 08:58:44 GMT Subject: RFR: 8262901: [macos_aarch64] NativeCallTest expected:<-3.8194101E18> but was:<3.02668882E10> In-Reply-To: References: Message-ID: On Tue, 13 Sep 2022 10:28:43 GMT, Olga Mikhaltsova wrote: >> src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.hotspot.aarch64/src/jdk/vm/ci/hotspot/aarch64/AArch64HotSpotRegisterConfig.java line 291: >> >>> 289: currentStackOffset += Math.max(valueKind.getPlatformKind().getSizeInBytes(), target.wordSize); >>> 290: } >>> 291: } >> >> So I'm curious: why not subclass `AArch64HotSpotRegisterConfig` here, or maybe even use an interface, rather than the boolean? > > I tried to be closer to the original review https://github.com/openjdk/jdk/pull/6641 that requires only 2 fixes and tried to do only this in order to continue easily. > > Could you clarify please what boolean you talk about? `private final boolean macOS;` that was pushed into `class AArch64HotSpotRegisterConfig`, right? I'm hesitating a bit because of the highlighted code. Yes, that `macOS` boolean. Maybe it's not worth the effort, but it seems to me as though the use of the boolean in several places is something of a code smell, and this patch makes it more so. The control flow is not easy to follow. I am wondering if refactoring it so that the code between L269 and L291 were broken out into two methods, one for MacOS and one for the others. I might be wrong, but I'd try it.
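A rough sketch of the kind of split meant here, in plain Java. It is not the JVMCI code: `StackArgLayout` and its methods are hypothetical, the word size is assumed to be 8 bytes, and the macOS rule shown (align each stack argument to its natural size and consume only that size) is a simplification assumed for illustration.

```java
// Hypothetical stand-in for the stack-argument layout code; not the JVMCI classes.
final class StackArgLayout {
    static final int WORD_SIZE = 8; // assumed AArch64 word size in bytes

    /** Default rule: each stack argument takes at least one full word. */
    static int[] defaultOffsets(int[] sizesInBytes) {
        int[] offsets = new int[sizesInBytes.length];
        int current = 0;
        for (int i = 0; i < sizesInBytes.length; i++) {
            offsets[i] = current;
            current += Math.max(sizesInBytes[i], WORD_SIZE);
        }
        return offsets;
    }

    /** macOS rule (assumed): align to the natural size, then take only that size. */
    static int[] macOsOffsets(int[] sizesInBytes) {
        int[] offsets = new int[sizesInBytes.length];
        int current = 0;
        for (int i = 0; i < sizesInBytes.length; i++) {
            int size = sizesInBytes[i];
            current = (current + size - 1) & -size; // sizes must be powers of two
            offsets[i] = current;
            current += size;
        }
        return offsets;
    }

    /** The boolean is consulted once; each rule stays in its own method. */
    static int[] offsets(boolean macOS, int[] sizesInBytes) {
        return macOS ? macOsOffsets(sizesInBytes) : defaultOffsets(sizesInBytes);
    }
}
```

With the flag checked in a single place, each layout rule can be read and tested on its own, which is the readability gain the refactoring is after.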
------------- PR: https://git.openjdk.org/jdk/pull/10238 From roland at openjdk.org Wed Sep 14 09:06:41 2022 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 14 Sep 2022 09:06:41 GMT Subject: RFR: 8292301: [REDO v2] C2 crash when allocating array of size too large [v4] In-Reply-To: References: Message-ID: On Thu, 1 Sep 2022 13:25:38 GMT, Roland Westrelin wrote: > LGTM. Thanks for re-reviewing. @TobiHartmann recommended one more review as this change caused issues and had to be backed out. ------------- PR: https://git.openjdk.org/jdk/pull/10038 From duke at openjdk.org Wed Sep 14 09:39:07 2022 From: duke at openjdk.org (Sacha Coppey) Date: Wed, 14 Sep 2022 09:39:07 GMT Subject: RFR: 8290154: [JVMCI] partially implement JVMCI for RISC-V [v10] In-Reply-To: References: Message-ID: <01N2Slfoz83bKVvbH3Ja0O0cOI-rcagrV6jeIdi3dws=.4cce1f7e-2223-4013-bb11-8319aef46444@github.com> > This patch adds a partial JVMCI implementation for RISC-V, to allow using the GraalVM Native Image RISC-V LLVM backend, which does not use JVMCI for code emission. > It creates the jdk.vm.ci.riscv64 and jdk.vm.ci.hotspot.riscv64 packages, as well as implements a part of jvmciCodeInstaller_riscv64.cpp. To check for correctness, it enables JVMCI code installation tests on RISC-V. More testing is performed in Native Image. 
Sacha Coppey has updated the pull request incrementally with one additional commit since the last revision: Remove noinline attribute by fixing sign extended value ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9587/files - new: https://git.openjdk.org/jdk/pull/9587/files/976606fe..bfb1ca0c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9587&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9587&range=08-09 Stats: 5 lines in 2 files changed: 0 ins; 3 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/9587.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9587/head:pull/9587 PR: https://git.openjdk.org/jdk/pull/9587 From omikhaltcova at openjdk.org Wed Sep 14 09:43:31 2022 From: omikhaltcova at openjdk.org (Olga Mikhaltsova) Date: Wed, 14 Sep 2022 09:43:31 GMT Subject: RFR: 8262901: [macos_aarch64] NativeCallTest expected:<-3.8194101E18> but was:<3.02668882E10> In-Reply-To: References: Message-ID: <0w6g_W_O_fkL77CKFbIgedYdNAOX0fhnuR7fq0a3v9g=.6ee6661e-72bd-47f3-addb-b6e6eb8d8b63@github.com> On Wed, 14 Sep 2022 08:56:24 GMT, Andrew Haley wrote: >> I tried to be closer to the original review https://github.com/openjdk/jdk/pull/6641 that requires only 2 fixes and tried to do only this in order to continue easily. >> >> Could you clarify please what boolean you talk about? `private final boolean macOS;` that was pushed into `class AArch64HotSpotRegisterConfig`, right? I'm hesitating a bit because of the highlighted code. > > Yes, that `macOS` boolean. > > Maybe it's not worth the effort, but it seems to me as though the use of the boolean in several places is something of a code smell, and this patch makes it more so. The control flow is not easy to follow. > I am wondering if refactoring it so that the code between L269 and L291 were broken out into two methods, one for MacOS and one for the others. I might be wrong, but I'd try it. Thanks for the tip! Absolutely agree, it's worth doing refactoring here. 
I'll try to follow this way. ------------- PR: https://git.openjdk.org/jdk/pull/10238 From rcastanedalo at openjdk.org Wed Sep 14 11:47:37 2022 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 14 Sep 2022 11:47:37 GMT Subject: RFR: JDK-8293480: IGV: Update Bytecode and ControlFlow Component immediately when opening a new graph [v4] In-Reply-To: References: Message-ID: On Fri, 9 Sep 2022 13:10:59 GMT, Tobias Holenstein wrote: >> The `BytecodeViewTopComponent` and `ControlFlowTopComponent` represent information depending on what graph is open in `EditorTopComponent`. Previously, `BytecodeViewTopComponent` and `ControlFlowTopComponent` did not update their content immediately when a new graph from a different group was opened in `EditorTopComponent`. They also did not update when switching between two tabs of open graphs. >> >> We failed to `fire()` a `diagramChangedEvent` in the constructor of `EditorTopComponent`. We also need to fire when `BytecodeViewTopComponent` and `ControlFlowTopComponent` are initially opened. >> Update > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > Update Bytecode and ControlFlow when a group is removed Thanks for addressing the additional case, Tobias. I found an issue in the new revision, though: the graph selected in the Outline is not updated when clicking on the "Show previous / next graph of the current group" buttons. Furthermore, after clicking a few times forward and backwards (~20 times or more), IGV becomes very unresponsive (tens of seconds to update the graph view). ------------- Changes requested by rcastanedalo (Reviewer).
PR: https://git.openjdk.org/jdk/pull/10196 From thartmann at openjdk.org Wed Sep 14 11:52:43 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 14 Sep 2022 11:52:43 GMT Subject: RFR: 8290169: adlc: Improve child constraints for vector unary operations [v3] In-Reply-To: References: Message-ID: On Wed, 14 Sep 2022 07:06:47 GMT, Hao Sun wrote: >> As demonstrated in [1], the child constraint generated for *predicated >> vector unary operation* is the superset of that generated for the >> *unpredicated* version. As a result, there exists a risk for predicated >> vector unary operations to match the unpredicated rules by accident. >> >> In this patch, we resolve this issue by generating one extra check >> "rChild == NULL" ONLY for vector unary operations. In this way, the >> child constraints for predicated/unpredicated vector unary operations >> are exclusive now. >> >> Following the example in [1], the dfa state generated for AbsVI is shown >> below. >> >> >> void State::_sub_Op_AbsVI(const Node *n){ >> if( STATE__VALID_CHILD(_kids[0], VREG) && STATE__VALID_CHILD(_kids[1], PREGGOV) && >> ( UseSVE > 0 ) ) >> { >> unsigned int c = _kids[0]->_cost[VREG]+_kids[1]->_cost[PREGGOV] + SVE_COST; >> DFA_PRODUCTION(VREG, vabsI_masked_rule, c) >> } >> if( STATE__VALID_CHILD(_kids[0], VREG) && _kids[1] == NULL && <---- 1 >> ( UseSVE > 0) ) >> { >> unsigned int c = _kids[0]->_cost[VREG] + SVE_COST; >> if (STATE__NOT_YET_VALID(VREG) || _cost[VREG] > c) { >> DFA_PRODUCTION(VREG, vabsI_rule, c) >> } >> } >> ... >> >> >> We can see that the constraint at line 1 cannot be matched for >> predicated AbsVI node now. >> >> The main updates are made in adlc/dfa part. Ideally, we should only >> add the extra check for affected platforms, i.e. AVX-512 and SVE. But we >> didn't do that because it would be better not to introduce any >> architecture dependent implementation here. >> >> Besides, workarounds in both ~aarch64_sve.ad~aarch64_vector.ad and x86.ad are removed.
1) >> Many "is_predicated_vector()" checks can be removed in ~aarch64_sve.ad~aarch64_vector.ad >> file. 2) Default instruction cost is used for involving rules in x86.ad >> file. >> >> ~[1]. https://github.com/shqking/jdk/commit/50ec9b19~ >> [1]. https://github.com/shqking/jdk/commit/f7d9621e2 > > Hao Sun has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Remove the "is_predicated_vector()" check introduced in JDK-8292587 > - Merge branch 'master' into 8290169-adlc > - Merge branch 'master' into 8290169-adlc > > Resolve the conflicts. > - 8290169: adlc: Improve child constraints for vector unary operations > > As demonstrated in [1], the child constraint generated for *predicated > vector unary operation* is the superset of that generated for the > *unpredicated* version. As a result, there exists a risk for predicated > vector unary operations to match the unpredicated rules by accident. > > In this patch, we resolve this issue by generating one extra check > "rChild == NULL" ONLY for vector unary operations. In this way, the > child constraints for predicated/unpredicated vector unary operations > are exclusive now. > > Following the example in [1], the dfa state generated for AbsVI is shown > below. > > ``` > void State::_sub_Op_AbsVI(const Node *n){ > if( STATE__VALID_CHILD(_kids[0], VREG) && STATE__VALID_CHILD(_kids[1], PREGGOV) && > ( UseSVE > 0 ) ) > { > unsigned int c = _kids[0]->_cost[VREG]+_kids[1]->_cost[PREGGOV] + SVE_COST; > DFA_PRODUCTION(VREG, vabsI_masked_rule, c) > } > if( STATE__VALID_CHILD(_kids[0], VREG) && _kids[1] == NULL && <---- 1 > ( UseSVE > 0) ) > { > unsigned int c = _kids[0]->_cost[VREG] + SVE_COST; > if (STATE__NOT_YET_VALID(VREG) || _cost[VREG] > c) { > DFA_PRODUCTION(VREG, vabsI_rule, c) > } > } > ... > ``` > > We can see that the constraint at line 1 cannot be matched for > predicated AbsVI node now. > > The main updates are made in adlc/dfa part.
Ideally, we should only > add the extra check for affected platforms, i.e. AVX-512 and SVE. But we > didn't do that because it would be better not to introduce any > architecture dependent implementation here. > > Besides, workarounds in both aarch64_sve.ad and x86.ad are removed. 1) > Many "is_predicated_vector()" checks can be removed in aarch64_sve.ad > file. 2) Default instruction cost is used for involving rules in x86.ad > file. > > [1]. https://github.com/shqking/jdk/commit/50ec9b19 Sure, I did already run testing. All passed. ------------- PR: https://git.openjdk.org/jdk/pull/9534 From thartmann at openjdk.org Wed Sep 14 12:05:30 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 14 Sep 2022 12:05:30 GMT Subject: RFR: 8291669: [REDO] Fix array range check hoisting for some scaled loop iv [v4] In-Reply-To: <63VQjfbVlP0EB_CjYS1LZahu0cpiS4Zjp7-Gbg49LDw=.316e4cba-4fd7-4061-8e9c-9d81e78d73bb@github.com> References: <5XU7GsP99-GVCxCJi7bVvvKbW_YG3XQGuIVm-LclQOw=.9b48b73d-4172-4f8e-a82d-03bad545c2fc@github.com> <63VQjfbVlP0EB_CjYS1LZahu0cpiS4Zjp7-Gbg49LDw=.316e4cba-4fd7-4061-8e9c-9d81e78d73bb@github.com> Message-ID: On Tue, 13 Sep 2022 14:31:55 GMT, Pengfei Li wrote: >> This is a REDO of JDK-8289996. In previous patch, we defer some strength >> reductions in Ideal functions of `Mul[I|L]Node` to post loop igvn phase >> to fix a range check hoisting issue. More about previous patch can be >> found in PR #9508, where we have described some details of the issue >> we would like to fix. >> >> Previous patch was backed out due to some jtreg failures found. We have >> analyzed those failures one by one and found one of them exposes a real >> performance regression. We see that deferring some strength reductions >> to post loop igvn phase has too much impact. Some vector multiplication >> will not be optimized to vector addition with vector shift after that >> change. So in this REDO we propose the range check hoisting fix with a >> different approach. 
>> >> In this new patch, we add some recursive pattern matches for scaled loop >> iv in function `PhaseIdealLoop::is_scaled_iv()`. These include matching >> a sum or a difference of two scaled iv expressions. With this, all kinds >> of Ideal-transformed scaled iv expressions can still be recognized. This >> new approach only touches loop transformation code and hence has much >> smaller impact. We have verified that this new approach applies to both >> int range checks and long range checks. >> >> Previously attached jtreg case fails on ppc64 because VectorAPI has no >> vector intrinsics on ppc64 so there's no long range check to hoist. In >> this patch, we limit the test architecture to x64 and AArch64. >> >> Tested hotspot::hotspot_all_no_apps, jdk::tier1~3 and langtools::tier1. > > Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: > > Bail out when scale value is min_jint Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.org/jdk/pull/9851 From thartmann at openjdk.org Wed Sep 14 12:06:48 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 14 Sep 2022 12:06:48 GMT Subject: RFR: 8289422: Fix and re-enable vector conditional move [v4] In-Reply-To: References: <6uthI29shZjAeLK-eV3Kxqao06qoa9U9zQ5g_oDLmkI=.3e171aae-2003-46c9-88ac-9a63fecc5d96@github.com> Message-ID: <5S-UTbWsM5a0vpMIwa93xNi9p-1DGIM0bXfBT1UxtPM=.b45231dd-8338-4c1a-a350-b48365186c5f@github.com> On Tue, 6 Sep 2022 02:47:38 GMT, Fei Gao wrote: >> // float[] a, float[] b, float[] c; >> for (int i = 0; i < a.length; i++) { >> c[i] = (a[i] > b[i]) ? a[i] : b[i]; >> } >> >> >> After [JDK-8139340](https://bugs.openjdk.org/browse/JDK-8139340) and [JDK-8192846](https://bugs.openjdk.org/browse/JDK-8192846), we hope to vectorize the case >> above by enabling -XX:+UseCMoveUnconditionally and -XX:+UseVectorCmov. 
>> But the transformation here[1] is going to optimize the BoolNode >> with constant input to a constant and break the design logic of >> cmove vector node[2]. We can't prevent all GVN transformation to >> the BoolNode before matcher, so the patch keeps the condition input >> as a constant while creating a cmove vector node, and then >> restructures it into a binary tree before matching. >> >> When the input order of original cmp node is different from the >> input order of original cmove node, like: >> >> // float[] a, float[] b, float[] c; >> for (int i = 0; i < a.length; i++) { >> c[i] = (a[i] < b[i]) ? a[i] : b[i]; >> } >> >> the patch negates the mask of the BoolNode before creating the >> cmove vector node in SuperWord::output(). >> >> We can also use VectorNode::implemented() to consult if vector >> conditional move is supported in the backend. So, the patch cleans >> the related code in SuperWord::implemented(). >> >> With the patch, the performance uplift is: >> (The micro-benchmark functions are included in the file >> test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java) >> >> AArch64: >> Benchmark (length) Mode Cnt uplift(ns/op) >> cmoveD 523 avgt 15 68.89% >> cmoveF 523 avgt 15 72.40% >> >> X86: >> Benchmark (length) Mode Cnt uplift(ns/op) >> cmoveD 523 avgt 15 73.12% >> cmoveF 523 avgt 15 85.45% >> >> [1]https://github.com/openjdk/jdk/blob/779b4e1d1959bc15a27492b7e2b951678e39cca8/src/hotspot/share/opto/subnode.cpp#L1310 >> [2]https://github.com/openjdk/jdk/blob/779b4e1d1959bc15a27492b7e2b951678e39cca8/src/hotspot/share/opto/matcher.cpp#L2365 > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains six commits: > > - Rebase the patch to the latest JDK and add some testcase for NE and EQ > > Change-Id: Ifb02b5efc2a09e6e0b4fc1c8346698597464f448 > - Merge branch 'master' into fg8289422 > > Change-Id: I09677cb07f6b2717aa768a830663ca455806b900 > - Merge branch 'master' into fg8289422 > > Change-Id: I870c7bbc73d12bac16756226125edc1a229ba412 > - Enable the test only on aarch64 platform because X86 supports vector cmove only on some 256-bits AVXs > > Change-Id: I64dd49380fe3d303ef6be21460df3be31c1458f8 > - Merge branch 'master' into fg8289422 > > Change-Id: I7936552df6ac12949ed8b550576f4e3520596423 > - 8289422: Fix and re-enable vector conditional move > > ``` > // float[] a, float[] b, float[] c; > for (int i = 0; i < a.length; i++) { > c[i] = (a[i] > b[i]) ? a[i] : b[i]; > } > ``` > > After JDK-8139340 and JDK-8192846, we hope to vectorize the case > above by enabling -XX:+UseCMoveUnconditionally and -XX:+UseVectorCmov. > But the transformation here[1] is going to optimize the BoolNode > with constant input to a constant and break the design logic of > cmove vector node[2]. We can't prevent all GVN transformation to > the BoolNode before matcher, so the patch keeps the condition input > as a constant while creating a cmove vector node, and then > restructures it into a binary tree before matching. > > When the input order of original cmp node is different from the > input order of original cmove node, like: > ``` > // float[] a, float[] b, float[] c; > for (int i = 0; i < a.length; i++) { > c[i] = (a[i] < b[i]) ? a[i] : b[i]; > } > ``` > the patch negates the mask of the BoolNode before creating the > cmove vector node in SuperWord::output(). > > We can also use VectorNode::implemented() to consult if vector > conditional move is supported in the backend. So, the patch cleans > the related code in SuperWord::implemented(). 
> > With the patch, the performance uplift is: > (The micro-benchmark functions are included in the file > test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java) > > AArch64: > Benchmark (length) Mode Cnt uplift(ns/op) > cmoveD 523 avgt 15 68.89% > cmoveF 523 avgt 15 72.40% > > X86: > Benchmark (length) Mode Cnt uplift(ns/op) > cmoveD 523 avgt 15 73.12% > cmoveF 523 avgt 15 85.45% > > [1]https://github.com/openjdk/jdk/blob/779b4e1d1959bc15a27492b7e2b951678e39cca8/src/hotspot/share/opto/subnode.cpp#L1310 > [2]https://github.com/openjdk/jdk/blob/779b4e1d1959bc15a27492b7e2b951678e39cca8/src/hotspot/share/opto/matcher.cpp#L2365 > > Change-Id: If046dd745024deb0e602bf7efc2a07c22b89c690 Okay, please go ahead and file a follow-up bug then. ------------- PR: https://git.openjdk.org/jdk/pull/9652 From thartmann at openjdk.org Wed Sep 14 12:25:37 2022 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 14 Sep 2022 12:25:37 GMT Subject: RFR: 8292761: x86: Clone nodes to match complex rules [v3] In-Reply-To: References: Message-ID: On Fri, 9 Sep 2022 11:18:50 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch tries to clone a node if it can be matched as a part of a BMI and lea pattern. This may reduce the live range of a local or remove that local completely. >> >> Please take a look and have some reviews. Thanks a lot. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - add benchmark > - Merge branch 'master' into cloneSimpleNodes > - Merge branch 'master' into cloneSimpleNodes > - fix > - Merge branch 'master' into cloneSimpleNodes > - shorten > - improve checks > - lea patterns > - refactor > - lea patterns > - ... and 1 more: https://git.openjdk.org/jdk/compare/3e8a2f56...0beae979 Thanks. 
This looks reasonable to me but please add comments explaining the individual patterns with an example. Someone more familiar with this code (@jatin-bhateja, @sviswa7 ?), should also have a look. ------------- PR: https://git.openjdk.org/jdk/pull/9977 From pli at openjdk.org Wed Sep 14 14:20:44 2022 From: pli at openjdk.org (Pengfei Li) Date: Wed, 14 Sep 2022 14:20:44 GMT Subject: RFR: 8291669: [REDO] Fix array range check hoisting for some scaled loop iv [v4] In-Reply-To: <63VQjfbVlP0EB_CjYS1LZahu0cpiS4Zjp7-Gbg49LDw=.316e4cba-4fd7-4061-8e9c-9d81e78d73bb@github.com> References: <5XU7GsP99-GVCxCJi7bVvvKbW_YG3XQGuIVm-LclQOw=.9b48b73d-4172-4f8e-a82d-03bad545c2fc@github.com> <63VQjfbVlP0EB_CjYS1LZahu0cpiS4Zjp7-Gbg49LDw=.316e4cba-4fd7-4061-8e9c-9d81e78d73bb@github.com> Message-ID: On Tue, 13 Sep 2022 14:31:55 GMT, Pengfei Li wrote: >> This is a REDO of JDK-8289996. In previous patch, we defer some strength >> reductions in Ideal functions of `Mul[I|L]Node` to post loop igvn phase >> to fix a range check hoisting issue. More about previous patch can be >> found in PR #9508, where we have described some details of the issue >> we would like to fix. >> >> Previous patch was backed out due to some jtreg failures found. We have >> analyzed those failures one by one and found one of them exposes a real >> performance regression. We see that deferring some strength reductions >> to post loop igvn phase has too much impact. Some vector multiplication >> will not be optimized to vector addition with vector shift after that >> change. So in this REDO we propose the range check hoisting fix with a >> different approach. >> >> In this new patch, we add some recursive pattern matches for scaled loop >> iv in function `PhaseIdealLoop::is_scaled_iv()`. These include matching >> a sum or a difference of two scaled iv expressions. With this, all kinds >> of Ideal-transformed scaled iv expressions can still be recognized. 
This >> new approach only touches loop transformation code and hence has much >> smaller impact. We have verified that this new approach applies to both >> int range checks and long range checks. >> >> Previously attached jtreg case fails on ppc64 because VectorAPI has no >> vector intrinsics on ppc64 so there's no long range check to hoist. In >> this patch, we limit the test architecture to x64 and AArch64. >> >> Tested hotspot::hotspot_all_no_apps, jdk::tier1~3 and langtools::tier1. > > Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: > > Bail out when scale value is min_jint Thanks for review. I will integrate this. ------------- PR: https://git.openjdk.org/jdk/pull/9851 From pli at openjdk.org Wed Sep 14 14:24:52 2022 From: pli at openjdk.org (Pengfei Li) Date: Wed, 14 Sep 2022 14:24:52 GMT Subject: Integrated: 8291669: [REDO] Fix array range check hoisting for some scaled loop iv In-Reply-To: <5XU7GsP99-GVCxCJi7bVvvKbW_YG3XQGuIVm-LclQOw=.9b48b73d-4172-4f8e-a82d-03bad545c2fc@github.com> References: <5XU7GsP99-GVCxCJi7bVvvKbW_YG3XQGuIVm-LclQOw=.9b48b73d-4172-4f8e-a82d-03bad545c2fc@github.com> Message-ID: On Fri, 12 Aug 2022 09:42:25 GMT, Pengfei Li wrote: > This is a REDO of JDK-8289996. In previous patch, we defer some strength > reductions in Ideal functions of `Mul[I|L]Node` to post loop igvn phase > to fix a range check hoisting issue. More about previous patch can be > found in PR #9508, where we have described some details of the issue > we would like to fix. > > Previous patch was backed out due to some jtreg failures found. We have > analyzed those failures one by one and found one of them exposes a real > performance regression. We see that deferring some strength reductions > to post loop igvn phase has too much impact. Some vector multiplication > will not be optimized to vector addition with vector shift after that > change. 
So in this REDO we propose the range check hoisting fix with a > different approach. > > In this new patch, we add some recursive pattern matches for scaled loop > iv in function `PhaseIdealLoop::is_scaled_iv()`. These include matching > a sum or a difference of two scaled iv expressions. With this, all kinds > of Ideal-transformed scaled iv expressions can still be recognized. This > new approach only touches loop transformation code and hence has much > smaller impact. We have verified that this new approach applies to both > int range checks and long range checks. > > Previously attached jtreg case fails on ppc64 because VectorAPI has no > vector intrinsics on ppc64 so there's no long range check to hoist. In > this patch, we limit the test architecture to x64 and AArch64. > > Tested hotspot::hotspot_all_no_apps, jdk::tier1~3 and langtools::tier1. This pull request has now been integrated. Changeset: 211fab8d Author: Pengfei Li URL: https://git.openjdk.org/jdk/commit/211fab8d361822bbd1a34a88626853bf4a029af5 Stats: 282 lines in 4 files changed: 270 ins; 3 del; 9 mod 8291669: [REDO] Fix array range check hoisting for some scaled loop iv Reviewed-by: roland, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/9851 From sviswanathan at openjdk.org Wed Sep 14 16:35:53 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 14 Sep 2022 16:35:53 GMT Subject: RFR: 8290917: x86: Memory-operand arithmetic instructions have too low costs [v2] In-Reply-To: References: Message-ID: On Wed, 14 Sep 2022 02:48:57 GMT, Quan Anh Mai wrote: >> I don't think we should replace the inc/dec by add. >> >> On my desktop, I see the following: >> Before: >> Benchmark Mode Cnt Score Error Units >> BasicRules.add_mem_con avgt 3 132.268 ± 0.599 ns/op >> BasicRules.inc_mem avgt 3 169.980 ± 0.617 ns/op >> >> After: >> Benchmark Mode Cnt Score Error Units >> BasicRules.add_mem_con avgt 3 117.426 ± 0.128 ns/op >> BasicRules.inc_mem avgt 3 182.907 ±
0.277 ns/op >> >> The inc_mem jmh performance is worse after the patch. >> >> There is already UseIncDec option which is set appropriately to select whether to generate inc/dec or the add/sub instruction. > > @sviswa7 Thanks a lot for your review, I have reverted that change. I don't understand why, though, it does not seem that the bottleneck is in the predecoder. @merykitty Thanks for reverting those changes. Could you please also add jmh tests for the following:
1) AndL with 255
2) AndL with 65535
3) DivL by 10
For 1) and 2) we are changing the instruction from the q version to the l version, so we want to make sure the performance is at least on par. For 3) it will be good to check that the compiler now optimizes divide by 10 for the long data type as well. ------------- PR: https://git.openjdk.org/jdk/pull/9791 From kvn at openjdk.org Wed Sep 14 16:39:46 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 14 Sep 2022 16:39:46 GMT Subject: RFR: 8255670: Improve C2's detection of modified nodes In-Reply-To: References: Message-ID: On Wed, 14 Sep 2022 08:44:06 GMT, Emanuel Peter wrote: > > The only GVN transformation we do with Root node is removing TOP inputs in `RootNode::Ideal()`. That is why we are "sloppy" about putting it on worklist when we know that added input is not TOP (new Halt node in this case). > > @vnkozlov right, not adding Root to the worklist was not a bug in these cases. Being "sloppy" in these cases was ok. However, the question is how we want to handle it now. Because `modified_node` has Root recorded after the `add_req`, so we need to either add Root to the worklist, or take it off the `modified_node`. I discussed it with @TobiHartmann , and we think that adding it to the worklist is probably the clean solution. If any node has a modification, it is possible that IGVN needs to optimize it, or at least that there could be a future changeset that adds such an optimization.
Alternatives: remove Root from `modified_node` after `add_req` (Ad-Hoc solution), or completely exclude Root from the assert. @vnkozlov What do you think? Okay, it makes sense. I agree with the changes. ------------- PR: https://git.openjdk.org/jdk/pull/9439 From kvn at openjdk.org Wed Sep 14 16:39:48 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 14 Sep 2022 16:39:48 GMT Subject: RFR: 8255670: Improve C2's detection of modified nodes In-Reply-To: <6H1T4WUt3gAk7UYl3iafEMsIBD5RrDXdGE1RhtEy3gk=.94a4cee7-ebcf-40d6-be2f-48487d3e9f81@github.com> References: <6H1T4WUt3gAk7UYl3iafEMsIBD5RrDXdGE1RhtEy3gk=.94a4cee7-ebcf-40d6-be2f-48487d3e9f81@github.com> Message-ID: On Wed, 14 Sep 2022 08:44:38 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/loopnode.cpp line 5135: >> >>> 5133: set_loop(halt, l); >>> 5134: C->root()->add_req(halt); >>> 5135: _igvn._worklist.push(C->root()); >> >> Maybe add a method to PhaseIterGVN that does the add_req + push similar to replace_input_of? > > @rwestrel Thanks for the suggestion, I will do that Yes, it is a good suggestion. ------------- PR: https://git.openjdk.org/jdk/pull/9439 From sviswanathan at openjdk.org Wed Sep 14 16:39:54 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 14 Sep 2022 16:39:54 GMT Subject: RFR: 8290917: x86: Memory-operand arithmetic instructions have too low costs [v3] In-Reply-To: References: Message-ID: On Wed, 14 Sep 2022 02:48:40 GMT, Quan Anh Mai wrote: >> The pattern `AddI (LoadI mem) imm` should be matched by a load followed by an add with constant; instead, it is currently matched as a constant load followed by an add with memory. The reason is that the cost of `addI_rReg_mem` is too low; this patch fixes this by increasing the cost of this fused instruction. >> >> Testing: Manually run the test case in the JBS and look at the compiled code.
>> >> I also do some small clean-ups in x86_64.ad: >> >> - The `mulHiL` rules have unnecessary constraints on the input registers, these can be removed. The `no_rax_RegL` operand as a consequence can also be removed. >> - The rules involving long division by a constant can be removed because it has been covered by the optimiser during idealisation. >> - The pattern `SubI src imm` and the likes never match because they are converted to `AddI src -imm` by the optimiser. As a result, these rules can be removed >> - The rules involving shifting the argument by 1 are covered by and exactly the same as the corresponding rules of shifting by an immediate. As a result, they can be removed. >> - Some rules involving and-ing with a bit mask have unnecessary constraints on the target register. >> >> Please kindly review, thank you very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > revert removing inc_mem src/hotspot/cpu/x86/x86_64.ad line 10209: > 10207: format %{ "movzbl $dst, $src\t# long & 0xFF" %} > 10208: ins_encode %{ > 10209: __ movzbl($dst$$Register, $src$$Register); Good to add a comment here that the upper 32 bits are zeroed by the instruction. src/hotspot/cpu/x86/x86_64.ad line 10221: > 10219: format %{ "movzwq $dst, $src\t# long & 0xFFFF" %} > 10220: ins_encode %{ > 10221: __ movzwl($dst$$Register, $src$$Register); Good to add a comment here that the upper 32 bits are zeroed by the instruction. 
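The two review comments above rest on the x86-64 rule that a write to a 32-bit register clears the upper 32 bits of the full 64-bit register, which is why a 32-bit `movzbl`/`movzwl` can implement a long `& 0xFF` / `& 0xFFFF`. A minimal sketch of that equivalence in plain C++ follows; the `model_*` helpers are hypothetical names for illustration only and are not part of HotSpot or of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of `movzbl dst, src` on x86-64: the low 8 bits of src are
// zero-extended into a 32-bit destination, and the 32-bit write leaves the
// upper 32 bits of the 64-bit register zeroed.
inline std::uint64_t model_movzbl(std::uint64_t src) {
  std::uint32_t low32 = static_cast<std::uint8_t>(src);  // zero-extend 8 -> 32 bits
  return static_cast<std::uint64_t>(low32);              // upper 32 bits are zero
}

// Hypothetical model of `movzwl dst, src`: same idea for the low 16 bits.
inline std::uint64_t model_movzwl(std::uint64_t src) {
  std::uint32_t low32 = static_cast<std::uint16_t>(src); // zero-extend 16 -> 32 bits
  return static_cast<std::uint64_t>(low32);
}

// True when both models agree with the 64-bit AND they are meant to replace.
inline bool masks_match(std::uint64_t x) {
  return model_movzbl(x) == (x & 0xFFULL) &&
         model_movzwl(x) == (x & 0xFFFFULL);
}
```

Calling `masks_match` with values that have high bits set (for example all ones) is the interesting case, since that is where a missing zero-extension would show up.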
------------- PR: https://git.openjdk.org/jdk/pull/9791 From svkamath at openjdk.org Wed Sep 14 18:27:49 2022 From: svkamath at openjdk.org (Smita Kamath) Date: Wed, 14 Sep 2022 18:27:49 GMT Subject: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v6] In-Reply-To: References: <0im5_Of4jj3BHOipBt4VkAuX3327nQbKtW_sYnzcVUU=.0153ab0a-aa0c-4c12-a150-407a85cf8d0c@github.com> Message-ID: On Thu, 1 Sep 2022 21:17:30 GMT, Vladimir Kozlov wrote: >> Yes; I removed support for --release 7 in JDK 20 early today. >> >> >> On Thu, Sep 1, 2022 at 1:36 PM Paul Sandoz ***@***.***> wrote: >> >>> Likely requires a merge with master. >>> >>> ? >>> Reply to this email directly, view it on GitHub >>> , or >>> unsubscribe >>> >>> . >>> You are receiving this because you were mentioned.Message ID: >>> ***@***.***> >>> > >> Yes; I removed support for --release 7 in JDK 20 early today. > > Yes, I missed that my merge of master did not apply because I modified tests, Joe fixed, to test his changes. > > Resubmitting testing after resolving the issue. Still missed parentheses issue should be fixed. @vnkozlov Hi Vladimir, does the patch look good to you? Please do let me know. Thank you. ------------- PR: https://git.openjdk.org/jdk/pull/9781 From fyang at openjdk.org Wed Sep 14 23:58:43 2022 From: fyang at openjdk.org (Fei Yang) Date: Wed, 14 Sep 2022 23:58:43 GMT Subject: RFR: 8293695: Implement isInfinite intrinsic for RISC-V In-Reply-To: <8-TCWhiO_DkKS10Fkl7gztIBGeGJHO45F2d99YVKAvQ=.8e10e678-d320-4945-8760-ad48eefbdd88@github.com> References: <8-TCWhiO_DkKS10Fkl7gztIBGeGJHO45F2d99YVKAvQ=.8e10e678-d320-4945-8760-ad48eefbdd88@github.com> Message-ID: On Tue, 13 Sep 2022 15:48:38 GMT, Aleksei Voitylov wrote: > RISC-V 64 intrinsic for isInfinite follows the logic of x86 intrinsic (introduced by 8285868). This patch adds C2 match for IsInfinite nodes. 
Existing test is modified to run on RISC-V and passes on both release and fastdebug builds. Benchmark results are below:
>
> before:
> Benchmark                              Mode  Cnt   Score   Error  Units
> DoubleClassCheck.testIsInfiniteBranch  avgt   15  43.547 ± 6.843  ns/op
> DoubleClassCheck.testIsInfiniteCMov    avgt   15  16.301 ± 1.386  ns/op
> DoubleClassCheck.testIsInfiniteStore   avgt   15  16.230 ± 1.477  ns/op
> FloatClassCheck.testIsInfiniteBranch   avgt   15  38.774 ± 3.572  ns/op
> FloatClassCheck.testIsInfiniteCMov     avgt   15  15.064 ± 1.310  ns/op
> FloatClassCheck.testIsInfiniteStore    avgt   15  14.967 ± 1.298  ns/op
>
> after:
> Benchmark                              Mode  Cnt   Score   Error  Units
> DoubleClassCheck.testIsInfiniteBranch  avgt   15  39.987 ± 6.179  ns/op
> DoubleClassCheck.testIsInfiniteCMov    avgt   15  13.477 ± 1.159  ns/op
> DoubleClassCheck.testIsInfiniteStore   avgt   15   9.607 ± 0.834  ns/op
> FloatClassCheck.testIsInfiniteBranch   avgt   15  36.265 ± 3.168  ns/op
> FloatClassCheck.testIsInfiniteCMov     avgt   15  13.230 ± 1.100  ns/op
> FloatClassCheck.testIsInfiniteStore    avgt   15   9.492 ± 0.807  ns/op
>
> According to the 8285868 discussion, intrinsifying the isNaN and isFinite methods with the same approach might not be beneficial. I'm going to investigate it for RISC-V and propose intrinsifying these methods as part of further work in case it's profitable. Looks OK. Also passed Tier1 test on my SiFive Unmatched board. ------------- Marked as reviewed by fyang (Reviewer). PR: https://git.openjdk.org/jdk/pull/10253 From haosun at openjdk.org Thu Sep 15 00:42:56 2022 From: haosun at openjdk.org (Hao Sun) Date: Thu, 15 Sep 2022 00:42:56 GMT Subject: RFR: 8290169: adlc: Improve child constraints for vector unary operations [v3] In-Reply-To: References: Message-ID: On Wed, 14 Sep 2022 11:49:15 GMT, Tobias Hartmann wrote: >> Hao Sun has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains four commits: >> >> - Remove the "is_predicated_vector()" check introduced in JDK-8292587 >> - Merge branch 'master' into 8290169-adlc >> - Merge branch 'master' into 8290169-adlc >> >> Resolve the conflicts. >> - 8290169: adlc: Improve child constraints for vector unary operations >> >> As demonstrated in [1], the child constraint generated for *predicated >> vector unary operation* is the superset of that generated for the >> *unpredicated* version. As a result, there exists a risk for predicated >> vector unary operations to match the unpredicated rules by accident. >> >> In this patch, we resolve this issue by generating one extra check >> "rChild == NULL" ONLY for vector unary operations. In this way, the >> child constraints for predicated/unpredicated vector unary operations >> are exclusive now. >> >> Following the example in [1], the dfa state generated for AbsVI is shown >> below.
>>
>> ```
>> void State::_sub_Op_AbsVI(const Node *n){
>>   if( STATE__VALID_CHILD(_kids[0], VREG) && STATE__VALID_CHILD(_kids[1], PREGGOV) &&
>>       ( UseSVE > 0 ) )
>>   {
>>     unsigned int c = _kids[0]->_cost[VREG]+_kids[1]->_cost[PREGGOV] + SVE_COST;
>>     DFA_PRODUCTION(VREG, vabsI_masked_rule, c)
>>   }
>>   if( STATE__VALID_CHILD(_kids[0], VREG) && _kids[1] == NULL &&   <---- 1
>>       ( UseSVE > 0) )
>>   {
>>     unsigned int c = _kids[0]->_cost[VREG] + SVE_COST;
>>     if (STATE__NOT_YET_VALID(VREG) || _cost[VREG] > c) {
>>       DFA_PRODUCTION(VREG, vabsI_rule, c)
>>     }
>>   }
>>   ...
>> ```
>>
>> We can see that the constraint at line 1 cannot be matched for >> predicated AbsVI node now. >> >> The main updates are made in adlc/dfa part. Ideally, we should only >> add the extra check for affected platforms, i.e. AVX-512 and SVE. But we >> didn't do that because it would be better not to introduce any >> architecture dependent implementation here. >> >> Besides, workarounds in both aarch64_sve.ad and x86.ad are removed.
1) >> Many "is_predicated_vector()" checks can be removed in aarch64_sve.ad >> file. 2) Default instruction cost is used for involving rules in x86.ad >> file. >> >> [1]. https://github.com/shqking/jdk/commit/50ec9b19 > > Sure, I did already run testing. All passed. @TobiHartmann Thanks for your testing. ------------- PR: https://git.openjdk.org/jdk/pull/9534 From haosun at openjdk.org Thu Sep 15 01:41:17 2022 From: haosun at openjdk.org (Hao Sun) Date: Thu, 15 Sep 2022 01:41:17 GMT Subject: Integrated: 8290169: adlc: Improve child constraints for vector unary operations In-Reply-To: References: Message-ID: On Mon, 18 Jul 2022 07:46:17 GMT, Hao Sun wrote: > As demonstrated in [1], the child constraint generated for *predicated > vector unary operation* is the superset of that generated for the > *unpredicated* version. As a result, there exists a risk for predicated > vector unary operations to match the unpredicated rules by accident. > > In this patch, we resolve this issue by generating one extra check > "rChild == NULL" ONLY for vector unary operations. In this way, the > child constraints for predicated/unpredicated vector unary operations > are exclusive now. > > Following the example in [1], the dfa state generated for AbsVI is shown > below.
>
> void State::_sub_Op_AbsVI(const Node *n){
>   if( STATE__VALID_CHILD(_kids[0], VREG) && STATE__VALID_CHILD(_kids[1], PREGGOV) &&
>       ( UseSVE > 0 ) )
>   {
>     unsigned int c = _kids[0]->_cost[VREG]+_kids[1]->_cost[PREGGOV] + SVE_COST;
>     DFA_PRODUCTION(VREG, vabsI_masked_rule, c)
>   }
>   if( STATE__VALID_CHILD(_kids[0], VREG) && _kids[1] == NULL &&   <---- 1
>       ( UseSVE > 0) )
>   {
>     unsigned int c = _kids[0]->_cost[VREG] + SVE_COST;
>     if (STATE__NOT_YET_VALID(VREG) || _cost[VREG] > c) {
>       DFA_PRODUCTION(VREG, vabsI_rule, c)
>     }
>   }
>   ...
>
> We can see that the constraint at line 1 cannot be matched for > predicated AbsVI node now. > > The main updates are made in adlc/dfa part.
Ideally, we should only > add the extra check for affected platforms, i.e. AVX-512 and SVE. But we > didn't do that because it would be better not to introduce any > architecture dependent implementation here. > > Besides, workarounds in both ~aarch64_sve.ad~aarch64_vector.ad and x86.ad are removed. 1) > Many "is_predicated_vector()" checks can be removed in ~aarch64_sve.ad~aarch64_vector.ad > file. 2) Default instruction cost is used for involving rules in x86.ad > file. > > ~[1]. https://github.com/shqking/jdk/commit/50ec9b19~ > [1]. https://github.com/shqking/jdk/commit/f7d9621e2 This pull request has now been integrated. Changeset: eeb625e7 Author: Hao Sun Committer: Ningsheng Jian URL: https://git.openjdk.org/jdk/commit/eeb625e7095e65e64023cbfe05e579af90f4b638 Stats: 154 lines in 5 files changed: 28 ins; 81 del; 45 mod 8290169: adlc: Improve child constraints for vector unary operations Reviewed-by: eliu, xgong, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/9534 From xgong at openjdk.org Thu Sep 15 01:45:57 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 15 Sep 2022 01:45:57 GMT Subject: RFR: 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast [v7] In-Reply-To: <3GghrBjnyXnBCLCzcKUchQ2Zq9lZjS8Pw-IzBFGXftU=.6e5f6462-89fd-42fe-ba3e-4257642bdb05@github.com> References: <1mDQWN8f2Gpb-tuVZo_Jj6TVU1yNtU-4jY00D4gfW5s=.97d48dbc-0ffd-432d-83fe-3544295036e2@github.com> <3GghrBjnyXnBCLCzcKUchQ2Zq9lZjS8Pw-IzBFGXftU=.6e5f6462-89fd-42fe-ba3e-4257642bdb05@github.com> Message-ID: On Wed, 7 Sep 2022 06:01:22 GMT, Xiaohong Gong wrote: >> Recently we found the performance of "`FIRST_NONZERO`" for double type is largely worse than the other types on x86 when `UseAVX=2`. The main reason is the "`VectorCastL2X`" op is not supported by the backend when the dst element type is `T_DOUBLE`. 
This makes the check of `VectorCast` op fail before intrinsifying "`VectorMask.cast()`" which is used in the >> "`FIRST_NONZERO`" java implementation (see [1]). However, the compiler will not generate the `VectorCast `op for `VectorMask.cast()` if: >> >> 1) the current platform supports the predicated feature >> 2) the element size (in bytes) of the src and dst type is the same >> >> So the check of "`VectorCast`" op is needless for such cases. To fix it, this patch: >> >> 1) limits the specified vector cast op check to vectors >> 2) adds the relative mask cast op check for VectorMask.cast() >> 3) cleans up the unnecessary codes >> >> Here is the performance of "`FIRST_NONZERO`" benchmark [2] on a x86 machine with `UseAVX=2`: >> >> Benchmark (size) Mode Cnt Before After Units >> DoubleMaxVector.FIRST_NONZERO 1024 thrpt 15 49.266 2460.886 ops/ms >> DoubleMaxVector.FIRST_NONZEROMasked 1024 thrpt 15 49.554 1892.223 ops/ms >> >> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/DoubleVector.java#L770 >> [2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/DoubleMaxVector.java#L246 > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: > > - Merge branch 'jdk:master' into JDK-8291600 > - Address review comments > - Add vector cast op check for vector mask for some cases > - Revert the unify changes to vector mask cast > - Merge branch 'jdk:master' into JDK-8291600 > - Fix x86 codegen issue > - Unify VectorMaskCast for all platforms > - Merge branch 'master' into JDK-8291600 > - 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast The GHA failure is not related to this PR. So may I get an approve from someone else? Thanks in advance! 
------------- PR: https://git.openjdk.org/jdk/pull/9737 From fgao at openjdk.org Thu Sep 15 02:00:54 2022 From: fgao at openjdk.org (Fei Gao) Date: Thu, 15 Sep 2022 02:00:54 GMT Subject: RFR: 8289422: Fix and re-enable vector conditional move [v4] In-Reply-To: <5S-UTbWsM5a0vpMIwa93xNi9p-1DGIM0bXfBT1UxtPM=.b45231dd-8338-4c1a-a350-b48365186c5f@github.com> References: <6uthI29shZjAeLK-eV3Kxqao06qoa9U9zQ5g_oDLmkI=.3e171aae-2003-46c9-88ac-9a63fecc5d96@github.com> <5S-UTbWsM5a0vpMIwa93xNi9p-1DGIM0bXfBT1UxtPM=.b45231dd-8338-4c1a-a350-b48365186c5f@github.com> Message-ID: On Wed, 14 Sep 2022 12:03:04 GMT, Tobias Hartmann wrote: > Okay, please go ahead and file a follow-up bug then. Sure. I filed a new JBS issue in https://bugs.openjdk.org/browse/JDK-8293833. ------------- PR: https://git.openjdk.org/jdk/pull/9652 From duke at openjdk.org Thu Sep 15 04:37:52 2022 From: duke at openjdk.org (Zhiqiang Zang) Date: Thu, 15 Sep 2022 04:37:52 GMT Subject: RFR: 8281453: New optimization: convert "c-(~x)" into "x+(c+1)" and "~(c-x)" into "x+(-c-1)" [v8] In-Reply-To: <4mTZu0_hVWb-ztMxMabFilyXAnAqOStCvU9wPmfqCKM=.fa8b7797-6e20-4c9e-80f1-b55ba3d5fe39@github.com> References: <4mTZu0_hVWb-ztMxMabFilyXAnAqOStCvU9wPmfqCKM=.fa8b7797-6e20-4c9e-80f1-b55ba3d5fe39@github.com> Message-ID: <9M1h9knHcrm2lTT9_b4oHl_mRV0PiuFqornvmMtyqN8=.96113ac4-6ac3-4d0d-8e6a-ec3038706db2@github.com> > Similar to `(~x)+c` -> `(c-1)-x` and `~(x+c)` -> `(-c-1)-x` in #6858, we can also introduce similar optimizations for subtraction, `c-(~x)` -> `x+(c+1)` and `~(c-x)` -> `x+(-c-1)`. > > The results of the microbenchmark are as follows: > > Baseline: > Benchmark Mode Cnt Score Error Units > SubIdealCMinusNotX.baselineInt avgt 60 0.504 ? 0.011 ns/op > SubIdealCMinusNotX.baselineLong avgt 60 0.484 ? 0.004 ns/op > SubIdealCMinusNotX.testInt1 avgt 60 0.779 ? 0.004 ns/op > SubIdealCMinusNotX.testInt2 avgt 60 0.896 ? 0.004 ns/op > SubIdealCMinusNotX.testLong1 avgt 60 0.722 ? 
0.004 ns/op > SubIdealCMinusNotX.testLong2 avgt 60 0.720 ? 0.005 ns/op > > Patch: > Benchmark Mode Cnt Score Error Units > SubIdealCMinusNotX.baselineInt avgt 60 0.487 ? 0.009 ns/op > SubIdealCMinusNotX.baselineLong avgt 60 0.486 ? 0.009 ns/op > SubIdealCMinusNotX.testInt1 avgt 60 0.372 ? 0.010 ns/op > SubIdealCMinusNotX.testInt2 avgt 60 0.365 ? 0.003 ns/op > SubIdealCMinusNotX.testLong1 avgt 60 0.369 ? 0.004 ns/op > SubIdealCMinusNotX.testLong2 avgt 60 0.399 ? 0.016 ns/op Zhiqiang Zang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: - merge master. - merge master. - clean. - merge tests into XXXINodeIdealizationTests - clean. - Merge branch 'master'. - convert ~x into -1-x when ~x is part of Add and Sub. - include bug id. - include a microbenmark. - Convert c-(~x) into x+(c+1) in SubNode and convert ~(c-x) into x+(-c-1) in XorNode. ------------- Changes: https://git.openjdk.org/jdk/pull/7376/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=7376&range=07 Stats: 206 lines in 7 files changed: 194 ins; 5 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/7376.diff Fetch: git fetch https://git.openjdk.org/jdk pull/7376/head:pull/7376 PR: https://git.openjdk.org/jdk/pull/7376 From epeter at openjdk.org Thu Sep 15 07:24:38 2022 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 15 Sep 2022 07:24:38 GMT Subject: RFR: 8287217: C2: PhaseCCP: remove not visited nodes, prevent type inconsistency Message-ID: **Context:** [JDK-8265973](https://bugs.openjdk.org/browse/JDK-8265973) Fix in Valhalla repository (Tobias @TobiHartmann ). [JDK-8290711](https://bugs.openjdk.org/browse/JDK-8290711) Fix in mainline (Roland @rwestrel ). Tobias' fix is a superset of Rolands. Unfortunately, Tobias gave up his fix once Rolands came up, because they did not have tests that required the superset fix. We now have such a test, where Rolands fix is not sufficient. But Tobias' fix is the solution. 
So I ported Tobias' fix to mainline. **Analysis:** In this bug, we have two `LoadB` nodes re-pushing themselves to the `igvn.worklist`, without end. This leads to an assert after too many iterations. `PhaseCCP::analyze` is looking at a post-loop. The loop has a memory access, so there is a `null_check`. The data-part of the loop is connected down to Root via this `null_check` (`Phi-> CmpP -> Bool -> If -> IfFalse -> Region -> CallStaticJava uncommon_trap -> ... -> Root`). During `CCP::analyze`, we discover that the memory address is NonNull. So we update the `phase->type(n)` for many of the data-nodes of the loop. During `PhaseCCP::do_transform`, we now traverse recursively up from the root, visiting all reachable nodes. When we visit a node, we store the cached `phase->type(n)` into the node, making the node's type consistent. We traverse up through the `null_check`, through the `uncommon_trap`, and the `If`, to the `Bool` node. `BoolNode::Value` realizes that we can never have Null, and is subsumed by constant `#int:1` (true). This means that the data-part of the loop just lost its connection down to Root. The traversal now also does not reach further than the Bool node which was just subsumed, and hence does not reach the data-part of the loop. This means we have nodes with inconsistent type. Summary: CCP disconnects the last path down to root for a data-loop, because it realizes that a `null_check` will never trap. The disconnected state means the types of the data-loop may be left inconsistent. Right after PhaseCCP, we continue with IGVN. The `LoadB` from that data-part of the loop has `MemNode::Ideal_common` called, which defers its transformation until the type of the address is consistent. However, this is never made consistent, as it is already left inconsistent after PhaseCCP. 
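The traversal problem in the analysis above can be modeled on a toy graph. The sketch below is only an illustration under stated assumptions: `ToyGraph`, `visit_from_root` and `unvisited_nodes` are hypothetical names, nodes are plain indices, and nothing here is HotSpot's `Node` or `PhaseCCP` API. It walks up from the root along input edges, as `do_transform` is described to do, and lists the nodes the walk never reaches; in the scenario above those are the data-loop nodes whose types are left inconsistent once the `Bool` is replaced by a constant.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy graph (hypothetical): inputs[i] holds the indices of the nodes that
// node i uses as inputs.
using ToyGraph = std::vector<std::vector<std::size_t>>;

// Walk "up" from the root along input edges and flag every node reached.
inline std::vector<bool> visit_from_root(const ToyGraph& inputs, std::size_t root) {
  std::vector<bool> visited(inputs.size(), false);
  std::vector<std::size_t> stack{root};
  visited[root] = true;
  while (!stack.empty()) {
    std::size_t n = stack.back();
    stack.pop_back();
    for (std::size_t in : inputs[n]) {
      if (!visited[in]) {
        visited[in] = true;
        stack.push_back(in);
      }
    }
  }
  return visited;
}

// Nodes the root-up walk never reached, in increasing index order.
inline std::vector<std::size_t> unvisited_nodes(const ToyGraph& inputs, std::size_t root) {
  std::vector<bool> visited = visit_from_root(inputs, root);
  std::vector<std::size_t> result;
  for (std::size_t i = 0; i < visited.size(); ++i) {
    if (!visited[i]) result.push_back(i);
  }
  return result;
}
```

For example, with nodes 0=Root, 1=If, 2=Bool, 3=CmpP, 4=Phi chained by input edges, every node is visited; once the `If` takes a constant (node 5) instead of the `Bool`, nodes 2, 3 and 4 are no longer reached from the root.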
https://github.com/openjdk/jdk/blob/aa7ccdf44549a52cce9e99f6569097d3343d9ee4/src/hotspot/share/opto/memnode.cpp#L351-L358 Note that we only re-push if there is another node in the worklist - a node that hopefully has something to do with the address. But in our case it is just the two LoadB nodes, which were generated from the same `split_through_phi`. **Solution:** At the end of PhaseCCP, we remove all nodes that were not visited (and may have an inconsistent state). We can do this because we visited all nodes that are still relevant. Rolands fix already made sure that SafePointNodes are visited, such that infinite loops are covered as well. Regression test added. Test suite passed. ------------- Commit messages: - fixed: modified node recording, commented code line, java style of test - 8287217: C2: PhaseCCP: remove not visited nodes, prevent type inconsistency Changes: https://git.openjdk.org/jdk/pull/10250/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10250&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8287217 Stats: 104 lines in 5 files changed: 83 ins; 2 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/10250.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10250/head:pull/10250 PR: https://git.openjdk.org/jdk/pull/10250 From rrich at openjdk.org Thu Sep 15 07:30:44 2022 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 15 Sep 2022 07:30:44 GMT Subject: RFR: 8289925: Shared code shouldn't reference the platform specific method frame::interpreter_frame_last_sp() [v2] In-Reply-To: <5olJdPTm_wQCxU59VeZgTRMp6SW_IeFwtSydqhuvWrg=.681e6127-9e5b-4507-9579-ad9c818a1a84@github.com> References: <-bsqwA9AOikvIYJwIgJ7ms1RaF_OqH-UO3juTGZ-Zzc=.6dc5abda-60a3-4f76-83df-6afab3f3a1db@github.com> <5olJdPTm_wQCxU59VeZgTRMp6SW_IeFwtSydqhuvWrg=.681e6127-9e5b-4507-9579-ad9c818a1a84@github.com> Message-ID: On Fri, 26 Aug 2022 08:10:00 GMT, Dean Long wrote: > > The interpreter allocates a new frame reserving space for the maximum expression stack. 
For a call it is trimmed to the current size of the stack. Consequently unextended_sp < sp is possible if max. stack is large. > > OK, if I'm not mistaken, aarch64 is doing the same thing. I have had a look at aarch64 now. There the interpreter also pushes a new frame reserving space for the maximum expression stack, as on ppc64, but unlike on ppc64 the frame is _not_ trimmed to the current size of the expression stack. So on aarch64 unextended_sp < sp seems to be impossible. > So that could mean that code like Continuation::is_frame_in_continuation() > that uses unextended_sp is already broken on some platforms. For example, if > an interpreted frame in the carrier thread had a very large max_stack, then > is_frame_in_continuation() could return true when it should return false. It is likely not yet broken but for ppc64 it will not work. The usage of unextended_sp in shared code is problematic because there is no shared specification for it. Looking at the callers of is_sp_in_continuation() we find that the following is passed as parameter `sp`:
1. frame::interpreter_frame_last_sp()
2. frame::unextended_sp()
3. frame::sp() - 2
4. frame::sp()
Item 3 is a special case for continuation entry frames (see Continuation::get_continuation_entry_for_entry_frame()). The -2 is needed because is_sp_in_continuation() would return false for the sp of the entry frame. I'd like to change the callers of is_sp_in_continuation()
```c++
static inline bool is_sp_in_continuation(const ContinuationEntry* entry, intptr_t* const sp) {
  return entry->entry_sp() > sp;
}
```
to pass the actual sp. This is correct because the following is true on all platforms:
```c++
a.sp() > E->entry_sp() > b.sp() > c.sp()
```
where `a`, `b`, `c` are stack frames in call order and `E` is a ContinuationEntry. `a` is the caller frame of the continuation entry frame that corresponds to `E`. is_sp_in_continuation() will then return true for `b.sp()` and `c.sp()` and false for `a.sp()`.
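The ordering argument above can be checked on made-up numbers. The following is a minimal sketch, assuming a downward-growing stack and using plain integers in place of real stack pointers; `ToyEntry`, `toy_is_sp_in_continuation` and the addresses are hypothetical and only mirror the comparison quoted above, not the real `ContinuationEntry` class.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ContinuationEntry; only the entry sp is modeled.
struct ToyEntry {
  std::uintptr_t entry_sp;
};

// Same comparison as the quoted is_sp_in_continuation(), on plain integers.
inline bool toy_is_sp_in_continuation(const ToyEntry& entry, std::uintptr_t sp) {
  return entry.entry_sp > sp;
}

// Checks the invariant a.sp() > E->entry_sp() > b.sp() > c.sp() on sample
// addresses (made up for illustration; the stack grows towards lower addresses).
inline void check_ordering_example() {
  const std::uintptr_t a_sp = 0x9000;  // caller of the continuation entry frame
  const ToyEntry e{0x8000};            // E->entry_sp()
  const std::uintptr_t b_sp = 0x7000;  // first frame inside the continuation
  const std::uintptr_t c_sp = 0x6000;  // callee of b
  assert(!toy_is_sp_in_continuation(e, a_sp));
  assert(toy_is_sp_in_continuation(e, b_sp));
  assert(toy_is_sp_in_continuation(e, c_sp));
}
```

Passing any value for frame `a` that lies below `entry_sp` would flip the first result, which is why the choice of which "sp" the callers pass matters.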
At least on ppc64 is_sp_in_continuation() can return true for `a.unextended_sp()` because of the frame trimming described above. That's why it should not be used. frame::interpreter_frame_last_sp() should not be used as it is not declared and specified as a shared method. ------------- PR: https://git.openjdk.org/jdk/pull/9411 From rrich at openjdk.org Thu Sep 15 07:43:23 2022 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 15 Sep 2022 07:43:23 GMT Subject: RFR: 8289925: Shared code shouldn't reference the platform specific method frame::interpreter_frame_last_sp() [v3] In-Reply-To: References: Message-ID: <-CdoduIsZBNl7Hqje87jdrC9NbAiG4I-lNzsWqTItD4=.7871424a-5d99-4986-b72b-4c187dd5c2a9@github.com> > The method `frame::interpreter_frame_last_sp()` is a platform method in the sense that it is not declared in a shared header file. It is declared and defined on some platforms though (x86 and aarch64 I think). > > `frame::interpreter_frame_last_sp()` existed on these platforms before vm continuations (aka loom). Shared code that is part of the vm continuations implementation references it. This breaks the platform abstraction. > > This fix simply removes the special case for interpreted frames in the shared method `Continuation::continuation_bottom_sender()`. I cannot see a reason for the distinction between interpreted and compiled frames. The shared code reference to `frame::interpreter_frame_last_sp()` is thereby eliminated. > > Testing: hotspot_loom and jdk_loom on x86_64 and aarch64. Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - Only pass the actual sp when calling is_sp_in_continuation() - Merge branch 'master' - Merge branch 'master' - Remove platform dependent method interpreter_frame_last_sp() from shared code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9411/files - new: https://git.openjdk.org/jdk/pull/9411/files/c3ad382c..fdb14090 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9411&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9411&range=01-02 Stats: 163416 lines in 2548 files changed: 79922 ins; 68601 del; 14893 mod Patch: https://git.openjdk.org/jdk/pull/9411.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9411/head:pull/9411 PR: https://git.openjdk.org/jdk/pull/9411 From jbhateja at openjdk.org Thu Sep 15 07:49:08 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 15 Sep 2022 07:49:08 GMT Subject: RFR: 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast [v7] In-Reply-To: <3GghrBjnyXnBCLCzcKUchQ2Zq9lZjS8Pw-IzBFGXftU=.6e5f6462-89fd-42fe-ba3e-4257642bdb05@github.com> References: <1mDQWN8f2Gpb-tuVZo_Jj6TVU1yNtU-4jY00D4gfW5s=.97d48dbc-0ffd-432d-83fe-3544295036e2@github.com> <3GghrBjnyXnBCLCzcKUchQ2Zq9lZjS8Pw-IzBFGXftU=.6e5f6462-89fd-42fe-ba3e-4257642bdb05@github.com> Message-ID: On Wed, 7 Sep 2022 06:01:22 GMT, Xiaohong Gong wrote: >> Recently we found the performance of "`FIRST_NONZERO`" for double type is largely worse than the other types on x86 when `UseAVX=2`. The main reason is the "`VectorCastL2X`" op is not supported by the backend when the dst element type is `T_DOUBLE`. This makes the check of `VectorCast` op fail before intrinsifying "`VectorMask.cast()`" which is used in the >> "`FIRST_NONZERO`" java implementation (see [1]). 
However, the compiler will not generate the `VectorCast `op for `VectorMask.cast()` if: >> >> 1) the current platform supports the predicated feature >> 2) the element size (in bytes) of the src and dst type is the same >> >> So the check of "`VectorCast`" op is needless for such cases. To fix it, this patch: >> >> 1) limits the specified vector cast op check to vectors >> 2) adds the relative mask cast op check for VectorMask.cast() >> 3) cleans up the unnecessary codes >> >> Here is the performance of "`FIRST_NONZERO`" benchmark [2] on a x86 machine with `UseAVX=2`: >> >> Benchmark (size) Mode Cnt Before After Units >> DoubleMaxVector.FIRST_NONZERO 1024 thrpt 15 49.266 2460.886 ops/ms >> DoubleMaxVector.FIRST_NONZEROMasked 1024 thrpt 15 49.554 1892.223 ops/ms >> >> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/DoubleVector.java#L770 >> [2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/DoubleMaxVector.java#L246 > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: > > - Merge branch 'jdk:master' into JDK-8291600 > - Address review comments > - Add vector cast op check for vector mask for some cases > - Revert the unify changes to vector mask cast > - Merge branch 'jdk:master' into JDK-8291600 > - Fix x86 codegen issue > - Unify VectorMaskCast for all platforms > - Merge branch 'master' into JDK-8291600 > - 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast Thanks @XiaohongGong for looking into this, patch is showing significant speedups for AVX2 and AVX512 KNL targets. 
src/hotspot/share/opto/vectorIntrinsics.cpp line 2486: > 2484: ((src_type->isa_vectmask() == NULL && dst_type->isa_vectmask()) || > 2485: (dst_type->isa_vectmask() == NULL && src_type->isa_vectmask()) || > 2486: num_elem_from != num_elem_to)) { This check is already done on [java side](https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Double512Vector.java#L643). src/hotspot/share/opto/vectorIntrinsics.cpp line 2505: > 2503: bool no_vec_cast_check = is_mask && > 2504: ((src_type->isa_vectmask() && dst_type->isa_vectmask()) || > 2505: type2aelembytes(elem_bt_from) == type2aelembytes(elem_bt_to)); Same check is existing at https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectorIntrinsics.cpp#L2551, can we do some factoring. src/hotspot/share/opto/vectorIntrinsics.cpp line 2554: > 2552: op = gvn().transform(new VectorReinterpretNode(op, src_type, resize_type)); > 2553: op = gvn().transform(VectorCastNode::make(cast_vopc, op, elem_bt_to, num_elem_to)); > 2554: } else { // num_elem_from == num_elem_to There is one comment unrelated to this patch but since your patch touches this function may be we can address it. Call to VectorMaskCastNode::makeCastNode in else block at following line https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectorIntrinsics.cpp#L2555 should be able to handle the true block. But, newly created IR node on following location https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectornode.cpp#L1761 should be passed through gvn transform before returning. 
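The `no_vec_cast_check` condition quoted above can be read as a small predicate over four facts about the cast. The sketch below restates it on a hypothetical `ToyCast` struct so the cases are easy to enumerate; the struct, its field names and the function name are illustrative assumptions, not the types used in `vectorIntrinsics.cpp`.

```cpp
#include <cassert>

// Hypothetical summary of the inputs the quoted condition looks at.
struct ToyCast {
  bool is_mask;          // the cast is a VectorMask.cast()
  bool src_is_vectmask;  // source type is a predicate (mask register) type
  bool dst_is_vectmask;  // destination type is a predicate type
  int  src_elem_bytes;   // element size of the source type, in bytes
  int  dst_elem_bytes;   // element size of the destination type, in bytes
};

// Mirrors the quoted expression: for a mask cast, the VectorCast op check can
// be skipped when both sides are predicate types or the element sizes match.
inline bool toy_no_vec_cast_check(const ToyCast& c) {
  return c.is_mask &&
         ((c.src_is_vectmask && c.dst_is_vectmask) ||
          c.src_elem_bytes == c.dst_elem_bytes);
}
```

Under this reading, a long-to-double mask cast (8-byte elements on both sides) skips the check, which matches the `FIRST_NONZERO` case from the PR description, while an int-to-double mask cast on a non-predicated platform does not.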
------------- PR: https://git.openjdk.org/jdk/pull/9737 From rrich at openjdk.org Thu Sep 15 07:49:38 2022 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 15 Sep 2022 07:49:38 GMT Subject: RFR: 8289925: Shared code shouldn't reference the platform specific method frame::interpreter_frame_last_sp() [v3] In-Reply-To: <-CdoduIsZBNl7Hqje87jdrC9NbAiG4I-lNzsWqTItD4=.7871424a-5d99-4986-b72b-4c187dd5c2a9@github.com> References: <-CdoduIsZBNl7Hqje87jdrC9NbAiG4I-lNzsWqTItD4=.7871424a-5d99-4986-b72b-4c187dd5c2a9@github.com> Message-ID: <3TRqlNrdNKvq5h70pg133Z3Fc2FNcTGccRZP7fYThMw=.89808b99-40c6-4dc4-9186-d3750b062636@github.com> On Thu, 15 Sep 2022 07:43:23 GMT, Richard Reingruber wrote: >> The method `frame::interpreter_frame_last_sp()` is a platform method in the sense that it is not declared in a shared header file. It is declared and defined on some platforms though (x86 and aarch64 I think). >> >> `frame::interpreter_frame_last_sp()` existed on these platforms before vm continuations (aka loom). Shared code that is part of the vm continuations implementation references it. This breaks the platform abstraction. >> >> This fix simply removes the special case for interpreted frames in the shared method `Continuation::continuation_bottom_sender()`. I cannot see a reason for the distinction between interpreted and compiled frames. The shared code reference to `frame::interpreter_frame_last_sp()` is thereby eliminated. >> >> Testing: hotspot_loom and jdk_loom on x86_64 and aarch64. > > Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: > > - Only pass the actual sp when calling is_sp_in_continuation() > - Merge branch 'master' > - Merge branch 'master' > - Remove platform dependent method interpreter_frame_last_sp() from shared code hotspot_loom and jdk_loom tests on x86_64 and aarch64 still pass with the last commit https://github.com/openjdk/jdk/pull/9411/commits/fdb14090e74e8adb9046c10065f03e4e46e09f1c. Tests with the ppc64 loom port succeeded as well. ------------- PR: https://git.openjdk.org/jdk/pull/9411 From xgong at openjdk.org Thu Sep 15 07:53:55 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 15 Sep 2022 07:53:55 GMT Subject: RFR: 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast [v7] In-Reply-To: References: <1mDQWN8f2Gpb-tuVZo_Jj6TVU1yNtU-4jY00D4gfW5s=.97d48dbc-0ffd-432d-83fe-3544295036e2@github.com> <3GghrBjnyXnBCLCzcKUchQ2Zq9lZjS8Pw-IzBFGXftU=.6e5f6462-89fd-42fe-ba3e-4257642bdb05@github.com> Message-ID: <5tyebiAm4_H2BMiKwnP0V6O3z588UECw24Z1ynSWJzY=.a86b56e0-1ef1-4e60-bf4e-d5a0aee2c61c@github.com> On Thu, 15 Sep 2022 07:16:39 GMT, Jatin Bhateja wrote: >> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains ten commits: >> >> - Merge branch 'jdk:master' into JDK-8291600 >> - Address review comments >> - Add vector cast op check for vector mask for some cases >> - Revert the unify changes to vector mask cast >> - Merge branch 'jdk:master' into JDK-8291600 >> - Fix x86 codegen issue >> - Unify VectorMaskCast for all platforms >> - Merge branch 'master' into JDK-8291600 >> - 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast > > src/hotspot/share/opto/vectorIntrinsics.cpp line 2554: > >> 2552: op = gvn().transform(new VectorReinterpretNode(op, src_type, resize_type)); >> 2553: op = gvn().transform(VectorCastNode::make(cast_vopc, op, elem_bt_to, num_elem_to)); >> 2554: } else { // num_elem_from == num_elem_to > > There is one comment unrelated to this patch but since your patch touches this function may be we can address it. > Call to VectorMaskCastNode::makeCastNode in else block at following line > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectorIntrinsics.cpp#L2555 > should be able to handle the true block. > But, newly created IR node on following location https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectornode.cpp#L1761 should be passed through gvn transform before returning. Yes, you are right. That's also what is the first version of this PR (please see: https://openjdk.github.io/cr/?repo=jdk&pr=9737&range=00). And since @merykitty suggested to unify the whole vector mask cast operations with `VectorMaskCastNode`, I prepare a follow-up patch to specifically do this refactorization (please see: https://github.com/openjdk/jdk/pull/10192). And as suggested by @DamonFool , we only fix the `FIRST_NONZERO` performance issue in this PR. 
------------- PR: https://git.openjdk.org/jdk/pull/9737 From xgong at openjdk.org Thu Sep 15 07:56:50 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 15 Sep 2022 07:56:50 GMT Subject: RFR: 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast [v7] In-Reply-To: References: <1mDQWN8f2Gpb-tuVZo_Jj6TVU1yNtU-4jY00D4gfW5s=.97d48dbc-0ffd-432d-83fe-3544295036e2@github.com> <3GghrBjnyXnBCLCzcKUchQ2Zq9lZjS8Pw-IzBFGXftU=.6e5f6462-89fd-42fe-ba3e-4257642bdb05@github.com> Message-ID: On Thu, 15 Sep 2022 07:08:48 GMT, Jatin Bhateja wrote: >> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: >> >> - Merge branch 'jdk:master' into JDK-8291600 >> - Address review comments >> - Add vector cast op check for vector mask for some cases >> - Revert the unify changes to vector mask cast >> - Merge branch 'jdk:master' into JDK-8291600 >> - Fix x86 codegen issue >> - Unify VectorMaskCast for all platforms >> - Merge branch 'master' into JDK-8291600 >> - 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast > > src/hotspot/share/opto/vectorIntrinsics.cpp line 2505: > >> 2503: bool no_vec_cast_check = is_mask && >> 2504: ((src_type->isa_vectmask() && dst_type->isa_vectmask()) || >> 2505: type2aelembytes(elem_bt_from) == type2aelembytes(elem_bt_to)); > > Same check is existing at https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectorIntrinsics.cpp#L2551, can we do some factoring. Yes, we can. But consider these codes will be optimized out in the follow-up unify patch (see https://github.com/openjdk/jdk/pull/10192), do you think it's necessary to do some simple factorization here? If it's necessary, I can change the codes. Thanks! 
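For readers following the factoring question above, here is a minimal sketch of what pulling the duplicated condition into one predicate might look like. It is standalone C++, not HotSpot code: `MaskCastShape` and `skips_vector_cast_check` are hypothetical names, and the plain fields stand in for the `isa_vectmask()` and `type2aelembytes()` calls in the quoted snippet.

```cpp
#include <cassert>

// Hypothetical stand-in for the facts the quoted condition reads from
// src_type/dst_type and elem_bt_from/elem_bt_to. Not a HotSpot type.
struct MaskCastShape {
  bool is_mask;          // the operation is a VectorMask.cast()
  bool src_is_vectmask;  // stands in for src_type->isa_vectmask() != NULL
  bool dst_is_vectmask;  // stands in for dst_type->isa_vectmask() != NULL
  int  elem_bytes_from;  // stands in for type2aelembytes(elem_bt_from)
  int  elem_bytes_to;    // stands in for type2aelembytes(elem_bt_to)
};

// Same boolean structure as the quoted `no_vec_cast_check` expression,
// written once so that both call sites could share it.
constexpr bool skips_vector_cast_check(const MaskCastShape& s) {
  return s.is_mask &&
         ((s.src_is_vectmask && s.dst_is_vectmask) ||
          s.elem_bytes_from == s.elem_bytes_to);
}

// Predicated masks on both sides: no vector cast op check needed.
static_assert(skips_vector_cast_check(MaskCastShape{true, true, true, 8, 4}),
              "predicated mask cast");
// Same element size in bytes: no vector cast op check needed.
static_assert(skips_vector_cast_check(MaskCastShape{true, false, false, 4, 4}),
              "same element size");
// Different element sizes without predicated masks: the check stays.
static_assert(!skips_vector_cast_check(MaskCastShape{true, false, false, 8, 4}),
              "different element size");
// Ordinary vector casts always keep the check.
static_assert(!skips_vector_cast_check(MaskCastShape{false, true, true, 4, 4}),
              "not a mask cast");
```

Whether such a helper is worth adding here is exactly the open question in the thread, since the follow-up unify patch is expected to remove the duplicated code anyway.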
------------- PR: https://git.openjdk.org/jdk/pull/9737 From xgong at openjdk.org Thu Sep 15 08:04:05 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 15 Sep 2022 08:04:05 GMT Subject: RFR: 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast [v7] In-Reply-To: References: <1mDQWN8f2Gpb-tuVZo_Jj6TVU1yNtU-4jY00D4gfW5s=.97d48dbc-0ffd-432d-83fe-3544295036e2@github.com> <3GghrBjnyXnBCLCzcKUchQ2Zq9lZjS8Pw-IzBFGXftU=.6e5f6462-89fd-42fe-ba3e-4257642bdb05@github.com> Message-ID: On Thu, 15 Sep 2022 06:28:01 GMT, Jatin Bhateja wrote: >> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: >> >> - Merge branch 'jdk:master' into JDK-8291600 >> - Address review comments >> - Add vector cast op check for vector mask for some cases >> - Revert the unify changes to vector mask cast >> - Merge branch 'jdk:master' into JDK-8291600 >> - Fix x86 codegen issue >> - Unify VectorMaskCast for all platforms >> - Merge branch 'master' into JDK-8291600 >> - 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast > > src/hotspot/share/opto/vectorIntrinsics.cpp line 2486: > >> 2484: ((src_type->isa_vectmask() == NULL && dst_type->isa_vectmask()) || >> 2485: (dst_type->isa_vectmask() == NULL && src_type->isa_vectmask()) || >> 2486: num_elem_from != num_elem_to)) { > > This check is already done on [java side](https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Double512Vector.java#L643). Yes, it is. Just want to double check in the compiler side, or it maybe confusing if someone is not familiar with the vector api java codes. 
------------- PR: https://git.openjdk.org/jdk/pull/9737 From jbhateja at openjdk.org Thu Sep 15 08:04:05 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 15 Sep 2022 08:04:05 GMT Subject: RFR: 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast [v7] In-Reply-To: <5tyebiAm4_H2BMiKwnP0V6O3z588UECw24Z1ynSWJzY=.a86b56e0-1ef1-4e60-bf4e-d5a0aee2c61c@github.com> References: <1mDQWN8f2Gpb-tuVZo_Jj6TVU1yNtU-4jY00D4gfW5s=.97d48dbc-0ffd-432d-83fe-3544295036e2@github.com> <3GghrBjnyXnBCLCzcKUchQ2Zq9lZjS8Pw-IzBFGXftU=.6e5f6462-89fd-42fe-ba3e-4257642bdb05@github.com> <5tyebiAm4_H2BMiKwnP0V6O3z588UECw24Z1ynSWJzY=.a86b56e0-1ef1-4e60-bf4e-d5a0aee2c61c@github.com> Message-ID: <3qhiPdpNzDOgD9sPqfEN_-7U5Izo48lFgFQVyu7ctbM=.dd50124e-a96c-46f3-ad7f-eb920f23861b@github.com> On Thu, 15 Sep 2022 07:51:34 GMT, Xiaohong Gong wrote: >> src/hotspot/share/opto/vectorIntrinsics.cpp line 2554: >> >>> 2552: op = gvn().transform(new VectorReinterpretNode(op, src_type, resize_type)); >>> 2553: op = gvn().transform(VectorCastNode::make(cast_vopc, op, elem_bt_to, num_elem_to)); >>> 2554: } else { // num_elem_from == num_elem_to >> >> There is one comment unrelated to this patch but since your patch touches this function may be we can address it. >> Call to VectorMaskCastNode::makeCastNode in else block at following line >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectorIntrinsics.cpp#L2555 >> should be able to handle the true block. >> But, newly created IR node on following location https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectornode.cpp#L1761 should be passed through gvn transform before returning. > > Yes, you are right. That's also what is the first version of this PR (please see: https://openjdk.github.io/cr/?repo=jdk&pr=9737&range=00). 
And since @merykitty suggested to unify the whole vector mask cast operations with `VectorMaskCastNode`, I prepare a follow-up patch to specifically do this refactorization (please see: https://github.com/openjdk/jdk/pull/10192). And as suggested by @DamonFool , we only fix the `FIRST_NONZERO` performance issue in this PR. Yes, I just noticed that #10192 is addressing above two comments ------------- PR: https://git.openjdk.org/jdk/pull/9737 From xgong at openjdk.org Thu Sep 15 08:04:05 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 15 Sep 2022 08:04:05 GMT Subject: RFR: 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast [v7] In-Reply-To: <3qhiPdpNzDOgD9sPqfEN_-7U5Izo48lFgFQVyu7ctbM=.dd50124e-a96c-46f3-ad7f-eb920f23861b@github.com> References: <1mDQWN8f2Gpb-tuVZo_Jj6TVU1yNtU-4jY00D4gfW5s=.97d48dbc-0ffd-432d-83fe-3544295036e2@github.com> <3GghrBjnyXnBCLCzcKUchQ2Zq9lZjS8Pw-IzBFGXftU=.6e5f6462-89fd-42fe-ba3e-4257642bdb05@github.com> <5tyebiAm4_H2BMiKwnP0V6O3z588UECw24Z1ynSWJzY=.a86b56e0-1ef1-4e60-bf4e-d5a0aee2c61c@github.com> <3qhiPdpNzDOgD9sPqfEN_-7U5Izo48lFgFQVyu7ctbM=.dd50124e-a96c-46f3-ad7f-eb920f23861b@github.com> Message-ID: On Thu, 15 Sep 2022 07:59:22 GMT, Jatin Bhateja wrote: >> Yes, you are right. That's also what is the first version of this PR (please see: https://openjdk.github.io/cr/?repo=jdk&pr=9737&range=00). And since @merykitty suggested to unify the whole vector mask cast operations with `VectorMaskCastNode`, I prepare a follow-up patch to specifically do this refactorization (please see: https://github.com/openjdk/jdk/pull/10192). And as suggested by @DamonFool , we only fix the `FIRST_NONZERO` performance issue in this PR. > > Yes, I just noticed that #10192 is addressing above two comments Yeah, thanks so much for looking at that PR as well! 
------------- PR: https://git.openjdk.org/jdk/pull/9737 From jbhateja at openjdk.org Thu Sep 15 08:12:49 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 15 Sep 2022 08:12:49 GMT Subject: RFR: 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast [v7] In-Reply-To: References: <1mDQWN8f2Gpb-tuVZo_Jj6TVU1yNtU-4jY00D4gfW5s=.97d48dbc-0ffd-432d-83fe-3544295036e2@github.com> <3GghrBjnyXnBCLCzcKUchQ2Zq9lZjS8Pw-IzBFGXftU=.6e5f6462-89fd-42fe-ba3e-4257642bdb05@github.com> Message-ID: On Thu, 15 Sep 2022 07:57:37 GMT, Xiaohong Gong wrote: >> src/hotspot/share/opto/vectorIntrinsics.cpp line 2486: >> >>> 2484: ((src_type->isa_vectmask() == NULL && dst_type->isa_vectmask()) || >>> 2485: (dst_type->isa_vectmask() == NULL && src_type->isa_vectmask()) || >>> 2486: num_elem_from != num_elem_to)) { >> >> This check is already done on [java side](https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Double512Vector.java#L643). > > Yes, it is. Just want to double check in the compiler side, or it maybe confusing if someone is not familiar with the vector api java codes. I see, may be we can just add a comment/assert instead. ------------- PR: https://git.openjdk.org/jdk/pull/9737 From tholenstein at openjdk.org Thu Sep 15 08:34:14 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 15 Sep 2022 08:34:14 GMT Subject: RFR: JDK-8293364: IGV: Refactor Action in EditorTopComponent and fix minor bugs [v2] In-Reply-To: References: Message-ID: > Refactor the Actions in EditorTopComponent (com.sun.hotspot.igv.view.actions). Move Action specific code from EditorTopComponent to the corresponding Action. 
> > # Refactoring of com.sun.hotspot.igv.view.actions and EditorTopComponent > - Created a new `ExportGraph` Action and moved corresponding functions `exportToSVG(..)` and `exportToPDF(..)` to new `ExportGraph.java` > - Moved key bindings for satellite-view (pressing S) from `EditorTopComponent` to `OverviewAction.java` > - Moved Action specific code from `EditorTopComponent` to the corresponding `XXXAction.java` > > # Fixing minor Bugs > - "Show empty blocks in control-flow graph view" is selected by default but only enabled in CFG view. > This is distracting for the eye when we are not in CFG: > cfg_before > Now "Show empty blocks in control-flow graph view" is not selected anymore when disabled (greyed out) > cfg_node_disable > But still gets selected by default when enabled > cfg_now > > - "Extract current set of selected nodes", "Hide selected nodes" and "show all nodes" were always enabled, even when they didn't effect anything. > selection_before > Now "Extract current set of selected nodes", "Hide selected nodes" are disabled (greyed out) when no nodes are selected. And "show all nodes" is disabled (greyed out) when all nodes are already visible. > selection_now > > - "Reduce the difference selection" got stuck when at the last graphs in the group because it got greyed out. 
> reduce_stuck > duce the difference selection" > Now "Reduce the difference selection" works as expected: > reduce_now Tobias Holenstein has updated the pull request incrementally with six additional commits since the last revision: - removed unused CTL_ - ActionRegistration for ExportAction, ZoomInAction and ZoomOutAction - use recommended ActionRegistration for ContextActions - new addContextListener / removeContextListener in ContextAction - refactor ContextAction - correct package for CustomSelectAction ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10170/files - new: https://git.openjdk.org/jdk/pull/10170/files/7bcf79df..60e55cc7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10170&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10170&range=00-01 Stats: 677 lines in 19 files changed: 214 ins; 271 del; 192 mod Patch: https://git.openjdk.org/jdk/pull/10170.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10170/head:pull/10170 PR: https://git.openjdk.org/jdk/pull/10170 From xgong at openjdk.org Thu Sep 15 08:41:47 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 15 Sep 2022 08:41:47 GMT Subject: RFR: 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast [v7] In-Reply-To: References: <1mDQWN8f2Gpb-tuVZo_Jj6TVU1yNtU-4jY00D4gfW5s=.97d48dbc-0ffd-432d-83fe-3544295036e2@github.com> <3GghrBjnyXnBCLCzcKUchQ2Zq9lZjS8Pw-IzBFGXftU=.6e5f6462-89fd-42fe-ba3e-4257642bdb05@github.com> Message-ID: On Thu, 15 Sep 2022 08:06:35 GMT, Jatin Bhateja wrote: >> Yes, it is. Just want to double check in the compiler side, or it maybe confusing if someone is not familiar with the vector api java codes. > > I see, may be we can just add a comment/assert instead. Yeah, agree to you. Or maybe as assert under `if(num_elem_from < num_elem_to)` and `if(num_elem_from > num_elem_to)` ? I'm sorry that I didn't find a better place to add the comment. Any suggestions? 
For me, it's better to add the comment inside the `if(is_cast)` block since it has different paths based on the `num_elem_from` and `num_elem_to`. ------------- PR: https://git.openjdk.org/jdk/pull/9737 From xgong at openjdk.org Thu Sep 15 08:41:48 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 15 Sep 2022 08:41:48 GMT Subject: RFR: 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast [v7] In-Reply-To: References: <1mDQWN8f2Gpb-tuVZo_Jj6TVU1yNtU-4jY00D4gfW5s=.97d48dbc-0ffd-432d-83fe-3544295036e2@github.com> <3GghrBjnyXnBCLCzcKUchQ2Zq9lZjS8Pw-IzBFGXftU=.6e5f6462-89fd-42fe-ba3e-4257642bdb05@github.com> Message-ID: On Thu, 15 Sep 2022 08:19:17 GMT, Xiaohong Gong wrote: >> I see, may be we can just add a comment/assert instead. > > Yeah, agree to you. Or maybe as assert under `if(num_elem_from < num_elem_to)` and `if(num_elem_from > num_elem_to)` ? I'm sorry that I didn't find a better place to add the comment. Any suggestions? For me, it's better to add the comment inside the `if(is_cast)` block since it has different paths based on the `num_elem_from` and `num_elem_to`. Something like: `assert(!is_mask, "mask cast needs the same elem num ")` ------------- PR: https://git.openjdk.org/jdk/pull/9737 From xgong at openjdk.org Thu Sep 15 08:41:49 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 15 Sep 2022 08:41:49 GMT Subject: RFR: 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast [v7] In-Reply-To: References: <1mDQWN8f2Gpb-tuVZo_Jj6TVU1yNtU-4jY00D4gfW5s=.97d48dbc-0ffd-432d-83fe-3544295036e2@github.com> <3GghrBjnyXnBCLCzcKUchQ2Zq9lZjS8Pw-IzBFGXftU=.6e5f6462-89fd-42fe-ba3e-4257642bdb05@github.com> Message-ID: On Thu, 15 Sep 2022 08:22:11 GMT, Xiaohong Gong wrote: >> Yeah, agree to you. Or maybe as assert under `if(num_elem_from < num_elem_to)` and `if(num_elem_from > num_elem_to)` ? I'm sorry that I didn't find a better place to add the comment. Any suggestions? 
For me, it's better to add the comment inside the `if(is_cast)` block since it has different paths based on the `num_elem_from` and `num_elem_to`. > > Something like: `assert(!is_mask, "mask cast needs the same elem num ")` Hi, I'd like to add the assertion in the unify patch instead of this. So I will revert this change here. Thanks! ------------- PR: https://git.openjdk.org/jdk/pull/9737 From jbhateja at openjdk.org Thu Sep 15 08:41:49 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 15 Sep 2022 08:41:49 GMT Subject: RFR: 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast [v7] In-Reply-To: References: <1mDQWN8f2Gpb-tuVZo_Jj6TVU1yNtU-4jY00D4gfW5s=.97d48dbc-0ffd-432d-83fe-3544295036e2@github.com> <3GghrBjnyXnBCLCzcKUchQ2Zq9lZjS8Pw-IzBFGXftU=.6e5f6462-89fd-42fe-ba3e-4257642bdb05@github.com> Message-ID: <5T0aX9ytUUZgk06vu06bjFURBMtxr7C53ytVtHwZAnU=.f6f03f31-5986-49d3-be66-cda08f8868a6@github.com> On Thu, 15 Sep 2022 08:35:58 GMT, Xiaohong Gong wrote: >> Something like: `assert(!is_mask, "mask cast needs the same elem num ")` > > Hi, I'd like to add the assertion in the unify patch instead of this. So I will revert this change here. Thanks! Yes, assert (!is_mask || (is_mask && num_elem_from == num_elem_to), "mask cast needs the same elem num ").
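A side note on the condition in the assert just above: `!is_mask || (is_mask && X)` is logically the same as `!is_mask || X`, so the inner `is_mask` test is redundant. The standalone sketch below checks this over a small range of element counts. The function names are hypothetical and nothing here is HotSpot code.

```cpp
#include <cassert>

// The condition as written in the thread.
constexpr bool mask_cast_ok_verbose(bool is_mask, int num_elem_from, int num_elem_to) {
  return !is_mask || (is_mask && num_elem_from == num_elem_to);
}

// Shorter form: when the right-hand side of `||` is evaluated,
// `is_mask` is already known to be true.
constexpr bool mask_cast_ok(bool is_mask, int num_elem_from, int num_elem_to) {
  return !is_mask || num_elem_from == num_elem_to;
}

// Compares both forms for every combination of is_mask and element
// counts in [1, 64]; returns true when they never disagree.
inline bool forms_agree() {
  for (int m = 0; m < 2; ++m) {
    for (int from = 1; from <= 64; ++from) {
      for (int to = 1; to <= 64; ++to) {
        const bool is_mask = (m == 1);
        if (mask_cast_ok_verbose(is_mask, from, to) != mask_cast_ok(is_mask, from, to)) {
          return false;
        }
      }
    }
  }
  return true;
}

// A non-mask cast may change the element count; a mask cast may not.
static_assert(mask_cast_ok(false, 4, 8), "vector cast may resize");
static_assert(mask_cast_ok(true, 8, 8), "mask cast with equal counts");
static_assert(!mask_cast_ok(true, 4, 8), "mask cast with differing counts");
```

Either spelling expresses the same invariant, so the choice between them is purely one of readability.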
------------- PR: https://git.openjdk.org/jdk/pull/9737 From xgong at openjdk.org Thu Sep 15 08:41:50 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 15 Sep 2022 08:41:50 GMT Subject: RFR: 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast [v7] In-Reply-To: <5T0aX9ytUUZgk06vu06bjFURBMtxr7C53ytVtHwZAnU=.f6f03f31-5986-49d3-be66-cda08f8868a6@github.com> References: <1mDQWN8f2Gpb-tuVZo_Jj6TVU1yNtU-4jY00D4gfW5s=.97d48dbc-0ffd-432d-83fe-3544295036e2@github.com> <3GghrBjnyXnBCLCzcKUchQ2Zq9lZjS8Pw-IzBFGXftU=.6e5f6462-89fd-42fe-ba3e-4257642bdb05@github.com> <5T0aX9ytUUZgk06vu06bjFURBMtxr7C53ytVtHwZAnU=.f6f03f31-5986-49d3-be66-cda08f8868a6@github.com> Message-ID: On Thu, 15 Sep 2022 08:36:02 GMT, Jatin Bhateja wrote: >> Hi, I'd like the add the assertion in the unify patch instead of this. So I will revert this change here. Thanks! > > Yes, assert (!is_mask || (is_mask && num_elem_from == num_elem_to), "mask cast needs the same elem num "). Looks good. Thanks! ------------- PR: https://git.openjdk.org/jdk/pull/9737 From shade at openjdk.org Thu Sep 15 08:58:50 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 15 Sep 2022 08:58:50 GMT Subject: RFR: 8293844: C2: Verify Location::{oop,normal} types in PhaseOutput::FillLocArray Message-ID: I have been debugging a weird issue in C2/deopt, and wanted to have stronger asserts in critical paths. One such place is `PhaseOutput::FillLocArray`, which emits `Location::normal` on unconditional `else` branch. `Location::normal` is described as "Ints, floats, double halves". I think we would be better off verifying the types explicitly. Same goes for `Location::oop`, which we can also verify. 
Aside: In fact, I suspect the whole `Regalloc::is_oop` business can go away, and we can rely on reg types to sense if we are dealing with oops here, but that looks like a change with some unexpected effects, so I would like to do that separately, see [JDK-8293845](https://bugs.openjdk.org/browse/JDK-8293845). Additional testing: - [x] Linux x86_64 fastdebug `tier1` - [x] Linux x86_64 fastdebug `tier2` - [x] Linux x86_32 fastdebug `tier1` - [x] Linux x86_32 fastdebug `tier2` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/10281/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10281&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8293844 Stats: 9 lines in 1 file changed: 8 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10281.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10281/head:pull/10281 PR: https://git.openjdk.org/jdk/pull/10281 From tholenstein at openjdk.org Thu Sep 15 08:59:23 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 15 Sep 2022 08:59:23 GMT Subject: RFR: JDK-8293364: IGV: Refactor Action in EditorTopComponent and fix minor bugs [v3] In-Reply-To: References: Message-ID: > Refactor the Actions in EditorTopComponent (com.sun.hotspot.igv.view.actions). Move Action specific code from EditorTopComponent to the corresponding Action. > > # Refactoring of com.sun.hotspot.igv.view.actions and EditorTopComponent > - Created a new `ExportGraph` Action and moved corresponding functions `exportToSVG(..)` and `exportToPDF(..)` to new `ExportGraph.java` > - Moved key bindings for satellite-view (pressing S) from `EditorTopComponent` to `OverviewAction.java` > - Moved Action specific code from `EditorTopComponent` to the corresponding `XXXAction.java` > > # Fixing minor Bugs > - "Show empty blocks in control-flow graph view" is selected by default but only enabled in CFG view. 
> This is distracting for the eye when we are not in CFG: > cfg_before > Now "Show empty blocks in control-flow graph view" is not selected anymore when disabled (greyed out) > cfg_node_disable > But still gets selected by default when enabled > cfg_now > > - "Extract current set of selected nodes", "Hide selected nodes" and "show all nodes" were always enabled, even when they didn't effect anything. > selection_before > Now "Extract current set of selected nodes", "Hide selected nodes" are disabled (greyed out) when no nodes are selected. And "show all nodes" is disabled (greyed out) when all nodes are already visible. > selection_now > > - "Reduce the difference selection" got stuck when at the last graphs in the group because it got greyed out. > reduce_stuck > duce the difference selection" > Now "Reduce the difference selection" works as expected: > reduce_now Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: undo double init of Toolbar ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10170/files - new: https://git.openjdk.org/jdk/pull/10170/files/60e55cc7..e2bb62ed Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10170&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10170&range=01-02 Stats: 57 lines in 1 file changed: 13 ins; 30 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/10170.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10170/head:pull/10170 PR: https://git.openjdk.org/jdk/pull/10170 From duke at openjdk.org Thu Sep 15 09:02:14 2022 From: duke at openjdk.org (Quan Anh Mai) Date: Thu, 15 Sep 2022 09:02:14 GMT Subject: RFR: 8290917: x86: Memory-operand arithmetic instructions have too low costs [v4] In-Reply-To: References: Message-ID: > The pattern `AddI (LoadI mem) imm` should be matched by a load followed by an add with constant, instead, it is currently matched as a constant load followed by an add with memory. 
The reason is that the cost of `addI_rReg_mem` is too low, this patch fixes this by increasing the cost of this fused instruction. > > Testing: Manually run the test case in the JBS and look at the compiled code. > > I also do some small clean-ups in x86_64.ad: > > - The `mulHiL` rules have unnecessary constraints on the input registers, these can be removed. The `no_rax_RegL` operand as a consequence can also be removed. > - The rules involving long division by a constant can be removed because it has been covered by the optimiser during idealisation. > - The pattern `SubI src imm` and the likes never match because they are converted to `AddI src -imm` by the optimiser. As a result, these rules can be removed > - The rules involving shifting the argument by 1 are covered by and exactly the same as the corresponding rules of shifting by an immediate. As a result, they can be removed. > - Some rules involving and-ing with a bit mask have unnecessary constraints on the target register. > > Please kindly review, thank you very much. 
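Two of the clean-ups listed above rest on simple arithmetic identities, which the following standalone C++ sketch checks. It is not HotSpot or `.ad` code: the function names are made up for illustration, and values are modelled as unsigned bit patterns so that wraparound is well defined. How the matcher rules actually use these identities is not shown here.

```cpp
#include <cassert>
#include <cstdint>

// `SubI src imm` on 32-bit two's-complement bit patterns.
constexpr uint32_t sub_imm(uint32_t src, uint32_t imm) {
  return src - imm;
}

// `AddI src -imm`: negate the immediate modulo 2^32, then add.
constexpr uint32_t add_neg_imm(uint32_t src, uint32_t imm) {
  return src + (0u - imm);
}

// And-ing a 64-bit value with 255 keeps only the low byte...
constexpr uint64_t and_255(uint64_t x) {
  return x & 0xFFu;
}

// ...which is the same value as zero-extending the low 8 bits.
constexpr uint64_t zero_extend_byte(uint64_t x) {
  return static_cast<uint64_t>(static_cast<uint8_t>(x));
}

// Likewise for 65535 and the low 16 bits.
constexpr uint64_t and_65535(uint64_t x) {
  return x & 0xFFFFu;
}

constexpr uint64_t zero_extend_short(uint64_t x) {
  return static_cast<uint64_t>(static_cast<uint16_t>(x));
}

// Subtraction of a constant equals addition of its negation, including
// the wraparound cases.
static_assert(sub_imm(5u, 7u) == add_neg_imm(5u, 7u), "wraps below zero");
static_assert(sub_imm(0u, 0x80000000u) == add_neg_imm(0u, 0x80000000u), "minimum int immediate");

// Masking equals zero-extension of the low byte or low 16 bits.
static_assert(and_255(0x1234567890ABCDEFull) == zero_extend_byte(0x1234567890ABCDEFull), "low byte");
static_assert(and_65535(0x1234567890ABCDEFull) == zero_extend_short(0x1234567890ABCDEFull), "low 16 bits");
```

The first pair illustrates why a dedicated `SubI src imm` rule has nothing left to match once the optimiser has rewritten the node as an addition.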
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: address reviews ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9791/files - new: https://git.openjdk.org/jdk/pull/9791/files/e7c79d4f..a56c3e53 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9791&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9791&range=02-03 Stats: 24 lines in 2 files changed: 16 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/9791.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9791/head:pull/9791 PR: https://git.openjdk.org/jdk/pull/9791 From tholenstein at openjdk.org Thu Sep 15 09:03:55 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 15 Sep 2022 09:03:55 GMT Subject: RFR: JDK-8293364: IGV: Refactor Action in EditorTopComponent and fix minor bugs [v4] In-Reply-To: References: Message-ID: > Refactor the Actions in EditorTopComponent (com.sun.hotspot.igv.view.actions). Move Action specific code from EditorTopComponent to the corresponding Action. > > # Refactoring of com.sun.hotspot.igv.view.actions and EditorTopComponent > - Created a new `ExportGraph` Action and moved corresponding functions `exportToSVG(..)` and `exportToPDF(..)` to new `ExportGraph.java` > - Moved key bindings for satellite-view (pressing S) from `EditorTopComponent` to `OverviewAction.java` > - Moved Action specific code from `EditorTopComponent` to the corresponding `XXXAction.java` > > # Fixing minor Bugs > - "Show empty blocks in control-flow graph view" is selected by default but only enabled in CFG view. 
> This is distracting for the eye when we are not in CFG: > cfg_before > Now "Show empty blocks in control-flow graph view" is not selected anymore when disabled (greyed out) > cfg_node_disable > But still gets selected by default when enabled > cfg_now > > - "Extract current set of selected nodes", "Hide selected nodes" and "show all nodes" were always enabled, even when they didn't effect anything. > selection_before > Now "Extract current set of selected nodes", "Hide selected nodes" are disabled (greyed out) when no nodes are selected. And "show all nodes" is disabled (greyed out) when all nodes are already visible. > selection_now > > - "Reduce the difference selection" got stuck when at the last graphs in the group because it got greyed out. > reduce_stuck > duce the difference selection" > Now "Reduce the difference selection" works as expected: > reduce_now Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: Update OutlineTopComponent.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10170/files - new: https://git.openjdk.org/jdk/pull/10170/files/e2bb62ed..65e06517 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10170&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10170&range=02-03 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10170.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10170/head:pull/10170 PR: https://git.openjdk.org/jdk/pull/10170 From tholenstein at openjdk.org Thu Sep 15 09:06:25 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 15 Sep 2022 09:06:25 GMT Subject: RFR: JDK-8293364: IGV: Refactor Action in EditorTopComponent and fix minor bugs [v5] In-Reply-To: References: Message-ID: > Refactor the Actions in EditorTopComponent (com.sun.hotspot.igv.view.actions). Move Action specific code from EditorTopComponent to the corresponding Action. 
> > # Refactoring of com.sun.hotspot.igv.view.actions and EditorTopComponent > - Created a new `ExportGraph` Action and moved corresponding functions `exportToSVG(..)` and `exportToPDF(..)` to new `ExportGraph.java` > - Moved key bindings for satellite-view (pressing S) from `EditorTopComponent` to `OverviewAction.java` > - Moved Action specific code from `EditorTopComponent` to the corresponding `XXXAction.java` > > # Fixing minor Bugs > - "Show empty blocks in control-flow graph view" is selected by default but only enabled in CFG view. > This is distracting for the eye when we are not in CFG: > cfg_before > Now "Show empty blocks in control-flow graph view" is not selected anymore when disabled (greyed out) > cfg_node_disable > But still gets selected by default when enabled > cfg_now > > - "Extract current set of selected nodes", "Hide selected nodes" and "show all nodes" were always enabled, even when they didn't effect anything. > selection_before > Now "Extract current set of selected nodes", "Hide selected nodes" are disabled (greyed out) when no nodes are selected. And "show all nodes" is disabled (greyed out) when all nodes are already visible. > selection_now > > - "Reduce the difference selection" got stuck when at the last graphs in the group because it got greyed out. > reduce_stuck > duce the difference selection" > Now "Reduce the difference selection" works as expected: > reduce_now Tobias Holenstein has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains two new commits since the last revision: - Update Copyright year - Update OutlineTopComponent.java Revert "Update OutlineTopComponent.java" This reverts commit 65e0651730983e12c032bb89564c3ef93aa34dbe. 
revert whitespace change ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10170/files - new: https://git.openjdk.org/jdk/pull/10170/files/65e06517..23effacd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10170&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10170&range=03-04 Stats: 5 lines in 2 files changed: 1 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10170.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10170/head:pull/10170 PR: https://git.openjdk.org/jdk/pull/10170 From duke at openjdk.org Thu Sep 15 09:17:07 2022 From: duke at openjdk.org (Quan Anh Mai) Date: Thu, 15 Sep 2022 09:17:07 GMT Subject: RFR: 8290917: x86: Memory-operand arithmetic instructions have too low costs [v5] In-Reply-To: References: Message-ID: > The pattern `AddI (LoadI mem) imm` should be matched by a load followed by an add with constant, instead, it is currently matched as a constant load followed by an add with memory. The reason is that the cost of `addI_rReg_mem` is too low, this patch fixes this by increasing the cost of this fused instruction. > > Testing: Manually run the test case in the JBS and look at the compiled code. > > I also do some small clean-ups in x86_64.ad: > > - The `mulHiL` rules have unnecessary constraints on the input registers, these can be removed. The `no_rax_RegL` operand as a consequence can also be removed. > - The rules involving long division by a constant can be removed because it has been covered by the optimiser during idealisation. > - The pattern `SubI src imm` and the likes never match because they are converted to `AddI src -imm` by the optimiser. As a result, these rules can be removed > - The rules involving shifting the argument by 1 are covered by and exactly the same as the corresponding rules of shifting by an immediate. As a result, they can be removed. > - Some rules involving and-ing with a bit mask have unnecessary constraints on the target register. 
> > Please kindly review, thank you very much. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: divL_10 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9791/files - new: https://git.openjdk.org/jdk/pull/9791/files/a56c3e53..ef4f3cf9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9791&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9791&range=03-04 Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/9791.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9791/head:pull/9791 PR: https://git.openjdk.org/jdk/pull/9791 From duke at openjdk.org Thu Sep 15 09:18:48 2022 From: duke at openjdk.org (Quan Anh Mai) Date: Thu, 15 Sep 2022 09:18:48 GMT Subject: RFR: 8290917: x86: Memory-operand arithmetic instructions have too low costs [v2] In-Reply-To: References: Message-ID: On Wed, 14 Sep 2022 16:33:41 GMT, Sandhya Viswanathan wrote: >> @sviswa7 Thanks a lot for your review, I have reverted that change. I don't understand why, though, it does not seem that the bottleneck is in the predecoder. > > @merykitty Thanks for reverting those changes. Could you please also add jmh tests for the following: > 1) AndL with 255 > 2) AndL with 65535 > 3) DivL by 10 > For 1) and 2) we are changing the instruction from q version to l version, so want to make sure the performance is at least on par. > For 3) it will be good to check that the compiler is optimizing divide by 10 for long data type as well now. @sviswa7 Thanks for your reviews, I have addressed those in the last commits, the benchmark results are as follows:

                                          Before            After
Benchmark                      Mode  Cnt    Score    Error    Score    Error  Units
BasicRules.add_mem_con         avgt   15  203.470 ±  8.955  202.771 ±  2.867  ns/op
BasicRules.andL_rReg_imm255    avgt   15  183.892 ±  2.314  185.008 ±  2.642  ns/op
BasicRules.andL_rReg_imm65535  avgt   15  183.854 ±  2.293  184.849 ±  2.668  ns/op
BasicRules.divL_10             avgt   15  643.210 ± 10.536  645.040 ± 10.014  ns/op

------------- PR: https://git.openjdk.org/jdk/pull/9791 From tholenstein at openjdk.org Thu Sep 15 09:29:40 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 15 Sep 2022 09:29:40 GMT Subject: RFR: JDK-8293364: IGV: Refactor Action in EditorTopComponent and fix minor bugs [v5] In-Reply-To: References: Message-ID: On Thu, 8 Sep 2022 13:16:37 GMT, Roberto Castañeda Lozano wrote: > This changeset seems to disable the keyboard shortcuts for `Extract`, `Show all nodes`, and `Hide` right after a graph is opened. Interestingly, after clicking around for a while, the keyboard shortcuts start working again. Please let me know if you need more details to reproduce the problem, hopefully it is reproducible in other platforms than my own (Ubuntu 20.04). Thanks for spotting that @robcasloz! Defining the `Shortcuts` in `layer.xml` is not recommended anymore and causes problems with `ContextAction`. The more modern approach is to use `@ActionXXX` annotations and define the shortcuts there. Also, `ContextAction` listened only to `getDiagramChangedEvent()`. I changed them to also listen to `getViewChangedEvent()`, `getViewPropertiesChangedEvent()` and `getHiddenNodesChangedEvent()`. Otherwise the context aware action missed being updated when node selection or view changed. I updated the PR description accordingly ------------- PR: https://git.openjdk.org/jdk/pull/10170 From duke at openjdk.org Thu Sep 15 09:40:38 2022 From: duke at openjdk.org (bell-sw) Date: Thu, 15 Sep 2022 09:40:38 GMT Subject: RFR: 8293695: Implement isInfinite intrinsic for RISC-V In-Reply-To: <8-TCWhiO_DkKS10Fkl7gztIBGeGJHO45F2d99YVKAvQ=.8e10e678-d320-4945-8760-ad48eefbdd88@github.com> References: <8-TCWhiO_DkKS10Fkl7gztIBGeGJHO45F2d99YVKAvQ=.8e10e678-d320-4945-8760-ad48eefbdd88@github.com> Message-ID: On Tue, 13 Sep 2022 15:48:38 GMT, Aleksei Voitylov wrote: > RISC-V 64 intrinsic for isInfinite follows the logic of x86 intrinsic (introduced by 8285868).
This patch adds C2 match for IsInfinite nodes. Existing test is modified to run on RISC-V and passes on both release and fastdebug builds. Benchmark results are below:
>
> before:
> Benchmark                              Mode  Cnt   Score   Error  Units
> DoubleClassCheck.testIsInfiniteBranch  avgt   15  43.547 ± 6.843  ns/op
> DoubleClassCheck.testIsInfiniteCMov    avgt   15  16.301 ± 1.386  ns/op
> DoubleClassCheck.testIsInfiniteStore   avgt   15  16.230 ± 1.477  ns/op
> FloatClassCheck.testIsInfiniteBranch   avgt   15  38.774 ± 3.572  ns/op
> FloatClassCheck.testIsInfiniteCMov     avgt   15  15.064 ± 1.310  ns/op
> FloatClassCheck.testIsInfiniteStore    avgt   15  14.967 ± 1.298  ns/op
>
> after:
> Benchmark                              Mode  Cnt   Score   Error  Units
> DoubleClassCheck.testIsInfiniteBranch  avgt   15  39.987 ± 6.179  ns/op
> DoubleClassCheck.testIsInfiniteCMov    avgt   15  13.477 ± 1.159  ns/op
> DoubleClassCheck.testIsInfiniteStore   avgt   15   9.607 ± 0.834  ns/op
> FloatClassCheck.testIsInfiniteBranch   avgt   15  36.265 ± 3.168  ns/op
> FloatClassCheck.testIsInfiniteCMov     avgt   15  13.230 ± 1.100  ns/op
> FloatClassCheck.testIsInfiniteStore    avgt   15   9.492 ± 0.807  ns/op
>
> According to 8285868 discussion, isNaN and isFinite methods intrinsification using the same approach might be not beneficial. I'm going to investigate it for RISC-V and propose methods intrinsification as part of further work in case it's profitable.
Marked as reviewed by bell-sw at github.com (no known OpenJDK username).
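For readers who want to see what an `isInfinite` intrinsic has to compute, here is a minimal sketch in plain Java. It is not the code from this patch and not the RISC-V or x86 instruction sequence; it only restates the IEEE 754 check (all exponent bits set, mantissa zero, sign ignored). The class and method names are hypothetical and chosen for illustration.

```java
public class IsInfiniteSketch {
    // Hypothetical helper: clear the sign bit and compare with the
    // bit pattern of +Infinity (exponent all ones, mantissa zero).
    public static boolean isInfiniteBits(double d) {
        long bits = Double.doubleToRawLongBits(d);
        return (bits & 0x7fffffffffffffffL) == 0x7ff0000000000000L;
    }

    // Same check for the 32-bit float layout.
    public static boolean isInfiniteBits(float f) {
        int bits = Float.floatToRawIntBits(f);
        return (bits & 0x7fffffff) == 0x7f800000;
    }

    public static void main(String[] args) {
        // Compare against the library methods on a few interesting values.
        double[] samples = {
            Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY,
            Double.NaN, Double.MAX_VALUE, 0.0, -1.5
        };
        for (double d : samples) {
            System.out.println(d + " -> " + isInfiniteBits(d)
                + " (Double.isInfinite: " + Double.isInfinite(d) + ")");
        }
    }
}
```

NaN values share the all-ones exponent but have a non-zero mantissa, so the equality test rejects them.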
------------- PR: https://git.openjdk.org/jdk/pull/10253 From dsamersoff at openjdk.org Thu Sep 15 09:40:39 2022 From: dsamersoff at openjdk.org (Dmitry Samersoff) Date: Thu, 15 Sep 2022 09:40:39 GMT Subject: RFR: 8293695: Implement isInfinite intrinsic for RISC-V In-Reply-To: <8-TCWhiO_DkKS10Fkl7gztIBGeGJHO45F2d99YVKAvQ=.8e10e678-d320-4945-8760-ad48eefbdd88@github.com> References: <8-TCWhiO_DkKS10Fkl7gztIBGeGJHO45F2d99YVKAvQ=.8e10e678-d320-4945-8760-ad48eefbdd88@github.com> Message-ID: On Tue, 13 Sep 2022 15:48:38 GMT, Aleksei Voitylov wrote: > RISC-V 64 intrinsic for isInfinite follows the logic of x86 intrinsic (introduced by 8285868). This patch adds C2 match for IsInfinite nodes. Existing test is modified to run on RISC-V and passes on both release and fastdebug builds. Benchmark results are below:
>
> before:
> Benchmark                              Mode  Cnt   Score   Error  Units
> DoubleClassCheck.testIsInfiniteBranch  avgt   15  43.547 ± 6.843  ns/op
> DoubleClassCheck.testIsInfiniteCMov    avgt   15  16.301 ± 1.386  ns/op
> DoubleClassCheck.testIsInfiniteStore   avgt   15  16.230 ± 1.477  ns/op
> FloatClassCheck.testIsInfiniteBranch   avgt   15  38.774 ± 3.572  ns/op
> FloatClassCheck.testIsInfiniteCMov     avgt   15  15.064 ± 1.310  ns/op
> FloatClassCheck.testIsInfiniteStore    avgt   15  14.967 ± 1.298  ns/op
>
> after:
> Benchmark                              Mode  Cnt   Score   Error  Units
> DoubleClassCheck.testIsInfiniteBranch  avgt   15  39.987 ± 6.179  ns/op
> DoubleClassCheck.testIsInfiniteCMov    avgt   15  13.477 ± 1.159  ns/op
> DoubleClassCheck.testIsInfiniteStore   avgt   15   9.607 ± 0.834  ns/op
> FloatClassCheck.testIsInfiniteBranch   avgt   15  36.265 ± 3.168  ns/op
> FloatClassCheck.testIsInfiniteCMov     avgt   15  13.230 ± 1.100  ns/op
> FloatClassCheck.testIsInfiniteStore    avgt   15   9.492 ± 0.807  ns/op
>
> According to 8285868 discussion, isNaN and isFinite methods intrinsification using the same approach might be not beneficial. I'm going to investigate it for RISC-V and propose methods intrinsification as part of further work in case it's profitable.
Marked as reviewed by dsamersoff (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/10253 From avoitylov at openjdk.org Thu Sep 15 09:48:33 2022 From: avoitylov at openjdk.org (Aleksei Voitylov) Date: Thu, 15 Sep 2022 09:48:33 GMT Subject: Integrated: 8293695: Implement isInfinite intrinsic for RISC-V In-Reply-To: <8-TCWhiO_DkKS10Fkl7gztIBGeGJHO45F2d99YVKAvQ=.8e10e678-d320-4945-8760-ad48eefbdd88@github.com> References: <8-TCWhiO_DkKS10Fkl7gztIBGeGJHO45F2d99YVKAvQ=.8e10e678-d320-4945-8760-ad48eefbdd88@github.com> Message-ID: On Tue, 13 Sep 2022 15:48:38 GMT, Aleksei Voitylov wrote: > RISC-V 64 intrinsic for isInfinite follows the logic of x86 intrinsic (introduced by 8285868). This patch adds C2 match for IsInfinite nodes. Existing test is modified to run on RISC-V and passes on both release and fastdebug builds. Benchmark results are below:
>
> before:
> Benchmark                              Mode  Cnt   Score   Error  Units
> DoubleClassCheck.testIsInfiniteBranch  avgt   15  43.547 ± 6.843  ns/op
> DoubleClassCheck.testIsInfiniteCMov    avgt   15  16.301 ± 1.386  ns/op
> DoubleClassCheck.testIsInfiniteStore   avgt   15  16.230 ± 1.477  ns/op
> FloatClassCheck.testIsInfiniteBranch   avgt   15  38.774 ± 3.572  ns/op
> FloatClassCheck.testIsInfiniteCMov     avgt   15  15.064 ± 1.310  ns/op
> FloatClassCheck.testIsInfiniteStore    avgt   15  14.967 ± 1.298  ns/op
>
> after:
> Benchmark                              Mode  Cnt   Score   Error  Units
> DoubleClassCheck.testIsInfiniteBranch  avgt   15  39.987 ± 6.179  ns/op
> DoubleClassCheck.testIsInfiniteCMov    avgt   15  13.477 ± 1.159  ns/op
> DoubleClassCheck.testIsInfiniteStore   avgt   15   9.607 ± 0.834  ns/op
> FloatClassCheck.testIsInfiniteBranch   avgt   15  36.265 ± 3.168  ns/op
> FloatClassCheck.testIsInfiniteCMov     avgt   15  13.230 ± 1.100  ns/op
> FloatClassCheck.testIsInfiniteStore    avgt   15   9.492 ± 0.807  ns/op
>
> According to 8285868 discussion, isNaN and isFinite methods intrinsification using the same approach might be not beneficial.
I'm going to investigate it for RISC-V and propose methods intrinsification as part of further work in case it's profitable. This pull request has now been integrated. Changeset: b31a03c6 Author: Aleksei Voitylov Committer: Dmitry Samersoff URL: https://git.openjdk.org/jdk/commit/b31a03c60a14e32304efe15fcd0031a752f4b4ab Stats: 30 lines in 3 files changed: 26 ins; 0 del; 4 mod 8293695: Implement isInfinite intrinsic for RISC-V Reviewed-by: yadongwang, fyang, dsamersoff ------------- PR: https://git.openjdk.org/jdk/pull/10253 From xgong at openjdk.org Thu Sep 15 09:54:57 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 15 Sep 2022 09:54:57 GMT Subject: RFR: 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast [v8] In-Reply-To: <1mDQWN8f2Gpb-tuVZo_Jj6TVU1yNtU-4jY00D4gfW5s=.97d48dbc-0ffd-432d-83fe-3544295036e2@github.com> References: <1mDQWN8f2Gpb-tuVZo_Jj6TVU1yNtU-4jY00D4gfW5s=.97d48dbc-0ffd-432d-83fe-3544295036e2@github.com> Message-ID: > Recently we found the performance of "`FIRST_NONZERO`" for double type is largely worse than the other types on x86 when `UseAVX=2`. The main reason is the "`VectorCastL2X`" op is not supported by the backend when the dst element type is `T_DOUBLE`. This makes the check of `VectorCast` op fail before intrinsifying "`VectorMask.cast()`" which is used in the > "`FIRST_NONZERO`" java implementation (see [1]). However, the compiler will not generate the `VectorCast `op for `VectorMask.cast()` if: > > 1) the current platform supports the predicated feature > 2) the element size (in bytes) of the src and dst type is the same > > So the check of "`VectorCast`" op is needless for such cases. 
To fix it, this patch: > > 1) limits the specified vector cast op check to vectors > 2) adds the relative mask cast op check for VectorMask.cast() > 3) cleans up the unnecessary codes > > Here is the performance of "`FIRST_NONZERO`" benchmark [2] on a x86 machine with `UseAVX=2`: > > Benchmark (size) Mode Cnt Before After Units > DoubleMaxVector.FIRST_NONZERO 1024 thrpt 15 49.266 2460.886 ops/ms > DoubleMaxVector.FIRST_NONZEROMasked 1024 thrpt 15 49.554 1892.223 ops/ms > > [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/DoubleVector.java#L770 > [2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/DoubleMaxVector.java#L246 Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: Address review comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9737/files - new: https://git.openjdk.org/jdk/pull/9737/files/5a752b82..c3aa861a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9737&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9737&range=06-07 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/9737.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9737/head:pull/9737 PR: https://git.openjdk.org/jdk/pull/9737 From xgong at openjdk.org Thu Sep 15 09:54:57 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 15 Sep 2022 09:54:57 GMT Subject: RFR: 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast [v7] In-Reply-To: References: <1mDQWN8f2Gpb-tuVZo_Jj6TVU1yNtU-4jY00D4gfW5s=.97d48dbc-0ffd-432d-83fe-3544295036e2@github.com> <3GghrBjnyXnBCLCzcKUchQ2Zq9lZjS8Pw-IzBFGXftU=.6e5f6462-89fd-42fe-ba3e-4257642bdb05@github.com> <5T0aX9ytUUZgk06vu06bjFURBMtxr7C53ytVtHwZAnU=.f6f03f31-5986-49d3-be66-cda08f8868a6@github.com> Message-ID: 
<-owF2LRE0os1IQikg9eJZVghiNSQ_OASImE_EQ6aJWU=.1c059f4e-9de7-4107-ab6d-5627186da874@github.com> On Thu, 15 Sep 2022 08:38:56 GMT, Xiaohong Gong wrote: >> Yes, assert (!is_mask || (is_mask && num_elem_from == num_elem_to), "mask cast needs the same elem num "). > > Looks good. Thanks! The new commit reverted this changes. I will add the assertion in https://github.com/openjdk/jdk/pull/10192. ------------- PR: https://git.openjdk.org/jdk/pull/9737 From jiefu at openjdk.org Thu Sep 15 10:45:55 2022 From: jiefu at openjdk.org (Jie Fu) Date: Thu, 15 Sep 2022 10:45:55 GMT Subject: RFR: 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast [v8] In-Reply-To: References: <1mDQWN8f2Gpb-tuVZo_Jj6TVU1yNtU-4jY00D4gfW5s=.97d48dbc-0ffd-432d-83fe-3544295036e2@github.com> Message-ID: On Thu, 15 Sep 2022 09:54:57 GMT, Xiaohong Gong wrote: >> Recently we found the performance of "`FIRST_NONZERO`" for double type is largely worse than the other types on x86 when `UseAVX=2`. The main reason is the "`VectorCastL2X`" op is not supported by the backend when the dst element type is `T_DOUBLE`. This makes the check of `VectorCast` op fail before intrinsifying "`VectorMask.cast()`" which is used in the >> "`FIRST_NONZERO`" java implementation (see [1]). However, the compiler will not generate the `VectorCast `op for `VectorMask.cast()` if: >> >> 1) the current platform supports the predicated feature >> 2) the element size (in bytes) of the src and dst type is the same >> >> So the check of "`VectorCast`" op is needless for such cases. 
To fix it, this patch: >> >> 1) limits the specified vector cast op check to vectors >> 2) adds the relative mask cast op check for VectorMask.cast() >> 3) cleans up the unnecessary codes >> >> Here is the performance of "`FIRST_NONZERO`" benchmark [2] on a x86 machine with `UseAVX=2`: >> >> Benchmark (size) Mode Cnt Before After Units >> DoubleMaxVector.FIRST_NONZERO 1024 thrpt 15 49.266 2460.886 ops/ms >> DoubleMaxVector.FIRST_NONZEROMasked 1024 thrpt 15 49.554 1892.223 ops/ms >> >> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/DoubleVector.java#L770 >> [2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/DoubleMaxVector.java#L246 > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Address review comment Still looks good to me. Thanks. ------------- Marked as reviewed by jiefu (Reviewer). PR: https://git.openjdk.org/jdk/pull/9737 From jbhateja at openjdk.org Thu Sep 15 11:22:52 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 15 Sep 2022 11:22:52 GMT Subject: RFR: 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast [v8] In-Reply-To: References: <1mDQWN8f2Gpb-tuVZo_Jj6TVU1yNtU-4jY00D4gfW5s=.97d48dbc-0ffd-432d-83fe-3544295036e2@github.com> Message-ID: On Thu, 15 Sep 2022 09:54:57 GMT, Xiaohong Gong wrote: >> Recently we found the performance of "`FIRST_NONZERO`" for double type is largely worse than the other types on x86 when `UseAVX=2`. The main reason is the "`VectorCastL2X`" op is not supported by the backend when the dst element type is `T_DOUBLE`. This makes the check of `VectorCast` op fail before intrinsifying "`VectorMask.cast()`" which is used in the >> "`FIRST_NONZERO`" java implementation (see [1]). 
However, the compiler will not generate the `VectorCast `op for `VectorMask.cast()` if: >> >> 1) the current platform supports the predicated feature >> 2) the element size (in bytes) of the src and dst type is the same >> >> So the check of "`VectorCast`" op is needless for such cases. To fix it, this patch: >> >> 1) limits the specified vector cast op check to vectors >> 2) adds the relative mask cast op check for VectorMask.cast() >> 3) cleans up the unnecessary codes >> >> Here is the performance of "`FIRST_NONZERO`" benchmark [2] on a x86 machine with `UseAVX=2`: >> >> Benchmark (size) Mode Cnt Before After Units >> DoubleMaxVector.FIRST_NONZERO 1024 thrpt 15 49.266 2460.886 ops/ms >> DoubleMaxVector.FIRST_NONZEROMasked 1024 thrpt 15 49.554 1892.223 ops/ms >> >> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/DoubleVector.java#L770 >> [2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/DoubleMaxVector.java#L246 > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Address review comment Marked as reviewed by jbhateja (Reviewer). 
------------- PR: https://git.openjdk.org/jdk/pull/9737 From chagedorn at openjdk.org Thu Sep 15 11:29:20 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 15 Sep 2022 11:29:20 GMT Subject: RFR: 8293849: PrintIdealPhase in compiler directives file is ignored when used with other compile commands Message-ID: When using a compiler directives file with `PrintIdealPhase`:

    [
      {
        match : "Test::*",
        log : true,
        PrintIdealPhase : "BEFORE_MATCHING"
      }
    ]

together with other compile commands specified in `compilerdirectives_common_flags` and/or `compilerdirectives_c2_flags`: https://github.com/openjdk/jdk/blob/aff5ff14b208b3c2be93d7b4fab8b07c5be12f3e/src/hotspot/share/compiler/compilerDirectives.hpp#L38-L39 https://github.com/openjdk/jdk/blob/aff5ff14b208b3c2be93d7b4fab8b07c5be12f3e/src/hotspot/share/compiler/compilerDirectives.hpp#L63-L64 then the `PrintIdealPhase` option is ignored. The reason is that when cloning the `DirectiveSet` for the current compilation in `DirectiveSet::clone()`, we only set `PrintIdealPhaseOption` but forget to also set `_ideal_phase_name_mask` which is used when deciding if a compile phase should be dumped or not. As a result, the mask keeps its default value zero and nothing is dumped because `Compile::should_print_phase()` returns false: https://github.com/openjdk/jdk/blob/aff5ff14b208b3c2be93d7b4fab8b07c5be12f3e/src/hotspot/share/opto/compile.cpp#L5060-L5067 The fix is to also clone the old value of `_ideal_phase_name_mask`.
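The failure mode described above (a clone that copies the textual option but not the mask derived from it) can be sketched as a simplified Java analog. This is not the HotSpot C++ code: the class `DirectiveCloneSketch`, its nested types, the three phase names in the enum, and the methods `buggyClone()` and `fixedClone()` are all hypothetical and only mirror the shape of the bug.

```java
public class DirectiveCloneSketch {
    // Hypothetical subset of phase names, for illustration only.
    public enum Phase { AFTER_PARSING, BEFORE_MATCHING, FINAL_CODE }

    public static final class Directives {
        public String printIdealPhaseOption = ""; // textual option value
        public long idealPhaseNameMask = 0;       // derived mask, one bit per phase

        // Setting the option also computes the derived mask.
        public void setPrintIdealPhase(String names) {
            printIdealPhaseOption = names;
            idealPhaseNameMask = 0;
            for (String name : names.split(",")) {
                idealPhaseNameMask |= 1L << Phase.valueOf(name.trim()).ordinal();
            }
        }

        // The dump decision consults only the mask, never the string.
        public boolean shouldPrintPhase(Phase p) {
            return (idealPhaseNameMask & (1L << p.ordinal())) != 0;
        }

        // Analog of the bug: the option string is copied, the mask is not.
        public Directives buggyClone() {
            Directives copy = new Directives();
            copy.printIdealPhaseOption = printIdealPhaseOption;
            return copy;
        }

        // Analog of the fix: copy the derived mask as well.
        public Directives fixedClone() {
            Directives copy = buggyClone();
            copy.idealPhaseNameMask = idealPhaseNameMask;
            return copy;
        }
    }

    public static void main(String[] args) {
        Directives d = new Directives();
        d.setPrintIdealPhase("BEFORE_MATCHING");
        System.out.println(d.buggyClone().shouldPrintPhase(Phase.BEFORE_MATCHING));
        System.out.println(d.fixedClone().shouldPrintPhase(Phase.BEFORE_MATCHING));
    }
}
```

In this sketch the buggy clone still carries the option string, which is why the problem is easy to miss: the option looks set, but the mask that drives the decision is zero.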
Thanks, Christian ------------- Commit messages: - 8293849: PrintIdealPhase in compiler directives file is ignored when used with other compile commands Changes: https://git.openjdk.org/jdk/pull/10283/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10283&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8293849 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10283.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10283/head:pull/10283 PR: https://git.openjdk.org/jdk/pull/10283 From tholenstein at openjdk.org Thu Sep 15 15:09:01 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 15 Sep 2022 15:09:01 GMT Subject: RFR: JDK-8293480: IGV: Update Bytecode and ControlFlow Component immediately when opening a new graph [v5] In-Reply-To: References: Message-ID: <3sbkWKNWdsC5mHHdSm-AlvWG7K12aZ1mAKMyWHfxa8Y=.1a868a12-8189-4cb0-a70b-20e60dfb89a5@github.com> > The `BytecodeViewTopComponent` and `ControlFlowTopComponent` represent information depending on what graph is open in `EditorTopComponent`. Previously, `BytecodeViewTopComponent` and `ControlFlowTopComponent` did not update its content immediately when a new graph from a different group is opened in `EditorTopComponent`. They also did not update when switching between two tabs of open graph. > > We missed to `fire()` a `diagramChangedEvent` in the constructor of `EditorTopComponent`. We also need to fire when `BytecodeViewTopComponent` and `ControlFlowTopComponent` are initially opened. > Update Tobias Holenstein has updated the pull request incrementally with two additional commits since the last revision: - Revert "update TopComponents on closing" This reverts commit 49dbaa319cf72a0c16d4f9d3501c09d295744388. - Revert "Update Bytecode and ControlFlow when a group is removed" This reverts commit e908804316e3831aaaca37d0145dc8e5405c172d. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/10196/files - new: https://git.openjdk.org/jdk/pull/10196/files/e9088043..2a643165 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10196&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10196&range=03-04 Stats: 126 lines in 9 files changed: 51 ins; 60 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/10196.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10196/head:pull/10196 PR: https://git.openjdk.org/jdk/pull/10196 From kvn at openjdk.org Thu Sep 15 16:38:42 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 15 Sep 2022 16:38:42 GMT Subject: RFR: 8293844: C2: Verify Location::{oop,normal} types in PhaseOutput::FillLocArray In-Reply-To: References: Message-ID: On Thu, 15 Sep 2022 08:50:58 GMT, Aleksey Shipilev wrote: > I have been debugging a weird issue in C2/deopt, and wanted to have stronger asserts in critical paths. One such place is `PhaseOutput::FillLocArray`, which emits `Location::normal` on unconditional `else` branch. `Location::normal` is described as "Ints, floats, double halves". I think we would be better off verifying the types explicitly. Same goes for `Location::oop`, which we can also verify. > > Aside: In fact, I suspect the whole `Regalloc::is_oop` business can go away, and we can rely on reg types to sense if we are dealing with oops here, but that looks like a change with some unexpected effects, so I would like to do that separately, see [JDK-8293845](https://bugs.openjdk.org/browse/JDK-8293845). > > Additional testing: > - [x] Linux x86_64 fastdebug `tier1` > - [x] Linux x86_64 fastdebug `tier2` > - [x] Linux x86_32 fastdebug `tier1` > - [x] Linux x86_32 fastdebug `tier2` Looks reasonable. I will test it. 
------------- PR: https://git.openjdk.org/jdk/pull/10281 From kvn at openjdk.org Thu Sep 15 16:47:48 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 15 Sep 2022 16:47:48 GMT Subject: RFR: 8290917: x86: Memory-operand arithmetic instructions have too low costs [v5] In-Reply-To: References: Message-ID: On Thu, 15 Sep 2022 09:17:07 GMT, Quan Anh Mai wrote: >> The pattern `AddI (LoadI mem) imm` should be matched by a load followed by an add with constant, instead, it is currently matched as a constant load followed by an add with memory. The reason is that the cost of `addI_rReg_mem` is too low, this patch fixes this by increasing the cost of this fused instruction. >> >> Testing: Manually run the test case in the JBS and look at the compiled code. >> >> I also do some small clean-ups in x86_64.ad: >> >> - The `mulHiL` rules have unnecessary constraints on the input registers, these can be removed. The `no_rax_RegL` operand as a consequence can also be removed. >> - The rules involving long division by a constant can be removed because it has been covered by the optimiser during idealisation. >> - The pattern `SubI src imm` and the likes never match because they are converted to `AddI src -imm` by the optimiser. As a result, these rules can be removed >> - The rules involving shifting the argument by 1 are covered by and exactly the same as the corresponding rules of shifting by an immediate. As a result, they can be removed. >> - Some rules involving and-ing with a bit mask have unnecessary constraints on the target register. >> >> Please kindly review, thank you very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > divL_10 Please also add testing (JMH) for shifts by 1 which you removed. And for `sub_*_imm` instructions. 
------------- PR: https://git.openjdk.org/jdk/pull/9791 From duke at openjdk.org Thu Sep 15 16:53:44 2022 From: duke at openjdk.org (Zhiqiang Zang) Date: Thu, 15 Sep 2022 16:53:44 GMT Subject: RFR: 8281453: New optimization: convert `~x` into `-1-x` when `~x` is used in an arithmetic expression [v9] In-Reply-To: <4mTZu0_hVWb-ztMxMabFilyXAnAqOStCvU9wPmfqCKM=.fa8b7797-6e20-4c9e-80f1-b55ba3d5fe39@github.com> References: <4mTZu0_hVWb-ztMxMabFilyXAnAqOStCvU9wPmfqCKM=.fa8b7797-6e20-4c9e-80f1-b55ba3d5fe39@github.com> Message-ID: > Similar to `(~x)+c` -> `(c-1)-x` and `~(x+c)` -> `(-c-1)-x` in #6858, we can also introduce similar optimizations for subtraction, `c-(~x)` -> `x+(c+1)` and `~(c-x)` -> `x+(-c-1)`.
>
> The results of the microbenchmark are as follows:
>
> Baseline:
> Benchmark                        Mode  Cnt  Score   Error  Units
> SubIdealCMinusNotX.baselineInt   avgt   60  0.504 ± 0.011  ns/op
> SubIdealCMinusNotX.baselineLong  avgt   60  0.484 ± 0.004  ns/op
> SubIdealCMinusNotX.testInt1      avgt   60  0.779 ± 0.004  ns/op
> SubIdealCMinusNotX.testInt2      avgt   60  0.896 ± 0.004  ns/op
> SubIdealCMinusNotX.testLong1     avgt   60  0.722 ± 0.004  ns/op
> SubIdealCMinusNotX.testLong2     avgt   60  0.720 ± 0.005  ns/op
>
> Patch:
> Benchmark                        Mode  Cnt  Score   Error  Units
> SubIdealCMinusNotX.baselineInt   avgt   60  0.487 ± 0.009  ns/op
> SubIdealCMinusNotX.baselineLong  avgt   60  0.486 ± 0.009  ns/op
> SubIdealCMinusNotX.testInt1      avgt   60  0.372 ± 0.010  ns/op
> SubIdealCMinusNotX.testInt2      avgt   60  0.365 ± 0.003  ns/op
> SubIdealCMinusNotX.testLong1     avgt   60  0.369 ± 0.004  ns/op
> SubIdealCMinusNotX.testLong2     avgt   60  0.399 ± 0.016  ns/op

Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: include microbenchmark.
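The rewrites named in the description rest on the two's-complement identity `~x == -1 - x`. A small sketch, assuming nothing beyond Java's wrapping `int` arithmetic, that checks `c-(~x)` against `x+(c+1)` and `~(c-x)` against `x+(-c-1)`. The class and method names are hypothetical; this illustrates the algebra, not C2's Ideal transformation code.

```java
public class NotIdentitySketch {
    // Original form: c - (~x)
    public static int cMinusNotX(int c, int x) {
        return c - (~x);
    }

    // Rewritten form: x + (c + 1); the constant c + 1 can be folded at compile time.
    public static int cMinusNotXRewritten(int c, int x) {
        return x + (c + 1);
    }

    // Original form: ~(c - x)
    public static int notOfCMinusX(int c, int x) {
        return ~(c - x);
    }

    // Rewritten form: x + (-c - 1)
    public static int notOfCMinusXRewritten(int c, int x) {
        return x + (-c - 1);
    }

    // True when both rewrites agree with the original expressions.
    public static boolean holds(int c, int x) {
        return ~x == -1 - x
            && cMinusNotX(c, x) == cMinusNotXRewritten(c, x)
            && notOfCMinusX(c, x) == notOfCMinusXRewritten(c, x);
    }

    public static void main(String[] args) {
        // Include boundary values to exercise wrap-around.
        int[] values = { 0, 1, -1, 42, Integer.MAX_VALUE, Integer.MIN_VALUE };
        for (int c : values) {
            for (int x : values) {
                if (!holds(c, x)) {
                    throw new AssertionError("mismatch for c=" + c + ", x=" + x);
                }
            }
        }
        System.out.println("all identities hold");
    }
}
```

Because Java `int` arithmetic wraps modulo 2^32, the identities hold for every input, including `Integer.MIN_VALUE` and `Integer.MAX_VALUE`.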
------------- Changes: - all: https://git.openjdk.org/jdk/pull/7376/files - new: https://git.openjdk.org/jdk/pull/7376/files/f7345fd9..1c238b8d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=7376&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=7376&range=07-08 Stats: 98 lines in 1 file changed: 98 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/7376.diff Fetch: git fetch https://git.openjdk.org/jdk pull/7376/head:pull/7376 PR: https://git.openjdk.org/jdk/pull/7376 From duke at openjdk.org Thu Sep 15 17:03:35 2022 From: duke at openjdk.org (Zhiqiang Zang) Date: Thu, 15 Sep 2022 17:03:35 GMT Subject: RFR: 8281453: New optimization: convert `~x` into `-1-x` when `~x` is used in an arithmetic expression [v7] In-Reply-To: References: <4mTZu0_hVWb-ztMxMabFilyXAnAqOStCvU9wPmfqCKM=.fa8b7797-6e20-4c9e-80f1-b55ba3d5fe39@github.com> Message-ID: On Wed, 13 Apr 2022 17:56:56 GMT, Vladimir Kozlov wrote: > Optimization you proposed does not match RFE description and title. > > You do only: `~x` or `(x ^ (-1))` -> `(-1 - x)` > > As a result this should be Xor nodes ideal transformation. I don't even think you need such transformation if `rhs` and `lhs` are not constants because I assume `XOR` and `SUB` hw instructions have the same latency. > > I suggest you redo performance testing after you merged #7795 changes. Hello @vnkozlov, I updated the performance results and updated the description and title as well. Can you please take a look when you get a chance? Thank you!
------------- PR: https://git.openjdk.org/jdk/pull/7376 From sviswanathan at openjdk.org Thu Sep 15 17:51:39 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 15 Sep 2022 17:51:39 GMT Subject: RFR: 8290917: x86: Memory-operand arithmetic instructions have too low costs [v5] In-Reply-To: References: Message-ID: On Thu, 15 Sep 2022 09:17:07 GMT, Quan Anh Mai wrote: >> The pattern `AddI (LoadI mem) imm` should be matched by a load followed by an add with constant, instead, it is currently matched as a constant load followed by an add with memory. The reason is that the cost of `addI_rReg_mem` is too low, this patch fixes this by increasing the cost of this fused instruction. >> >> Testing: Manually run the test case in the JBS and look at the compiled code. >> >> I also do some small clean-ups in x86_64.ad: >> >> - The `mulHiL` rules have unnecessary constraints on the input registers, these can be removed. The `no_rax_RegL` operand as a consequence can also be removed. >> - The rules involving long division by a constant can be removed because it has been covered by the optimiser during idealisation. >> - The pattern `SubI src imm` and the likes never match because they are converted to `AddI src -imm` by the optimiser. As a result, these rules can be removed >> - The rules involving shifting the argument by 1 are covered by and exactly the same as the corresponding rules of shifting by an immediate. As a result, they can be removed. >> - Some rules involving and-ing with a bit mask have unnecessary constraints on the target register. >> >> Please kindly review, thank you very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > divL_10 Looks good to me. Please wait for Vladimir's review and approval. ------------- Marked as reviewed by sviswanathan (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/9791 From kvn at openjdk.org Thu Sep 15 18:46:40 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 15 Sep 2022 18:46:40 GMT Subject: RFR: 8281453: New optimization: convert `~x` into `-1-x` when `~x` is used in an arithmetic expression [v9] In-Reply-To: References: <4mTZu0_hVWb-ztMxMabFilyXAnAqOStCvU9wPmfqCKM=.fa8b7797-6e20-4c9e-80f1-b55ba3d5fe39@github.com> Message-ID: On Thu, 15 Sep 2022 16:53:44 GMT, Zhiqiang Zang wrote: >> Similar to `(~x)+c` -> `(c-1)-x` and `~(x+c)` -> `(-c-1)-x` in #6858, we can also introduce similar optimizations for subtraction, `c-(~x)` -> `x+(c+1)` and `~(c-x)` -> `x+(-c-1)`. >> >> To generalize, I convert `~x` into `-1-x` when `~x` is used in an arithmetic expression. For example, `c-(~x)` will be converted into `c-(-1-x)` which will match other pattern and will be transformed again in next iteration and finally become `x+(c+1)`.
>>
>> The results of the microbenchmark are as follows:
>>
>> Baseline:
>> Benchmark                         Mode  Cnt  Score   Error  Units
>> NotOpTransformation.baselineInt   avgt   60  0.439 ± 0.001  ns/op
>> NotOpTransformation.baselineLong  avgt   60  0.439 ± 0.001  ns/op
>> NotOpTransformation.testInt1      avgt   60  0.603 ± 0.001  ns/op
>> NotOpTransformation.testInt2      avgt   60  0.603 ± 0.001  ns/op
>> NotOpTransformation.testLong1     avgt   60  0.658 ± 0.001  ns/op
>> NotOpTransformation.testLong2     avgt   60  0.658 ± 0.001  ns/op
>>
>> Patch:
>> Benchmark                         Mode  Cnt  Score   Error  Units
>> NotOpTransformation.baselineInt   avgt   60  0.439 ± 0.001  ns/op
>> NotOpTransformation.baselineLong  avgt   60  0.439 ± 0.001  ns/op
>> NotOpTransformation.testInt1      avgt   60  0.329 ± 0.001  ns/op
>> NotOpTransformation.testInt2      avgt   60  0.329 ± 0.001  ns/op
>> NotOpTransformation.testLong1     avgt   60  0.329 ± 0.001  ns/op
>> NotOpTransformation.testLong2     avgt   60  0.329 ± 0.001  ns/op
>
> Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision:
>
> include microbenchmark.

This looks much better.
You can move PR from Draft state so we can do formal review and testing. Please, update Subject and Description in JDK-8281453 to match PR's. Thank you for adding all tests. ------------- PR: https://git.openjdk.org/jdk/pull/7376 From duke at openjdk.org Thu Sep 15 19:40:56 2022 From: duke at openjdk.org (Zhiqiang Zang) Date: Thu, 15 Sep 2022 19:40:56 GMT Subject: RFR: 8281453: New optimization: convert `~x` into `-1-x` when `~x` is used in an arithmetic expression [v9] In-Reply-To: References: <4mTZu0_hVWb-ztMxMabFilyXAnAqOStCvU9wPmfqCKM=.fa8b7797-6e20-4c9e-80f1-b55ba3d5fe39@github.com> Message-ID: On Thu, 15 Sep 2022 18:43:16 GMT, Vladimir Kozlov wrote: > Please, update Subject and Description in JDK-8281453 to match PR's. I have no account on jbs can you help update there thanks! ------------- PR: https://git.openjdk.org/jdk/pull/7376 From vlivanov at openjdk.org Thu Sep 15 20:35:26 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 15 Sep 2022 20:35:26 GMT Subject: RFR: 8293816: CI: ciBytecodeStream::get_klass() is not consistent Message-ID: CI responses should be consistent during a single compilation. [JDK-8293044](https://bugs.openjdk.org/browse/JDK-8293044) was fixed by turning inaccessible classes into unloaded ones when resolving them through CI. But there's another case when `ciEnv::get_klass_by_index()` returns a loaded ciKlass while setting `will_link` to `false`: a not-yet-resolved klass revealed through a class loader constraint. In such case, after a concurrent class loading CI will start reporting a loaded ciKlass instance. Such inconsistency may trigger some paradoxical situations during compilation. The fix is to instantiate a proper instance of an unloaded ciKlass, so further requests will return the unloaded instances as well. 
Testing: hs-tier1 - hs-tier4 ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/10294/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10294&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8293816 Stats: 10 lines in 3 files changed: 1 ins; 6 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/10294.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10294/head:pull/10294 PR: https://git.openjdk.org/jdk/pull/10294 From kvn at openjdk.org Thu Sep 15 22:04:48 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 15 Sep 2022 22:04:48 GMT Subject: RFR: 8281453: New optimization: convert `~x` into `-1-x` when `~x` is used in an arithmetic expression [v9] In-Reply-To: References: <4mTZu0_hVWb-ztMxMabFilyXAnAqOStCvU9wPmfqCKM=.fa8b7797-6e20-4c9e-80f1-b55ba3d5fe39@github.com> Message-ID: On Thu, 15 Sep 2022 19:38:12 GMT, Zhiqiang Zang wrote: > > Please, update Subject and Description in JDK-8281453 to match PR's. > > I have no account on jbs can you help update there thanks! Done. ------------- PR: https://git.openjdk.org/jdk/pull/7376 From kvn at openjdk.org Thu Sep 15 22:12:43 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 15 Sep 2022 22:12:43 GMT Subject: RFR: 8281453: New optimization: convert `~x` into `-1-x` when `~x` is used in an arithmetic expression [v9] In-Reply-To: References: <4mTZu0_hVWb-ztMxMabFilyXAnAqOStCvU9wPmfqCKM=.fa8b7797-6e20-4c9e-80f1-b55ba3d5fe39@github.com> Message-ID: On Thu, 15 Sep 2022 16:53:44 GMT, Zhiqiang Zang wrote: >> Similar to `(~x)+c` -> `(c-1)-x` and `~(x+c)` -> `(-c-1)-x` in #6858, we can also introduce similar optimizations for subtraction, `c-(~x)` -> `x+(c+1)` and `~(c-x)` -> `x+(-c-1)`. >> >> To generalize, I convert `~x` into `-1-x` when `~x` is used in an arithmetic expression. For example, `c-(~x)` will be converted into `c-(-1-x)` which will match other pattern and will be transformed again in next iteration and finally become `x+(c+1)`. 
>>
>> The results of the microbenchmark are as follows:
>>
>> Baseline:
>> Benchmark                          Mode  Cnt  Score   Error  Units
>> NotOpTransformation.baselineInt    avgt   60  0.439 ± 0.001  ns/op
>> NotOpTransformation.baselineLong   avgt   60  0.439 ± 0.001  ns/op
>> NotOpTransformation.testInt1       avgt   60  0.603 ± 0.001  ns/op
>> NotOpTransformation.testInt2       avgt   60  0.603 ± 0.001  ns/op
>> NotOpTransformation.testLong1      avgt   60  0.658 ± 0.001  ns/op
>> NotOpTransformation.testLong2      avgt   60  0.658 ± 0.001  ns/op
>>
>> Patch:
>> Benchmark                          Mode  Cnt  Score   Error  Units
>> NotOpTransformation.baselineInt    avgt   60  0.439 ± 0.001  ns/op
>> NotOpTransformation.baselineLong   avgt   60  0.439 ± 0.001  ns/op
>> NotOpTransformation.testInt1       avgt   60  0.329 ± 0.001  ns/op
>> NotOpTransformation.testInt2       avgt   60  0.329 ± 0.001  ns/op
>> NotOpTransformation.testLong1      avgt   60  0.329 ± 0.001  ns/op
>> NotOpTransformation.testLong2      avgt   60  0.329 ± 0.001  ns/op
>
> Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision:
>
>   include microbenchmark.

I submitted testing.

-------------

PR: https://git.openjdk.org/jdk/pull/7376

From dlong at openjdk.org Thu Sep 15 22:12:46 2022
From: dlong at openjdk.org (Dean Long)
Date: Thu, 15 Sep 2022 22:12:46 GMT
Subject: RFR: 8293816: CI: ciBytecodeStream::get_klass() is not consistent
In-Reply-To:
References:
Message-ID:

On Thu, 15 Sep 2022 19:58:13 GMT, Vladimir Ivanov wrote:

> CI responses should be consistent during a single compilation.
>
> [JDK-8293044](https://bugs.openjdk.org/browse/JDK-8293044) was fixed by turning inaccessible classes into unloaded ones when resolving them through CI.
>
> But there's another case when `ciEnv::get_klass_by_index()` returns a loaded ciKlass while setting `will_link` to `false`: a not-yet-resolved klass revealed through a class loader constraint.
>
> In such case, after a concurrent class loading CI will start reporting a loaded ciKlass instance.
Such inconsistency may trigger some paradoxical situations during compilation. > > The fix is to instantiate a proper instance of an unloaded ciKlass, so further requests will return the unloaded instances as well. > > Testing: hs-tier1 - hs-tier4 Makes sense. ------------- Marked as reviewed by dlong (Reviewer). PR: https://git.openjdk.org/jdk/pull/10294 From kvn at openjdk.org Thu Sep 15 22:15:30 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 15 Sep 2022 22:15:30 GMT Subject: RFR: 8293844: C2: Verify Location::{oop,normal} types in PhaseOutput::FillLocArray In-Reply-To: References: Message-ID: On Thu, 15 Sep 2022 08:50:58 GMT, Aleksey Shipilev wrote: > I have been debugging a weird issue in C2/deopt, and wanted to have stronger asserts in critical paths. One such place is `PhaseOutput::FillLocArray`, which emits `Location::normal` on unconditional `else` branch. `Location::normal` is described as "Ints, floats, double halves". I think we would be better off verifying the types explicitly. Same goes for `Location::oop`, which we can also verify. > > Aside: In fact, I suspect the whole `Regalloc::is_oop` business can go away, and we can rely on reg types to sense if we are dealing with oops here, but that looks like a change with some unexpected effects, so I would like to do that separately, see [JDK-8293845](https://bugs.openjdk.org/browse/JDK-8293845). > > Additional testing: > - [x] Linux x86_64 fastdebug `tier1` > - [x] Linux x86_64 fastdebug `tier2` > - [x] Linux x86_32 fastdebug `tier1` > - [x] Linux x86_32 fastdebug `tier2` My tier1-4 and xcomp testing passed too. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/10281 From kvn at openjdk.org Thu Sep 15 22:18:40 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 15 Sep 2022 22:18:40 GMT Subject: RFR: 8293816: CI: ciBytecodeStream::get_klass() is not consistent In-Reply-To: References: Message-ID: On Thu, 15 Sep 2022 19:58:13 GMT, Vladimir Ivanov wrote: > CI responses should be consistent during a single compilation. > > [JDK-8293044](https://bugs.openjdk.org/browse/JDK-8293044) was fixed by turning inaccessible classes into unloaded ones when resolving them through CI. > > But there's another case when `ciEnv::get_klass_by_index()` returns a loaded ciKlass while setting `will_link` to `false`: a not-yet-resolved klass revealed through a class loader constraint. > > In such case, after a concurrent class loading CI will start reporting a loaded ciKlass instance. Such inconsistency may trigger some paradoxical situations during compilation. > > The fix is to instantiate a proper instance of an unloaded ciKlass, so further requests will return the unloaded instances as well. > > Testing: hs-tier1 - hs-tier4 Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.org/jdk/pull/10294 From xgong at openjdk.org Fri Sep 16 01:27:53 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 16 Sep 2022 01:27:53 GMT Subject: RFR: 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast [v8] In-Reply-To: References: <1mDQWN8f2Gpb-tuVZo_Jj6TVU1yNtU-4jY00D4gfW5s=.97d48dbc-0ffd-432d-83fe-3544295036e2@github.com> Message-ID: On Thu, 15 Sep 2022 11:19:10 GMT, Jatin Bhateja wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comment > > Marked as reviewed by jbhateja (Reviewer). Thanks for the review @jatin-bhateja @DamonFool ! 
------------- PR: https://git.openjdk.org/jdk/pull/9737 From xgong at openjdk.org Fri Sep 16 01:29:14 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 16 Sep 2022 01:29:14 GMT Subject: Integrated: 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast In-Reply-To: <1mDQWN8f2Gpb-tuVZo_Jj6TVU1yNtU-4jY00D4gfW5s=.97d48dbc-0ffd-432d-83fe-3544295036e2@github.com> References: <1mDQWN8f2Gpb-tuVZo_Jj6TVU1yNtU-4jY00D4gfW5s=.97d48dbc-0ffd-432d-83fe-3544295036e2@github.com> Message-ID: On Thu, 4 Aug 2022 06:08:44 GMT, Xiaohong Gong wrote: > Recently we found the performance of "`FIRST_NONZERO`" for double type is largely worse than the other types on x86 when `UseAVX=2`. The main reason is the "`VectorCastL2X`" op is not supported by the backend when the dst element type is `T_DOUBLE`. This makes the check of `VectorCast` op fail before intrinsifying "`VectorMask.cast()`" which is used in the > "`FIRST_NONZERO`" java implementation (see [1]). However, the compiler will not generate the `VectorCast `op for `VectorMask.cast()` if: > > 1) the current platform supports the predicated feature > 2) the element size (in bytes) of the src and dst type is the same > > So the check of "`VectorCast`" op is needless for such cases. 
To fix it, this patch: > > 1) limits the specified vector cast op check to vectors > 2) adds the relative mask cast op check for VectorMask.cast() > 3) cleans up the unnecessary codes > > Here is the performance of "`FIRST_NONZERO`" benchmark [2] on a x86 machine with `UseAVX=2`: > > Benchmark (size) Mode Cnt Before After Units > DoubleMaxVector.FIRST_NONZERO 1024 thrpt 15 49.266 2460.886 ops/ms > DoubleMaxVector.FIRST_NONZEROMasked 1024 thrpt 15 49.554 1892.223 ops/ms > > [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/DoubleVector.java#L770 > [2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/DoubleMaxVector.java#L246 This pull request has now been integrated. Changeset: 3beca2db Author: Xiaohong Gong URL: https://git.openjdk.org/jdk/commit/3beca2db0761f8172614bf1b287b694c8595b498 Stats: 18 lines in 1 file changed: 7 ins; 3 del; 8 mod 8291600: [vectorapi] vector cast op check is not always needed for vector mask cast Co-authored-by: Quan Anh Mai Reviewed-by: jiefu, eliu, jbhateja ------------- PR: https://git.openjdk.org/jdk/pull/9737 From dlong at openjdk.org Fri Sep 16 01:50:46 2022 From: dlong at openjdk.org (Dean Long) Date: Fri, 16 Sep 2022 01:50:46 GMT Subject: RFR: 8293844: C2: Verify Location::{oop,normal} types in PhaseOutput::FillLocArray In-Reply-To: References: Message-ID: On Thu, 15 Sep 2022 08:50:58 GMT, Aleksey Shipilev wrote: > I have been debugging a weird issue in C2/deopt, and wanted to have stronger asserts in critical paths. One such place is `PhaseOutput::FillLocArray`, which emits `Location::normal` on unconditional `else` branch. `Location::normal` is described as "Ints, floats, double halves". I think we would be better off verifying the types explicitly. Same goes for `Location::oop`, which we can also verify. 
> > Aside: In fact, I suspect the whole `Regalloc::is_oop` business can go away, and we can rely on reg types to sense if we are dealing with oops here, but that looks like a change with some unexpected effects, so I would like to do that separately, see [JDK-8293845](https://bugs.openjdk.org/browse/JDK-8293845). > > Additional testing: > - [x] Linux x86_64 fastdebug `tier1` > - [x] Linux x86_64 fastdebug `tier2` > - [x] Linux x86_32 fastdebug `tier1` > - [x] Linux x86_32 fastdebug `tier2` Seems fine. I did not check that the types checked by the new asserts are exhaustive. ------------- Marked as reviewed by dlong (Reviewer). PR: https://git.openjdk.org/jdk/pull/10281 From xgong at openjdk.org Fri Sep 16 03:50:14 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 16 Sep 2022 03:50:14 GMT Subject: RFR: 8292898: [vectorapi] Unify vector mask cast operation [v3] In-Reply-To: References: Message-ID: > The current implementation of the vector mask cast operation is > complex that the compiler generates different patterns for different > scenarios. For architectures that do not support the predicate > feature, vector mask is represented the same as the normal vector. > So the vector mask cast is implemented by `VectorCast `node. But this > is not always needed. When two masks have the same element size (e.g. > int vs. float), their bits layout are the same. So casting between > them does not need to emit any instructions. > > Currently the compiler generates different patterns based on the > vector type of the input/output and the platforms. Normally the > "`VectorMaskCast`" op is only used for cases that doesn't emit any > instructions, and "`VectorCast`" op is used to implement the necessary > expand/narrow operations. This can avoid adding some duplicate rules > in the backend. 
However, this also has the drawbacks:
>
> 1) The code is complex, especially when the compiler needs to
>    check whether the hardware supports the necessary IRs for the
>    vector mask cast. It needs to check different patterns for
>    different cases.
> 2) The vector mask cast operation could be implemented with cheaper
>    instructions than the vector casting on some architectures.
>
> Instead of generating `VectorCast` or `VectorMaskCast` nodes for different
> cases of vector mask cast operations, this patch unifies the vector
> mask cast implementation with "`VectorMaskCast`" node for all vector types
> and platforms. The missing backend rules are also added for it.
>
> This patch also simplifies the vector mask conversion that happens in
> "`VectorUnbox::Ideal()`". Normally "`VectorUnbox (VectorBox vmask)`" can
> be optimized to "`vmask`" if the unboxing type matches with the boxed
> "`vmask`" type. Otherwise, it needs the type conversion. Currently the
> "`VectorUnbox`" will be transformed to two different patterns to implement
> the conversion:
>
> 1) If the element size is not changed, it is transformed to:
>
>        "VectorMaskCast vmask"
>
> 2) Otherwise, it is transformed to:
>
>        "VectorLoadMask (VectorStoreMask vmask)"
>
> It firstly converts the "`vmask`" to a boolean vector with "`VectorStoreMask`",
> and then uses "`VectorLoadMask`" to convert the boolean vector to the
> dst mask vector. Since this patch makes "`VectorMaskCast`" op supported
> for all types on all platforms, it doesn't need the "`VectorLoadMask`" and
> "`VectorStoreMask`" to do the conversion. The existing transformation:
>
>     VectorUnbox (VectorBox vmask) => VectorLoadMask (VectorStoreMask vmask)
>
> can be simplified to:
>
>     VectorUnbox (VectorBox vmask) => VectorMaskCast vmask

Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 11 commits: - Merge branch 'jdk:master' into JDK-8292898 - 8292898: [vectorapi] Unify vector mask cast operation - Merge branch 'jdk:master' into JDK-8291600 - Address review comments - Add vector cast op check for vector mask for some cases - Revert the unify changes to vector mask cast - Merge branch 'jdk:master' into JDK-8291600 - Fix x86 codegen issue - Unify VectorMaskCast for all platforms - Merge branch 'master' into JDK-8291600 - ... and 1 more: https://git.openjdk.org/jdk/compare/3beca2db...15bfa98e ------------- Changes: https://git.openjdk.org/jdk/pull/10192/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10192&range=02 Stats: 360 lines in 8 files changed: 278 ins; 62 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/10192.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10192/head:pull/10192 PR: https://git.openjdk.org/jdk/pull/10192 From duke at openjdk.org Fri Sep 16 05:33:59 2022 From: duke at openjdk.org (Quan Anh Mai) Date: Fri, 16 Sep 2022 05:33:59 GMT Subject: RFR: 8290917: x86: Memory-operand arithmetic instructions have too low costs [v6] In-Reply-To: References: Message-ID: > The pattern `AddI (LoadI mem) imm` should be matched by a load followed by an add with constant, instead, it is currently matched as a constant load followed by an add with memory. The reason is that the cost of `addI_rReg_mem` is too low, this patch fixes this by increasing the cost of this fused instruction. > > Testing: Manually run the test case in the JBS and look at the compiled code. > > I also do some small clean-ups in x86_64.ad: > > - The `mulHiL` rules have unnecessary constraints on the input registers, these can be removed. The `no_rax_RegL` operand as a consequence can also be removed. > - The rules involving long division by a constant can be removed because it has been covered by the optimiser during idealisation. 
> - The pattern `SubI src imm` and the likes never match because they are converted to `AddI src -imm` by the optimiser. As a result, these rules can be removed
> - The rules involving shifting the argument by 1 are covered by and exactly the same as the corresponding rules of shifting by an immediate. As a result, they can be removed.
> - Some rules involving and-ing with a bit mask have unnecessary constraints on the target register.
>
> Please kindly review, thank you very much.

Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:

  add jmhs

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/9791/files
  - new: https://git.openjdk.org/jdk/pull/9791/files/ef4f3cf9..97d39b7e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=9791&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9791&range=04-05

  Stats: 58 lines in 1 file changed: 58 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/9791.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/9791/head:pull/9791

PR: https://git.openjdk.org/jdk/pull/9791

From duke at openjdk.org Fri Sep 16 05:34:00 2022
From: duke at openjdk.org (Quan Anh Mai)
Date: Fri, 16 Sep 2022 05:34:00 GMT
Subject: RFR: 8290917: x86: Memory-operand arithmetic instructions have too low costs [v5]
In-Reply-To:
References:
Message-ID:

On Thu, 15 Sep 2022 16:45:26 GMT, Vladimir Kozlov wrote:

>> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   divL_10
>
> Please also add testing (JMH) for shifts by 1 which you removed. And for `sub_*_imm` instructions.

@vnkozlov I have added cases for shifts by 1 and immediate subtractions as you suggested.

                                              Before               After
Benchmark                       Mode  Cnt    Score   Error      Score    Error  Units
BasicRules.add_mem_con          avgt   15  202.913 ± 4.915    210.006 ±  9.775  ns/op
BasicRules.andL_rReg_imm255     avgt   15  185.073 ± 2.768    187.580 ±  4.930  ns/op
BasicRules.andL_rReg_imm65535   avgt   15  184.056 ± 2.315    186.884 ±  2.198  ns/op
BasicRules.divL_10              avgt   15  644.609 ± 8.740    654.411 ± 11.558  ns/op
BasicRules.salI_rReg_1          avgt   15  183.385 ± 2.720    181.688 ±  1.435  ns/op
BasicRules.salL_rReg_1          avgt   15  183.192 ± 2.597    184.337 ±  7.378  ns/op
BasicRules.sarI_rReg_1          avgt   15  184.174 ± 2.646    184.984 ±  6.895  ns/op
BasicRules.sarL_rReg_1          avgt   15  182.288 ± 2.012    182.062 ±  2.178  ns/op
BasicRules.shrI_rReg_1          avgt   15  182.465 ± 2.242    180.915 ±  1.162  ns/op
BasicRules.shrL_rReg_1          avgt   15  183.147 ± 2.277    181.170 ±  1.567  ns/op
BasicRules.subI_rReg_imm        avgt   15  200.593 ± 2.279    199.828 ±  2.255  ns/op
BasicRules.subL_rReg_imm        avgt   15  185.141 ± 5.717    183.781 ±  3.173  ns/op

-------------

PR: https://git.openjdk.org/jdk/pull/9791

From chagedorn at openjdk.org Fri Sep 16 07:55:55 2022
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Fri, 16 Sep 2022 07:55:55 GMT
Subject: RFR: 8292761: x86: Clone nodes to match complex rules [v3]
In-Reply-To:
References:
Message-ID:

On Fri, 9 Sep 2022 11:18:50 GMT, Quan Anh Mai wrote:

>> Hi,
>>
>> This patch tries to clone a node if it can be matched as a part of a BMI and lea pattern. This may reduce the live range of a local or remove that local completely.
>>
>> Please take a look and have some reviews. Thanks a lot.
>
> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision:
>
>  - add benchmark
>  - Merge branch 'master' into cloneSimpleNodes
>  - Merge branch 'master' into cloneSimpleNodes
>  - fix
>  - Merge branch 'master' into cloneSimpleNodes
>  - shorten
>  - improve checks
>  - lea patterns
>  - refactor
>  - lea patterns
>  - ...
and 1 more: https://git.openjdk.org/jdk/compare/c48414e3...0beae979 src/hotspot/cpu/x86/x86.ad line 2445: > 2443: } > 2444: } > 2445: if (n->Opcode() == Op_AddL) { Could be an `else if` ------------- PR: https://git.openjdk.org/jdk/pull/9977 From roland at openjdk.org Fri Sep 16 08:42:51 2022 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 16 Sep 2022 08:42:51 GMT Subject: RFR: 8287217: C2: PhaseCCP: remove not visited nodes, prevent type inconsistency In-Reply-To: References: Message-ID: On Tue, 13 Sep 2022 14:33:14 GMT, Emanuel Peter wrote: > **Context:** > [JDK-8265973](https://bugs.openjdk.org/browse/JDK-8265973) Fix in Valhalla repository (Tobias @TobiHartmann ). > [JDK-8290711](https://bugs.openjdk.org/browse/JDK-8290711) Fix in mainline (Roland @rwestrel ). > Tobias' fix is a superset of Rolands. Unfortunately, Tobias gave up his fix once Rolands came up, because they did not have tests that required the superset fix. > > We now have such a test, where Rolands fix is not sufficient. But Tobias' fix is the solution. So I ported Tobias' fix to mainline. > > **Analysis:** > In this bug, we have two `LoadB` nodes re-pushing themselves to the `igvn.worklist`, without end. This leads to an assert after too many iterations. > > `PhaseCCP::analyze` is looking at a post-loop. The loop has a memory access, so there is a `null_check`. The data-part of the loop is connected down to Root via this `null_check` > (`Phi-> CmpP -> Bool -> If -> IfFalse -> Region -> CallStaticJava uncommon_trap -> ... -> Root`). > During `CCP::analyze`, we discover that the memory address is NonNull. So we update the `phase->type(n)` for many of the data-nodes of the loop. > > During `PhaseCCP::do_transform`, we now traverse recursively up from the root, visiting all reachable nodes. > When we visit a node, we store the cached `phase->type(n)` into the node, making the node's type consistent. 
> We traverse up through the `null_check`, through the `uncommon_trap`, and the `If`, to the `Bool` node. > `BoolNode::Value` realizes that we can never have Null, and is subsumed by constant `#int:1` (true). > This means that the data-part of the loop just lost its connection down to Root. > The traversal now also does not reach further than the Bool node which was just subsumed, and hence does not reach the data-part of the loop. > This means we have nodes with inconsistent type. > > Summary: CCP disconnects the last path down to root for a data-loop, because it realizes that a `null_check` will never trap. The disconnected state means the types of the data-loop may be left inconsistent. > > Right after PhaseCCP, we continue with IGVN. > The `LoadB` from that data-part of the loop has `MemNode::Ideal_common` called, which defers its transformation until the type of the address is consistent. However, this is never made consistent, as it is already left inconsistent after PhaseCCP. > https://github.com/openjdk/jdk/blob/aa7ccdf44549a52cce9e99f6569097d3343d9ee4/src/hotspot/share/opto/memnode.cpp#L351-L358 > Note that we only re-push if there is another node in the worklist - a node that hopefully has something to do with the address. > But in our case it is just the two LoadB nodes, which were generated from the same `split_through_phi`. > > **Solution:** > At the end of PhaseCCP, we remove all nodes that were not visited (and may have an inconsistent state). We can do this because we visited all nodes that are still relevant. Rolands fix already made sure that SafePointNodes are visited, such that infinite loops are covered as well. > > Regression test added. > Test suite passed. Looks good to me. ------------- Marked as reviewed by roland (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/10250 From chagedorn at openjdk.org Fri Sep 16 09:15:46 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 16 Sep 2022 09:15:46 GMT Subject: RFR: 8287217: C2: PhaseCCP: remove not visited nodes, prevent type inconsistency In-Reply-To: References: Message-ID: On Tue, 13 Sep 2022 14:33:14 GMT, Emanuel Peter wrote: > **Context:** > [JDK-8265973](https://bugs.openjdk.org/browse/JDK-8265973) Fix in Valhalla repository (Tobias @TobiHartmann ). > [JDK-8290711](https://bugs.openjdk.org/browse/JDK-8290711) Fix in mainline (Roland @rwestrel ). > Tobias' fix is a superset of Rolands. Unfortunately, Tobias gave up his fix once Rolands came up, because they did not have tests that required the superset fix. > > We now have such a test, where Rolands fix is not sufficient. But Tobias' fix is the solution. So I ported Tobias' fix to mainline. > > **Analysis:** > In this bug, we have two `LoadB` nodes re-pushing themselves to the `igvn.worklist`, without end. This leads to an assert after too many iterations. > > `PhaseCCP::analyze` is looking at a post-loop. The loop has a memory access, so there is a `null_check`. The data-part of the loop is connected down to Root via this `null_check` > (`Phi-> CmpP -> Bool -> If -> IfFalse -> Region -> CallStaticJava uncommon_trap -> ... -> Root`). > During `CCP::analyze`, we discover that the memory address is NonNull. So we update the `phase->type(n)` for many of the data-nodes of the loop. > > During `PhaseCCP::do_transform`, we now traverse recursively up from the root, visiting all reachable nodes. > When we visit a node, we store the cached `phase->type(n)` into the node, making the node's type consistent. > We traverse up through the `null_check`, through the `uncommon_trap`, and the `If`, to the `Bool` node. > `BoolNode::Value` realizes that we can never have Null, and is subsumed by constant `#int:1` (true). 
> This means that the data-part of the loop just lost its connection down to Root. > The traversal now also does not reach further than the Bool node which was just subsumed, and hence does not reach the data-part of the loop. > This means we have nodes with inconsistent type. > > Summary: CCP disconnects the last path down to root for a data-loop, because it realizes that a `null_check` will never trap. The disconnected state means the types of the data-loop may be left inconsistent. > > Right after PhaseCCP, we continue with IGVN. > The `LoadB` from that data-part of the loop has `MemNode::Ideal_common` called, which defers its transformation until the type of the address is consistent. However, this is never made consistent, as it is already left inconsistent after PhaseCCP. > https://github.com/openjdk/jdk/blob/aa7ccdf44549a52cce9e99f6569097d3343d9ee4/src/hotspot/share/opto/memnode.cpp#L351-L358 > Note that we only re-push if there is another node in the worklist - a node that hopefully has something to do with the address. > But in our case it is just the two LoadB nodes, which were generated from the same `split_through_phi`. > > **Solution:** > At the end of PhaseCCP, we remove all nodes that were not visited (and may have an inconsistent state). We can do this because we visited all nodes that are still relevant. Rolands fix already made sure that SafePointNodes are visited, such that infinite loops are covered as well. > > Regression test added. > Test suite passed. Nice analysis, looks good! src/hotspot/share/opto/phaseX.cpp line 1986: > 1984: > 1985: while ( transform_stack.is_nonempty() ) { > 1986: Node *clone = transform_stack.pop(); Even though it's old code, you could also fix the code style when touching the code: Suggestion: while (transform_stack.is_nonempty()) { Node* clone = transform_stack.pop(); test/hotspot/jtreg/compiler/ccp/TestRemoveUnreachableCCP.java line 3: > 1: /* > 2: * Copyright (c) 2022, Oracle and/or its affiliates. 
All rights reserved. > 3: * Looking at other files, I think this empty line should be removed. test/hotspot/jtreg/compiler/ccp/TestRemoveUnreachableCCP.java line 29: > 27: * @bug 8287217 > 28: * @summary CCP must remove nodes that are not traversed, else their type can be inconsistent > 29: * @run main/othervm -Xcomp -Xbatch -XX:CompileCommand=compileOnly,TestRemoveUnreachableCCP::test `-Xbatch` can be removed as it is implied by `-Xcomp`. I think we should use `compileonly` instead of `compileOnly`. But it is case insensitive, so it does not really matter. test/hotspot/jtreg/compiler/ccp/TestRemoveUnreachableCCP.java line 45: > 43: > 44: public static void main(String[] strArr) { > 45: TestRemoveUnreachableCCP _instance = new TestRemoveUnreachableCCP(); The instance is unused and can be removed. ------------- Marked as reviewed by chagedorn (Reviewer). PR: https://git.openjdk.org/jdk/pull/10250 From shade at openjdk.org Fri Sep 16 11:17:35 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 16 Sep 2022 11:17:35 GMT Subject: RFR: 8293937: x86: Drop LP64 conditions from clearly x86_32 code Message-ID: <8UHAbltEwlCwzkbxK6TIlK41HT9rlKqsICS-rMbVOKY=.db04836b-8559-4203-ae29-488ec4d2234b@github.com> Noticed this when porting Loom on x86_32. There are `*_x86_32.cpp` files that use `_LP64` as if it matters for them. It does not make sense, as in those files we always have `!_LP64`. We can drop the conditionals and clean the code. 
Proof of completeness:

    $ ack LP64 src/hotspot/ | grep _32
    src/hotspot/cpu/x86/register_x86.hpp:386:    NOT_LP64( 8 + ) // FILL0-FILL7 in x86_32.ad
    src/hotspot/cpu/x86/vm_version_x86.hpp:733:    return LP64_ONLY(true) NOT_LP64(false); // not implemented on x86_32

-------------

Commit messages:
 - Fix

Changes: https://git.openjdk.org/jdk/pull/10305/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10305&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8293937
  Stats: 46 lines in 2 files changed: 0 ins; 37 del; 9 mod
  Patch: https://git.openjdk.org/jdk/pull/10305.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10305/head:pull/10305

PR: https://git.openjdk.org/jdk/pull/10305

From chagedorn at openjdk.org Fri Sep 16 12:07:40 2022
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Fri, 16 Sep 2022 12:07:40 GMT
Subject: RFR: 8292088: C2: assert(is_OuterStripMinedLoop()) failed: invalid node class: IfTrue
Message-ID: <4KiGggNq8m0gnRkAxarMncZ1gVWSJ-QNS3IFXzmIdQE=.5e847c08-973d-4517-8778-c9d6d33c7b50@github.com>

In `testKnownLimit()`, we directly use the (pre-incremented) iv phi `IV_PHI_i` (`232 Phi`) in the loop exit check of the `while` loop:

![Screenshot from 2022-09-16 10-33-58](https://user-images.githubusercontent.com/17833009/190604454-50aa1b1e-7111-4723-a329-b95e0f26c220.png)

Such pre-incremented iv phi uses after the loop are detected in `PhaseIdealLoop::reorg_offsets()` and replaced in order to reduce register pressure. We insert an additional `Opaque2` node to prevent any optimizations from undoing the effect of `PhaseIdealLoop::reorg_offsets()`:

    //     iv Phi            iv Phi
    //     |                 |
    //     |                 AddI (+stride)
    //     |                 |
    //     |                 Opaque2  # Blocks IGVN from folding these nodes until loop opts are over.
    //     |     ====>       |
    //     |                 AddI (-stride)
    //     |                 |
    //     |                 CastII   # Preserve type of iv Phi
    //     |                 |
    //   Outside Use       Outside Use

In the test case, this is done before CCP and looks like this:

![Screenshot from 2022-09-16 10-33-35](https://user-images.githubusercontent.com/17833009/190623922-3b0c9eeb-8468-4cd7-8fe1-1f7df3dc5071.png)

At that point, we do not know yet that the `while` loop is only going to be executed once (i.e. `422 CountedLoopEnd` is always false). This only becomes known after CCP where the type of `232 Phi` improves. But since we have an `Opaque2` node, this update is not propagated until the `Opaque2` nodes are removed in macro expansion:

https://github.com/openjdk/jdk/blob/11e7d53b23796cbd3d878048f7553885ae07f4d1/src/hotspot/share/opto/macro.cpp#L2412-L2414

During macro expansion, we also adjust the strip mined loop: We move the `421 Bool` of the inner loop exit check `422 CountedLoopEnd` to the outer strip mined loop exit check and adjust the inner loop exit check in such a way that C2 cannot figure out that the entire loop is only run once. In the next IGVN phase, the outer strip mined loop node is removed while the inner loop `429 CountedLoop` is not. Later in `verify_strip_mined()`, we cannot find the outer strip mined loop of `429 CountedLoop` anymore and we fail with the assertion.

The first thought to fix this problem is to add `Opaque2::Value()` to let type information flow. But this does not fix the problem completely if the type of the iv phi has no known upper limit. There we have the problem that in general `type(phi) != type(phi + x - x)` because `phi + x` could overflow and we end up with type `int` (which happens in `testUnknownLimit()`).

I therefore suggest removing `Opaque2` nodes earlier, before macro expansion, to fix this bug. A good place seems to be right after loop opts are over. We can remove them at the same time as `Opaque1` nodes by adding a similar `Identity()` method.
This lets the loop nodes to be folded away before trying to adjust the outer strip mined loop limit. #### Are Opaque2 nodes really useful? When working on this bug, I started to question the usage of `Opaque2` nodes in general. We are still running IGVN after `Opaque2` nodes are currently removed. This simply undoes the effects of `PhaseIdealLoop::reorg_offsets()` again and end up using pre-incremented iv phis anyways. My theory was that we are either blocking some specific optimizations during loop opts which cannot be reverted later in IGVN or that we initially (when this `Opaque2` optimization was added) did not run IGVN anymore once `Opaque2` nodes are removed. I could not think of any such non-revertable optimization that `Opaque2` nodes could prevent. On top of that, `PhaseIdealLoop::reorg_offsets()` also does not mention anything alike. I therefore had a look at the history of `Opaque2` nodes. Unfortunately, they were added before the initial load commit. I've dug deeper through some old closed repo and found that at the time the `Opaque2` nodes were introduced around 20 years ago, we did not do any IGVN anymore after the removal of the `Opaque2` nodes - and we generated code with these unoptimized `iv phi + x - x` patterns. This suggests that today the `Opaque2` nodes are indeed not really doing what they were originally supposed to do. I would therefore suggest to investigate their complete removal in a separate RFE and go with the suggested fix above which does not make the current situation of the questionable `Opaque2` node usages any worse. 
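
The overflow argument above (`type(phi) != type(phi + x - x)` in general) can be illustrated with a toy interval model. This is only a sketch in plain Java, not C2 code: the `Range` class and its widening rule in `add` are hypothetical simplifications, assuming that a range is widened to the full `int` range as soon as one of its bounds could overflow.

```java
public class RangeOverflowSketch {
    // Hypothetical stand-in for an integer type with bounds; not C2's TypeInt.
    static final class Range {
        static final Range INT = new Range(Integer.MIN_VALUE, Integer.MAX_VALUE);

        final int lo;
        final int hi;

        Range(int lo, int hi) {
            this.lo = lo;
            this.hi = hi;
        }

        // Adds a constant to every value in the range. If a bound could
        // overflow, this sketch gives up and widens to the full int range.
        Range add(int x) {
            long newLo = (long) lo + x;
            long newHi = (long) hi + x;
            if (newLo < Integer.MIN_VALUE || newHi > Integer.MAX_VALUE) {
                return INT;
            }
            return new Range((int) newLo, (int) newHi);
        }

        @Override
        public String toString() {
            return "[" + lo + ", " + hi + "]";
        }
    }

    public static void main(String[] args) {
        // Known upper limit: adding and then subtracting the stride round-trips.
        Range known = new Range(0, 100);
        System.out.println(known + " -> " + known.add(4).add(-4));

        // No known upper limit: phi + x may overflow, so the bounds are lost
        // and subtracting x again cannot recover them.
        Range unknown = new Range(0, Integer.MAX_VALUE);
        System.out.println(unknown + " -> " + unknown.add(4).add(-4));
    }
}
```

In this model the bounded range survives the `+ x - x` round trip, while the range without a useful upper limit degrades to plain `int`, which matches the situation described for `testUnknownLimit()`.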
Thanks, Christian ------------- Commit messages: - 8292088: C2: assert(is_OuterStripMinedLoop()) failed: invalid node class: IfTrue Changes: https://git.openjdk.org/jdk/pull/10306/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10306&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8292088 Stats: 221 lines in 4 files changed: 218 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/10306.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10306/head:pull/10306 PR: https://git.openjdk.org/jdk/pull/10306 From rcastanedalo at openjdk.org Fri Sep 16 12:17:57 2022 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 16 Sep 2022 12:17:57 GMT Subject: RFR: 8293849: PrintIdealPhase in compiler directives file is ignored when used with other compile commands In-Reply-To: References: Message-ID: On Thu, 15 Sep 2022 11:20:06 GMT, Christian Hagedorn wrote: > When using a compiler directives file with `PrintIdealPhase`: > > > [ > { > match : "Test::*", > log : true, > PrintIdealPhase : "BEFORE_MATCHING" > } > ] > > > together with other compile commands specified in `compilerdirectives_common_flags` and/or `compilerdirectives_c2_flags`: > https://github.com/openjdk/jdk/blob/aff5ff14b208b3c2be93d7b4fab8b07c5be12f3e/src/hotspot/share/compiler/compilerDirectives.hpp#L38-L39 > https://github.com/openjdk/jdk/blob/aff5ff14b208b3c2be93d7b4fab8b07c5be12f3e/src/hotspot/share/compiler/compilerDirectives.hpp#L63-L64 > > then the `PrintIdealPhase` option is ignored. > > The reason is that when cloning the `DirectiveSet` for the current compilation in `DirectiveSet::clone()`, we only set `PrintIdealPhaseOption` but forget to also set `_ideal_phase_name_mask` which is used when deciding if a compile phase should be dumped or not. 
As a result, the mask keeps its default value zero and nothing is dumped because `Compile::shoud_print_phase()` returns false: > > https://github.com/openjdk/jdk/blob/aff5ff14b208b3c2be93d7b4fab8b07c5be12f3e/src/hotspot/share/opto/compile.cpp#L5060-L5067 > > > The fix is to also clone the old value of `_ideal_phase_name_mask`. > > Thanks, > Christian Looks good to me! Besides reproducing the reported issue, I also tested that - compiler directives take precedence over compile commands (as specified in the [Compiler Control JEP](https://openjdk.org/jeps/165), paragraph "CompileCommand and backwards compatibility") if both provide a PrintIdealPhase; and - the fix makes it possible to run compile commands on top of IR tests in [your prototype changeset](https://github.com/chhagedorn/jdk/tree/JDK-8280378) for [JDK-8280378](https://bugs.openjdk.org/browse/JDK-8280378). ------------- Marked as reviewed by rcastanedalo (Reviewer). PR: https://git.openjdk.org/jdk/pull/10283 From chagedorn at openjdk.org Fri Sep 16 12:27:50 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 16 Sep 2022 12:27:50 GMT Subject: RFR: 8293849: PrintIdealPhase in compiler directives file is ignored when used with other compile commands In-Reply-To: References: Message-ID: On Thu, 15 Sep 2022 11:20:06 GMT, Christian Hagedorn wrote: > When using a compiler directives file with `PrintIdealPhase`: > > > [ > { > match : "Test::*", > log : true, > PrintIdealPhase : "BEFORE_MATCHING" > } > ] > > > together with other compile commands specified in `compilerdirectives_common_flags` and/or `compilerdirectives_c2_flags`: > https://github.com/openjdk/jdk/blob/aff5ff14b208b3c2be93d7b4fab8b07c5be12f3e/src/hotspot/share/compiler/compilerDirectives.hpp#L38-L39 > https://github.com/openjdk/jdk/blob/aff5ff14b208b3c2be93d7b4fab8b07c5be12f3e/src/hotspot/share/compiler/compilerDirectives.hpp#L63-L64 > > then the `PrintIdealPhase` option is ignored. 
> > The reason is that when cloning the `DirectiveSet` for the current compilation in `DirectiveSet::clone()`, we only set `PrintIdealPhaseOption` but forget to also set `_ideal_phase_name_mask` which is used when deciding if a compile phase should be dumped or not. As a result, the mask keeps its default value zero and nothing is dumped because `Compile::shoud_print_phase()` returns false: > > https://github.com/openjdk/jdk/blob/aff5ff14b208b3c2be93d7b4fab8b07c5be12f3e/src/hotspot/share/opto/compile.cpp#L5060-L5067 > > > The fix is to also clone the old value of `_ideal_phase_name_mask`. > > Thanks, > Christian Thanks a lot Roberto for your review, the additional testing, and originally finding this bug! ------------- PR: https://git.openjdk.org/jdk/pull/10283 From dnsimon at openjdk.org Fri Sep 16 12:45:11 2022 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 16 Sep 2022 12:45:11 GMT Subject: RFR: 8293942: [JVMCI] data section entries must be 4-byte aligned on AArch64 Message-ID: As a result of [JDK-8283626](https://bugs.openjdk.org/browse/JDK-8283626), each entry in a data section in a CodeBuffer on AArch64 needs to be 4-byte aligned. This PR exposes this alignment requirement via JVMCI so that Graal can adhere to it. 
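
As a rough illustration of what a 4-byte alignment requirement means for code that lays out data-section entries, the sketch below rounds each entry's offset up to the next multiple of the alignment. All names here (`DataSectionAlignmentSketch`, `MIN_ITEM_ALIGNMENT`, `alignUp`) are hypothetical; this is not the JVMCI API added by the PR, only the arithmetic a client such as Graal would presumably need.

```java
public class DataSectionAlignmentSketch {
    // Hypothetical constant mirroring the 4-byte minimum discussed for AArch64.
    static final int MIN_ITEM_ALIGNMENT = 4;

    // Rounds offset up to the next multiple of alignment.
    // The alignment must be a positive power of two.
    static int alignUp(int offset, int alignment) {
        if (alignment <= 0 || Integer.bitCount(alignment) != 1) {
            throw new IllegalArgumentException("alignment must be a power of two: " + alignment);
        }
        return (offset + alignment - 1) & -alignment;
    }

    public static void main(String[] args) {
        int[] itemSizes = {1, 2, 8, 3};
        int offset = 0;
        for (int size : itemSizes) {
            // Pad before each entry so that it starts on an aligned offset.
            offset = alignUp(offset, MIN_ITEM_ALIGNMENT);
            System.out.println("item of " + size + " byte(s) placed at offset " + offset);
            offset += size;
        }
    }
}
```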
------------- Commit messages: - expose min alignment for data section items via JVMCI Changes: https://git.openjdk.org/jdk/pull/10308/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10308&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8293942 Stats: 16 lines in 4 files changed: 16 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10308.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10308/head:pull/10308 PR: https://git.openjdk.org/jdk/pull/10308 From rcastanedalo at openjdk.org Fri Sep 16 13:06:51 2022 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 16 Sep 2022 13:06:51 GMT Subject: RFR: JDK-8293364: IGV: Refactor Action in EditorTopComponent and fix minor bugs [v5] In-Reply-To: References: Message-ID: On Thu, 15 Sep 2022 09:06:25 GMT, Tobias Holenstein wrote: >> Refactor the Actions in EditorTopComponent (com.sun.hotspot.igv.view.actions). Move Action specific code from EditorTopComponent to the corresponding Action. 
>> >> # Refactoring of com.sun.hotspot.igv.view.actions and EditorTopComponent >> - Created a new `ExportGraph` Action and moved corresponding functions `exportToSVG(..)` and `exportToPDF(..)` to new `ExportGraph.java` >> - Moved key bindings for satellite-view (pressing S) from `EditorTopComponent` to `OverviewAction.java` >> - Moved Action specific code from `EditorTopComponent` to the corresponding `XXXAction.java` >> - Changed `PrevDiagramAction`, `ExpandDiffAction`, `ExtractAction`, `HideAction`, `NextDiagramAction`, `ReduceDiffAction` and `ShowAllAction` to be context aware `ContextAction` actions and use more modern `@ActionRegistration` to move away from manually defining actions in `layer.xml` >> - new `addContextListener` / `removeContextListener` function in `ContextAction` enables context aware actions to define to which `ChangedEvent` they want to react to >> >> # Fixing minor Bugs >> - "Show empty blocks in control-flow graph view" is selected by default but only enabled in CFG view. >> This is distracting for the eye when we are not in CFG: >> cfg_before >> Now "Show empty blocks in control-flow graph view" is not selected anymore when disabled (greyed out) >> cfg_node_disable >> But still gets selected by default when enabled >> cfg_now >> >> - "Extract current set of selected nodes", "Hide selected nodes" and "show all nodes" were always enabled, even when they didn't effect anything. >> selection_before >> Now "Extract current set of selected nodes", "Hide selected nodes" are disabled (greyed out) when no nodes are selected. And "show all nodes" is disabled (greyed out) when all nodes are already visible. >> selection_now >> >> - "Reduce the difference selection" got stuck when at the last graphs in the group because it got greyed out. 
>> reduce_stuck >> Now "Reduce the difference selection" works as expected: >> reduce_now > > Tobias Holenstein has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains two new commits since the last revision: > > - Update Copyright year > - Update OutlineTopComponent.java > > Revert "Update OutlineTopComponent.java" > > This reverts commit 65e0651730983e12c032bb89564c3ef93aa34dbe. > > revert whitespace change I have tested the changeset again and did not find any issues: keyboard shortcuts and contextual enabling/disabling of toolbar buttons work as expected. I only have a few style comments; please consider addressing them before integration. src/utils/IdealGraphVisualizer/Coordinator/src/main/java/com/sun/hotspot/igv/coordinator/actions/ImportAction.java line 52: > 50: import org.openide.util.RequestProcessor; > 51: import org.openide.util.Utilities; > 52: import org.openide.util.actions.CallableSystemAction; As discussed earlier, it would really be preferable to leave cleanups of this kind to a separate RFE, especially if the file is not changed otherwise. It would make reviewing easier, especially on large changesets like this one. Why not postpone all import reordering to https://github.com/openjdk/jdk/pull/10197? Similarly for e.g. the changes in `DiagramScene.java`. src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java line 431: > 429: setLayout(new java.awt.BorderLayout()); > 430: > 431: }// //GEN-END:initComponents Did you check whether, after moving this generated code, it can still be updated using NetBeans' Form Editor?
src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ExportAction.java line 42: > 40: * > 41: * @author Thomas Wuerthinger > 42: */ Please keep the original `@author` tag, here and in all other cases where it is removed. ------------- Marked as reviewed by rcastanedalo (Reviewer). PR: https://git.openjdk.org/jdk/pull/10170 From roland at openjdk.org Fri Sep 16 13:42:00 2022 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 16 Sep 2022 13:42:00 GMT Subject: RFR: 8292088: C2: assert(is_OuterStripMinedLoop()) failed: invalid node class: IfTrue In-Reply-To: <4KiGggNq8m0gnRkAxarMncZ1gVWSJ-QNS3IFXzmIdQE=.5e847c08-973d-4517-8778-c9d6d33c7b50@github.com> References: <4KiGggNq8m0gnRkAxarMncZ1gVWSJ-QNS3IFXzmIdQE=.5e847c08-973d-4517-8778-c9d6d33c7b50@github.com> Message-ID: On Fri, 16 Sep 2022 12:00:43 GMT, Christian Hagedorn wrote: > In `testKnownLimit()`, we directly use the (pre-incremented) iv phi `IV_PHI_i` (`232 Phi`) in the loop exit check of the `while` loop: > > ![Screenshot from 2022-09-16 10-33-58](https://user-images.githubusercontent.com/17833009/190604454-50aa1b1e-7111-4723-a329-b95e0f26c220.png) > > Such pre-incremented iv phi uses after the loop are detected in `PhaseIdealLoop::reorg_offsets()` and replaced in order to reduce register pressure. We insert an additional `Opaque2` node to prevent any optimizations to undo the effect of `PhaseIdealLoop::reorg_offsets()`: > > > // iv Phi iv Phi > // | | > // | AddI (+stride) > // | | > // | Opaque2 # Blocks IGVN from folding these nodes until loop opts are over.
> // | ====> | > // | AddI (-stride) > // | | > // | CastII # Preserve type of iv Phi > // | | > // Outside Use Outside Use > > > In the test case, this is done before CCP and looks like this: > > ![Screenshot from 2022-09-16 10-33-35](https://user-images.githubusercontent.com/17833009/190623922-3b0c9eeb-8468-4cd7-8fe1-1f7df3dc5071.png) > > At that point, we do not know yet that the `while` loop is only gonna be executed once (i.e. `422 CountedLoopEnd` is always false). This only becomes known after CCP where the type of `232 Phi` improves. But since we have an `Opaque2` node, this update is not propagated until the `Opaque2` nodes are removed in macro expansion: > > https://github.com/openjdk/jdk/blob/11e7d53b23796cbd3d878048f7553885ae07f4d1/src/hotspot/share/opto/macro.cpp#L2412-L2414 > > During macro expansion, we also adjust the strip mined loop: We move the `421 Bool` of the inner loop exit check `422 CountedLoopEnd` to the outer strip mined loop exit check and adjust the inner loop exit check in such a way that C2 cannot figure out that the entire loop is only be run once. In the next IGVN phase, the outer strip mined loop node is removed while the inner loop `429 CountedLoop` is not. > > Later in `verify_strip_mined()`, we cannot find the outer strip mined loop of `429 CountedLoop` anymore and we fail with the assertion. > > The first thought to fix this problem is to add `Opaque2::Value()` to let type information flow. But this does not fix the problem completely if the type of the iv phi has no known upper limit. There we have the problem that in general `type(phi) != type(phi + x - x)` because `phi + x` could overflow and we end up with type `int` (which happens in `testUnknownLimit()`). > > I therefore suggest to remove `Opaque2` nodes earlier before macro expansion to fix this bug. A good place seems to be right after loop opts are over. We can remove them at the same time as `Opaque1` nodes by adding a similar `Identity()` method. 
This lets the loop nodes to be folded away before trying to adjust the outer strip mined loop limit. > > #### Are Opaque2 nodes really useful? > > When working on this bug, I started to question the usage of `Opaque2` nodes in general. We are still running IGVN after `Opaque2` nodes are currently removed. This simply undoes the effects of `PhaseIdealLoop::reorg_offsets()` again and end up using pre-incremented iv phis anyways. My theory was that we are either blocking some specific optimizations during loop opts which cannot be reverted later in IGVN or that we initially (when this `Opaque2` optimization was added) did not run IGVN anymore once `Opaque2` nodes are removed. > > I could not think of any such non-revertable optimization that `Opaque2` nodes could prevent. On top of that, `PhaseIdealLoop::reorg_offsets()` also does not mention anything alike. I therefore had a look at the history of `Opaque2` nodes. Unfortunately, they were added before the initial load commit. I've dug deeper through some old closed repo and found that at the time the `Opaque2` nodes were introduced around 20 years ago, we did not do any IGVN anymore after the removal of the `Opaque2` nodes - and we generated code with these unoptimized `iv phi + x - x` patterns. > > This suggests that today the `Opaque2` nodes are indeed not really doing what they were originally supposed to do. I would therefore suggest to investigate their complete removal in a separate RFE and go with the suggested fix above which does not make the current situation of the questionable `Opaque2` node usages any worse. > > Thanks, > Christian Looks good to me. > This suggests that today the `Opaque2` nodes are indeed not really doing what they were originally supposed to do. I would therefore suggest to investigate their complete removal in a separate RFE and go with the suggested fix above which does not make the current situation of the questionable `Opaque2` node usages any worse. Makes sense to me. 
------------- Marked as reviewed by roland (Reviewer). PR: https://git.openjdk.org/jdk/pull/10306 From chagedorn at openjdk.org Fri Sep 16 13:50:44 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 16 Sep 2022 13:50:44 GMT Subject: RFR: 8292088: C2: assert(is_OuterStripMinedLoop()) failed: invalid node class: IfTrue In-Reply-To: <4KiGggNq8m0gnRkAxarMncZ1gVWSJ-QNS3IFXzmIdQE=.5e847c08-973d-4517-8778-c9d6d33c7b50@github.com> References: <4KiGggNq8m0gnRkAxarMncZ1gVWSJ-QNS3IFXzmIdQE=.5e847c08-973d-4517-8778-c9d6d33c7b50@github.com> Message-ID: On Fri, 16 Sep 2022 12:00:43 GMT, Christian Hagedorn wrote: > In `testKnownLimit()`, we directly use the (pre-incremented) iv phi `IV_PHI_i` (`232 Phi`) in the loop exit check of the `while` loop: > > ![Screenshot from 2022-09-16 10-33-58](https://user-images.githubusercontent.com/17833009/190604454-50aa1b1e-7111-4723-a329-b95e0f26c220.png) > > Such pre-incremented iv phi uses after the loop are detected in `PhaseIdealLoop::reorg_offsets()` and replaced in order to reduce register pressure. We insert an additional `Opaque2` node to prevent any optimizations to undo the effect of `PhaseIdealLoop::reorg_offsets()`: > > > // iv Phi iv Phi > // | | > // | AddI (+stride) > // | | > // | Opaque2 # Blocks IGVN from folding these nodes until loop opts are over. > // | ====> | > // | AddI (-stride) > // | | > // | CastII # Preserve type of iv Phi > // | | > // Outside Use Outside Use > > > In the test case, this is done before CCP and looks like this: > > ![Screenshot from 2022-09-16 10-33-35](https://user-images.githubusercontent.com/17833009/190623922-3b0c9eeb-8468-4cd7-8fe1-1f7df3dc5071.png) > > At that point, we do not know yet that the `while` loop is only gonna be executed once (i.e. `422 CountedLoopEnd` is always false). This only becomes known after CCP where the type of `232 Phi` improves. 
But since we have an `Opaque2` node, this update is not propagated until the `Opaque2` nodes are removed in macro expansion: > > https://github.com/openjdk/jdk/blob/11e7d53b23796cbd3d878048f7553885ae07f4d1/src/hotspot/share/opto/macro.cpp#L2412-L2414 > > During macro expansion, we also adjust the strip mined loop: We move the `421 Bool` of the inner loop exit check `422 CountedLoopEnd` to the outer strip mined loop exit check and adjust the inner loop exit check in such a way that C2 cannot figure out that the entire loop is only run once. In the next IGVN phase, the outer strip mined loop node is removed while the inner loop `429 CountedLoop` is not. > > Later in `verify_strip_mined()`, we cannot find the outer strip mined loop of `429 CountedLoop` anymore and we fail with the assertion. > > The first thought to fix this problem is to add `Opaque2::Value()` to let type information flow. But this does not fix the problem completely if the type of the iv phi has no known upper limit. There we have the problem that in general `type(phi) != type(phi + x - x)` because `phi + x` could overflow and we end up with type `int` (which happens in `testUnknownLimit()`). > > I therefore suggest to remove `Opaque2` nodes earlier before macro expansion to fix this bug. A good place seems to be right after loop opts are over. We can remove them at the same time as `Opaque1` nodes by adding a similar `Identity()` method. This lets the loop nodes to be folded away before trying to adjust the outer strip mined loop limit. > > #### Are Opaque2 nodes really useful? > > When working on this bug, I started to question the usage of `Opaque2` nodes in general. We are still running IGVN after `Opaque2` nodes are currently removed. This simply undoes the effects of `PhaseIdealLoop::reorg_offsets()` again and we end up using pre-incremented iv phis anyways. 
My theory was that we are either blocking some specific optimizations during loop opts which cannot be reverted later in IGVN or that we initially (when this `Opaque2` optimization was added) did not run IGVN anymore once `Opaque2` nodes are removed. > > I could not think of any such non-revertable optimization that `Opaque2` nodes could prevent. On top of that, `PhaseIdealLoop::reorg_offsets()` also does not mention anything alike. I therefore had a look at the history of `Opaque2` nodes. Unfortunately, they were added before the initial load commit. I've dug deeper through some old closed repo and found that at the time the `Opaque2` nodes were introduced around 20 years ago, we did not do any IGVN anymore after the removal of the `Opaque2` nodes - and we generated code with these unoptimized `iv phi + x - x` patterns. > > This suggests that today the `Opaque2` nodes are indeed not really doing what they were originally supposed to do. I would therefore suggest to investigate their complete removal in a separate RFE and go with the suggested fix above which does not make the current situation of the questionable `Opaque2` node usages any worse. > > Thanks, > Christian Thanks Roland for your review and your feedback to further investigate the removal of `Opaque2` nodes. ------------- PR: https://git.openjdk.org/jdk/pull/10306 From jbhateja at openjdk.org Fri Sep 16 14:16:48 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 16 Sep 2022 14:16:48 GMT Subject: RFR: 8290917: x86: Memory-operand arithmetic instructions have too low costs [v6] In-Reply-To: References: Message-ID: On Fri, 16 Sep 2022 05:33:59 GMT, Quan Anh Mai wrote: >> The pattern `AddI (LoadI mem) imm` should be matched by a load followed by an add with constant, instead, it is currently matched as a constant load followed by an add with memory. The reason is that the cost of `addI_rReg_mem` is too low, this patch fixes this by increasing the cost of this fused instruction. 
>> >> Testing: Manually run the test case in the JBS and look at the compiled code. >> >> I also do some small clean-ups in x86_64.ad: >> >> - The `mulHiL` rules have unnecessary constraints on the input registers, these can be removed. The `no_rax_RegL` operand as a consequence can also be removed. >> - The rules involving long division by a constant can be removed because it has been covered by the optimiser during idealisation. >> - The pattern `SubI src imm` and the likes never match because they are converted to `AddI src -imm` by the optimiser. As a result, these rules can be removed >> - The rules involving shifting the argument by 1 are covered by and exactly the same as the corresponding rules of shifting by an immediate. As a result, they can be removed. >> - Some rules involving and-ing with a bit mask have unnecessary constraints on the target register. >> >> Please kindly review, thank you very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > add jmhs The patch does some paper-cut optimizations; it looks OK to me. src/hotspot/cpu/x86/x86_64.ad line 10210: > 10208: ins_encode %{ > 10209: // movzbl zeroes out the upper 32-bit and does not need REX.W > 10210: __ movzbl($dst$$Register, $src$$Register); Saving the redundant REX byte if allocations happen in the lower register bank looks good. ------------- Marked as reviewed by jbhateja (Reviewer). PR: https://git.openjdk.org/jdk/pull/9791 From jbhateja at openjdk.org Fri Sep 16 14:16:50 2022 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 16 Sep 2022 14:16:50 GMT Subject: RFR: 8290917: x86: Memory-operand arithmetic instructions have too low costs [v5] In-Reply-To: References: Message-ID: On Thu, 15 Sep 2022 09:17:07 GMT, Quan Anh Mai wrote: >> The pattern `AddI (LoadI mem) imm` should be matched by a load followed by an add with constant, instead, it is currently matched as a constant load followed by an add with memory.
The reason is that the cost of `addI_rReg_mem` is too low, this patch fixes this by increasing the cost of this fused instruction. >> >> Testing: Manually run the test case in the JBS and look at the compiled code. >> >> I also do some small clean-ups in x86_64.ad: >> >> - The `mulHiL` rules have unnecessary constraints on the input registers, these can be removed. The `no_rax_RegL` operand as a consequence can also be removed. >> - The rules involving long division by a constant can be removed because it has been covered by the optimiser during idealisation. >> - The pattern `SubI src imm` and the likes never match because they are converted to `AddI src -imm` by the optimiser. As a result, these rules can be removed >> - The rules involving shifting the argument by 1 are covered by and exactly the same as the corresponding rules of shifting by an immediate. As a result, they can be removed. >> - Some rules involving and-ing with a bit mask have unnecessary constraints on the target register. >> >> Please kindly review, thank you very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > divL_10 src/hotspot/cpu/x86/x86_64.ad line 9764: > 9762: > 9763: // And Register with Immediate 255 > 9764: instruct andI_rReg_imm255(rRegI dst, rRegI src, immI_255 mask) Looks good. This saves the additional move that the register allocator (RA) emits for a two-address instruction. That move is generally swept out if the src and dst registers have the same encoding; otherwise it may result in a spill in blocks with high register pressure.
------------- PR: https://git.openjdk.org/jdk/pull/9791 From epeter at openjdk.org Fri Sep 16 14:44:05 2022 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 Sep 2022 14:44:05 GMT Subject: RFR: 8287217: C2: PhaseCCP: remove not visited nodes, prevent type inconsistency In-Reply-To: References: Message-ID: <0vCgdAbVz_llmr8ofWp7mmouX1LFpzNLpx4HxbdCRUg=.767f96e7-7b0e-4681-b0dd-97e4f5fc0589@github.com> On Fri, 16 Sep 2022 08:51:31 GMT, Christian Hagedorn wrote: >> **Context:** >> [JDK-8265973](https://bugs.openjdk.org/browse/JDK-8265973) Fix in Valhalla repository (Tobias @TobiHartmann ). >> [JDK-8290711](https://bugs.openjdk.org/browse/JDK-8290711) Fix in mainline (Roland @rwestrel ). >> Tobias' fix is a superset of Rolands. Unfortunately, Tobias gave up his fix once Rolands came up, because they did not have tests that required the superset fix. >> >> We now have such a test, where Rolands fix is not sufficient. But Tobias' fix is the solution. So I ported Tobias' fix to mainline. >> >> **Analysis:** >> In this bug, we have two `LoadB` nodes re-pushing themselves to the `igvn.worklist`, without end. This leads to an assert after too many iterations. >> >> `PhaseCCP::analyze` is looking at a post-loop. The loop has a memory access, so there is a `null_check`. The data-part of the loop is connected down to Root via this `null_check` >> (`Phi-> CmpP -> Bool -> If -> IfFalse -> Region -> CallStaticJava uncommon_trap -> ... -> Root`). >> During `CCP::analyze`, we discover that the memory address is NonNull. So we update the `phase->type(n)` for many of the data-nodes of the loop. >> >> During `PhaseCCP::do_transform`, we now traverse recursively up from the root, visiting all reachable nodes. >> When we visit a node, we store the cached `phase->type(n)` into the node, making the node's type consistent. >> We traverse up through the `null_check`, through the `uncommon_trap`, and the `If`, to the `Bool` node. 
>> `BoolNode::Value` realizes that we can never have Null, and is subsumed by constant `#int:1` (true). >> This means that the data-part of the loop just lost its connection down to Root. >> The traversal now also does not reach further than the Bool node which was just subsumed, and hence does not reach the data-part of the loop. >> This means we have nodes with inconsistent type. >> >> Summary: CCP disconnects the last path down to root for a data-loop, because it realizes that a `null_check` will never trap. The disconnected state means the types of the data-loop may be left inconsistent. >> >> Right after PhaseCCP, we continue with IGVN. >> The `LoadB` from that data-part of the loop has `MemNode::Ideal_common` called, which defers its transformation until the type of the address is consistent. However, this is never made consistent, as it is already left inconsistent after PhaseCCP. >> https://github.com/openjdk/jdk/blob/aa7ccdf44549a52cce9e99f6569097d3343d9ee4/src/hotspot/share/opto/memnode.cpp#L351-L358 >> Note that we only re-push if there is another node in the worklist - a node that hopefully has something to do with the address. >> But in our case it is just the two LoadB nodes, which were generated from the same `split_through_phi`. >> >> **Solution:** >> At the end of PhaseCCP, we remove all nodes that were not visited (and may have an inconsistent state). We can do this because we visited all nodes that are still relevant. Rolands fix already made sure that SafePointNodes are visited, such that infinite loops are covered as well. >> >> Regression test added. >> Test suite passed. 
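
The idea in the quoted solution (traverse from the root, then drop everything the traversal did not reach) can be sketched on a toy graph. This is an illustrative Java model only: `Node`, `reachableFrom`, and `removeUnvisited` are hypothetical names, and the real change lives in C2's `PhaseCCP`, which this sketch does not reproduce.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class RemoveUnvisitedSketch {
    // Hypothetical, minimal IR node: a name plus its input edges.
    static final class Node {
        final String name;
        final List<Node> inputs = new ArrayList<>();

        Node(String name, Node... ins) {
            this.name = name;
            inputs.addAll(Arrays.asList(ins));
        }
    }

    // Walks input edges upwards from the root and returns every node reached.
    static Set<Node> reachableFrom(Node root) {
        Set<Node> visited = new HashSet<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (visited.add(n)) {
                for (Node in : n.inputs) {
                    stack.push(in);
                }
            }
        }
        return visited;
    }

    // Removes all nodes that the traversal did not visit and returns them.
    static List<Node> removeUnvisited(List<Node> allNodes, Node root) {
        Set<Node> visited = reachableFrom(root);
        List<Node> removed = new ArrayList<>();
        for (Iterator<Node> it = allNodes.iterator(); it.hasNext();) {
            Node n = it.next();
            if (!visited.contains(n)) {
                removed.add(n);
                it.remove();
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        // The Bool was replaced by a constant, so the data loop (Phi, LoadB)
        // no longer has any path down to Root.
        Node con = new Node("ConI#1");
        Node phi = new Node("Phi");
        Node load = new Node("LoadB", phi);
        Node root = new Node("Root", con);

        List<Node> all = new ArrayList<>(Arrays.asList(root, con, phi, load));
        for (Node n : removeUnvisited(all, root)) {
            System.out.println("removed unreachable node: " + n.name);
        }
    }
}
```

In this toy model the disconnected `Phi` and `LoadB` are dropped instead of being left behind with stale types, which is the property the fix relies on.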
> > src/hotspot/share/opto/phaseX.cpp line 1986: > >> 1984: >> 1985: while ( transform_stack.is_nonempty() ) { >> 1986: Node *clone = transform_stack.pop(); > > Even though it's old code, you could also fix the code style when touching the code: > Suggestion: > > while (transform_stack.is_nonempty()) { > Node* clone = transform_stack.pop(); yes, makes sense, thanks for catching that > test/hotspot/jtreg/compiler/ccp/TestRemoveUnreachableCCP.java line 3: > >> 1: /* >> 2: * Copyright (c) 2022, Oracle and/or its affiliates. All rights reserved. >> 3: * > > Looking at other files, I think this empty line should be removed. ok, will do that > test/hotspot/jtreg/compiler/ccp/TestRemoveUnreachableCCP.java line 29: > >> 27: * @bug 8287217 >> 28: * @summary CCP must remove nodes that are not traversed, else their type can be inconsistent >> 29: * @run main/othervm -Xcomp -Xbatch -XX:CompileCommand=compileOnly,TestRemoveUnreachableCCP::test > > `-Xbatch` can be removed as it is implied by `-Xcomp`. I think we should use `compileonly` instead of `compileOnly`. But it is case insensitive, so it does not really matter. good point > test/hotspot/jtreg/compiler/ccp/TestRemoveUnreachableCCP.java line 45: > >> 43: >> 44: public static void main(String[] strArr) { >> 45: TestRemoveUnreachableCCP _instance = new TestRemoveUnreachableCCP(); > > The instance is unused and can be removed. thanks for catching that ------------- PR: https://git.openjdk.org/jdk/pull/10250 From epeter at openjdk.org Fri Sep 16 14:48:04 2022 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 Sep 2022 14:48:04 GMT Subject: RFR: 8287217: C2: PhaseCCP: remove not visited nodes, prevent type inconsistency [v2] In-Reply-To: References: Message-ID: > **Context:** > [JDK-8265973](https://bugs.openjdk.org/browse/JDK-8265973) Fix in Valhalla repository (Tobias @TobiHartmann ). > [JDK-8290711](https://bugs.openjdk.org/browse/JDK-8290711) Fix in mainline (Roland @rwestrel ). 
> Tobias' fix is a superset of Roland's. Unfortunately, Tobias gave up his fix once Roland's came up, because they did not have tests that required the superset fix. > > We now have such a test, where Roland's fix is not sufficient. But Tobias' fix is the solution. So I ported Tobias' fix to mainline. > > **Analysis:** > In this bug, we have two `LoadB` nodes re-pushing themselves to the `igvn.worklist`, without end. This leads to an assert after too many iterations. > > `PhaseCCP::analyze` is looking at a post-loop. The loop has a memory access, so there is a `null_check`. The data-part of the loop is connected down to Root via this `null_check` > (`Phi-> CmpP -> Bool -> If -> IfFalse -> Region -> CallStaticJava uncommon_trap -> ... -> Root`). > During `CCP::analyze`, we discover that the memory address is NonNull. So we update the `phase->type(n)` for many of the data-nodes of the loop. > > During `PhaseCCP::do_transform`, we now traverse recursively up from the root, visiting all reachable nodes. > When we visit a node, we store the cached `phase->type(n)` into the node, making the node's type consistent. > We traverse up through the `null_check`, through the `uncommon_trap`, and the `If`, to the `Bool` node. > `BoolNode::Value` realizes that we can never have Null, and is subsumed by constant `#int:1` (true). > This means that the data-part of the loop just lost its connection down to Root. > The traversal now also does not reach further than the Bool node which was just subsumed, and hence does not reach the data-part of the loop. > This means we have nodes with inconsistent type. > > Summary: CCP disconnects the last path down to root for a data-loop, because it realizes that a `null_check` will never trap. The disconnected state means the types of the data-loop may be left inconsistent. > > Right after PhaseCCP, we continue with IGVN.
> The `LoadB` from that data-part of the loop has `MemNode::Ideal_common` called, which defers its transformation until the type of the address is consistent. However, this is never made consistent, as it is already left inconsistent after PhaseCCP. > https://github.com/openjdk/jdk/blob/aa7ccdf44549a52cce9e99f6569097d3343d9ee4/src/hotspot/share/opto/memnode.cpp#L351-L358 > Note that we only re-push if there is another node in the worklist - a node that hopefully has something to do with the address. > But in our case it is just the two `LoadB` nodes, which were generated from the same `split_through_phi`. > > **Solution:** > At the end of PhaseCCP, we remove all nodes that were not visited (and may have an inconsistent state). We can do this because we visited all nodes that are still relevant. Roland's fix already made sure that SafePointNodes are visited, such that infinite loops are covered as well. > > Regression test added. > Test suite passed. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: implementing Christian's review suggestions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10250/files - new: https://git.openjdk.org/jdk/pull/10250/files/7dc29ae8..60c16f24 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10250&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10250&range=00-01 Stats: 5 lines in 2 files changed: 0 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/10250.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10250/head:pull/10250 PR: https://git.openjdk.org/jdk/pull/10250 From tholenstein at openjdk.org Fri Sep 16 14:58:04 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 16 Sep 2022 14:58:04 GMT Subject: RFR: JDK-8293480: IGV: Update Bytecode and ControlFlow Component immediately when opening a new graph [v6] In-Reply-To: References: Message-ID: > The `BytecodeViewTopComponent` and `ControlFlowTopComponent` represent
information depending on what graph is open in `EditorTopComponent`. Previously, `BytecodeViewTopComponent` and `ControlFlowTopComponent` did not update its content immediately when a new graph from a different group is opened in `EditorTopComponent`. They also did not update when switching between two tabs of open graph. > > We missed to `fire()` a `diagramChangedEvent` in the constructor of `EditorTopComponent`. We also need to fire when `BytecodeViewTopComponent` and `ControlFlowTopComponent` are initially opened. > Update Tobias Holenstein has updated the pull request incrementally with four additional commits since the last revision: - Fix for: parent is null for DiffGraph - remove DiagramProvider - add ChangedListener to LookupHistory - update OutlineTopComponent graph closed ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10196/files - new: https://git.openjdk.org/jdk/pull/10196/files/2a643165..31b9d673 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10196&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10196&range=04-05 Stats: 399 lines in 11 files changed: 134 ins; 200 del; 65 mod Patch: https://git.openjdk.org/jdk/pull/10196.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10196/head:pull/10196 PR: https://git.openjdk.org/jdk/pull/10196 From kvn at openjdk.org Fri Sep 16 15:20:46 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 16 Sep 2022 15:20:46 GMT Subject: RFR: 8290917: x86: Memory-operand arithmetic instructions have too low costs [v6] In-Reply-To: References: Message-ID: On Fri, 16 Sep 2022 05:33:59 GMT, Quan Anh Mai wrote: >> The pattern `AddI (LoadI mem) imm` should be matched by a load followed by an add with constant, instead, it is currently matched as a constant load followed by an add with memory. The reason is that the cost of `addI_rReg_mem` is too low, this patch fixes this by increasing the cost of this fused instruction. 
>> >> Testing: Manually run the test case in the JBS and look at the compiled code. >> >> I also do some small clean-ups in x86_64.ad: >> >> - The `mulHiL` rules have unnecessary constraints on the input registers, these can be removed. The `no_rax_RegL` operand as a consequence can also be removed. >> - The rules involving long division by a constant can be removed because it has been covered by the optimiser during idealisation. >> - The pattern `SubI src imm` and the likes never match because they are converted to `AddI src -imm` by the optimiser. As a result, these rules can be removed >> - The rules involving shifting the argument by 1 are covered by and exactly the same as the corresponding rules of shifting by an immediate. As a result, they can be removed. >> - Some rules involving and-ing with a bit mask have unnecessary constraints on the target register. >> >> Please kindly review, thank you very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > add jmhs Thank you for updating tests. I am running testing now. ------------- PR: https://git.openjdk.org/jdk/pull/9791 From kvn at openjdk.org Fri Sep 16 15:24:41 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 16 Sep 2022 15:24:41 GMT Subject: RFR: 8293937: x86: Drop LP64 conditions from clearly x86_32 code In-Reply-To: <8UHAbltEwlCwzkbxK6TIlK41HT9rlKqsICS-rMbVOKY=.db04836b-8559-4203-ae29-488ec4d2234b@github.com> References: <8UHAbltEwlCwzkbxK6TIlK41HT9rlKqsICS-rMbVOKY=.db04836b-8559-4203-ae29-488ec4d2234b@github.com> Message-ID: <9cYbnnrunJhyxG-V1SE5RHTBKvvwwGZwzwgMb28GihU=.954bb54e-3789-47d8-b2b2-399714fa96b6@github.com> On Fri, 16 Sep 2022 11:10:29 GMT, Aleksey Shipilev wrote: > Noticed this when porting Loom on x86_32. There are `*_x86_32.cpp` files that use `_LP64` as if it matters for them. It does not make sense, as in those files we always have `!_LP64`. We can drop the conditionals and clean the code. 
> > Proof of completeness: > > > $ ack LP64 src/hotspot/ | grep _32 > src/hotspot/cpu/x86/register_x86.hpp:386: NOT_LP64( 8 + ) // FILL0-FILL7 in x86_32.ad > src/hotspot/cpu/x86/vm_version_x86.hpp:733: return LP64_ONLY(true) NOT_LP64(false); // not implemented on x86_32 Good ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.org/jdk/pull/10305 From tholenstein at openjdk.org Fri Sep 16 15:44:44 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 16 Sep 2022 15:44:44 GMT Subject: RFR: JDK-8293480: IGV: Update Bytecode and ControlFlow Component immediately when opening a new graph [v6] In-Reply-To: References: Message-ID: On Fri, 16 Sep 2022 14:58:04 GMT, Tobias Holenstein wrote: >> The `BytecodeViewTopComponent` and `ControlFlowTopComponent` represent information depending on which graph is opened in `EditorTopComponent`. Previously `BytecodeViewTopComponent` and `ControlFlowTopComponent` did not update their contents immediately when a new graph from another group was opened in `EditorTopComponent`. They were also not updated when switching between two tabs of an open graph. >> `OutlineTopComponent` had the same problem updating the selected graphs according to the `EditorTopComponent` >> >> Update >> >> **Analysis** >> `BytecodeViewTopComponent`, `ControlFlowTopComponent` and `OutlineTopComponent` each represent information that depends on the `InputGraph` of the currently or last active `EditorTopComponent`. This information is made available globally by adding a new `InputGraphProvider` to the `Lookup` of `EditorTopComponent` each time the `InputGraph` changes. `BytecodeViewTopComponent`, `ControlFlowTopComponent` and `OutlineTopComponent` implement a `LookupListener` that calls `resultChanged(LookupEvent lookupEvent)` whenever an `InputGraphProvider` changes in the lookup of `Utilities.actionsGlobalContext()`.
When such a change happens the last active `InputGraphProvider` is retrieved from the `LookupHistory` by the listening components. >> >> **Problem** >> First, we failed to `fire()` a `diagramChangedEvent` in the constructor of `EditorTopComponent`, which triggers adding the `InputGraphProvider` to the `Lookup` >> >> Second, `Utilities.actionsGlobalContext()` returns the `Lookup` of the active (focused) `TopComponent`. Unfortunately, when the last `EditorTopComponent` is closed, the `LookupListener` does not get called and there is no way to call it manually. >> >> **New Approach** >> We extend the `LookupHistory` class such that we can add a `ChangedListener` that gets called whenever the last active `InputGraphProvider` is cached. So instead of listening to changes in `Utilities.actionsGlobalContext()` and then consulting the `LookupHistory`, now `BytecodeViewTopComponent`, `ControlFlowTopComponent` and `OutlineTopComponent` listen to changes in the `LookupHistory` directly. This way we can now call `terminate` in the `LookupHistory` whenever we close an `EditorTopComponent`, which directly notifies the listeners. > > Tobias Holenstein has updated the pull request incrementally with four additional commits since the last revision: > > - Fix for: parent is null for DiffGraph > - remove DiagramProvider > - add ChangedListener to LookupHistory > - update OutlineTopComponent graph closed > fix > Thanks for this UI improvement, Tobias, looks good to me! There is one more case where the Bytecode and Control Flow windows get out of sync: after removing all graphs and groups in the Outline, they still show the content of the graph that was last active: > > ![bytecode-and-cfg-leftovers](https://user-images.githubusercontent.com/8792647/189114719-770ba617-e94c-4492-a5ab-81047b8a0b98.png) > > This problem existed before the changeset, so it might be addressed here or in a separate issue, whatever you think makes more sense.
You are right, @robcasloz, my last fix didn't solve the problem. There was a fundamental problem with using `Utilities.actionsGlobalContext()` - I pushed a new version and updated the PR. Thanks! ------------- PR: https://git.openjdk.org/jdk/pull/10196 From stuefe at openjdk.org Fri Sep 16 16:53:45 2022 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 16 Sep 2022 16:53:45 GMT Subject: RFR: 8293937: x86: Drop LP64 conditions from clearly x86_32 code In-Reply-To: <8UHAbltEwlCwzkbxK6TIlK41HT9rlKqsICS-rMbVOKY=.db04836b-8559-4203-ae29-488ec4d2234b@github.com> References: <8UHAbltEwlCwzkbxK6TIlK41HT9rlKqsICS-rMbVOKY=.db04836b-8559-4203-ae29-488ec4d2234b@github.com> Message-ID: On Fri, 16 Sep 2022 11:10:29 GMT, Aleksey Shipilev wrote: > Noticed this when porting Loom on x86_32. There are `*_x86_32.cpp` files that use `_LP64` as if it matters for them. It does not make sense, as in those files we always have `!_LP64`. We can drop the conditionals and clean the code. > > Proof of completeness: > > > $ ack LP64 src/hotspot/ | grep _32 > src/hotspot/cpu/x86/register_x86.hpp:386: NOT_LP64( 8 + ) // FILL0-FILL7 in x86_32.ad > src/hotspot/cpu/x86/vm_version_x86.hpp:733: return LP64_ONLY(true) NOT_LP64(false); // not implemented on x86_32 Marked as reviewed by stuefe (Reviewer). Looks good. I never really thought about how this bifurcation works in x86. So, this is build magic, it just avoids x86_64 files for 32bit builds and vice versa? 
-------------

PR: https://git.openjdk.org/jdk/pull/10305

From kvn at openjdk.org Fri Sep 16 17:26:45 2022
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Fri, 16 Sep 2022 17:26:45 GMT
Subject: RFR: 8281453: New optimization: convert `~x` into `-1-x` when `~x` is used in an arithmetic expression [v9]
In-Reply-To:
References: <4mTZu0_hVWb-ztMxMabFilyXAnAqOStCvU9wPmfqCKM=.fa8b7797-6e20-4c9e-80f1-b55ba3d5fe39@github.com>
Message-ID: <7s3iiJu4OT8StlurN51uZq9fJcCty-fex34GpQnSoZs=.3e4c0038-d91b-4513-8ffd-ae0f99d20740@github.com>

On Thu, 15 Sep 2022 16:53:44 GMT, Zhiqiang Zang wrote:

>> Similar to `(~x)+c` -> `(c-1)-x` and `~(x+c)` -> `(-c-1)-x` in #6858, we can also introduce similar optimizations for subtraction, `c-(~x)` -> `x+(c+1)` and `~(c-x)` -> `x+(-c-1)`.
>>
>> To generalize, I convert `~x` into `-1-x` when `~x` is used in an arithmetic expression. For example, `c-(~x)` will be converted into `c-(-1-x)` which will match other pattern and will be transformed again in next iteration and finally become `x+(c+1)`.
>>
>> The results of the microbenchmark are as follows:
>>
>> Baseline:
>> Benchmark                         Mode  Cnt  Score   Error  Units
>> NotOpTransformation.baselineInt   avgt   60  0.439 ± 0.001  ns/op
>> NotOpTransformation.baselineLong  avgt   60  0.439 ± 0.001  ns/op
>> NotOpTransformation.testInt1      avgt   60  0.603 ± 0.001  ns/op
>> NotOpTransformation.testInt2      avgt   60  0.603 ± 0.001  ns/op
>> NotOpTransformation.testLong1     avgt   60  0.658 ± 0.001  ns/op
>> NotOpTransformation.testLong2     avgt   60  0.658 ± 0.001  ns/op
>>
>> Patch:
>> Benchmark                         Mode  Cnt  Score   Error  Units
>> NotOpTransformation.baselineInt   avgt   60  0.439 ± 0.001  ns/op
>> NotOpTransformation.baselineLong  avgt   60  0.439 ± 0.001  ns/op
>> NotOpTransformation.testInt1      avgt   60  0.329 ± 0.001  ns/op
>> NotOpTransformation.testInt2      avgt   60  0.329 ± 0.001  ns/op
>> NotOpTransformation.testLong1     avgt   60  0.329 ± 0.001  ns/op
>> NotOpTransformation.testLong2     avgt   60  0.329 ± 0.001  ns/op
>
> Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision:
>
>   include microbenchmark.

Testing passed.

You need second review.

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.org/jdk/pull/7376

From kvn at openjdk.org Fri Sep 16 17:28:54 2022
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Fri, 16 Sep 2022 17:28:54 GMT
Subject: RFR: 8292088: C2: assert(is_OuterStripMinedLoop()) failed: invalid node class: IfTrue
In-Reply-To: <4KiGggNq8m0gnRkAxarMncZ1gVWSJ-QNS3IFXzmIdQE=.5e847c08-973d-4517-8778-c9d6d33c7b50@github.com>
References: <4KiGggNq8m0gnRkAxarMncZ1gVWSJ-QNS3IFXzmIdQE=.5e847c08-973d-4517-8778-c9d6d33c7b50@github.com>
Message-ID:

On Fri, 16 Sep 2022 12:00:43 GMT, Christian Hagedorn wrote:

> In `testKnownLimit()`, we directly use the (pre-incremented) iv phi `IV_PHI_i` (`232 Phi`) in the loop exit check of the `while` loop:
>
> ![Screenshot from 2022-09-16 10-33-58](https://user-images.githubusercontent.com/17833009/190604454-50aa1b1e-7111-4723-a329-b95e0f26c220.png)
>
> Such pre-incremented iv phi uses after the loop are detected in `PhaseIdealLoop::reorg_offsets()` and replaced in order to reduce register pressure. We insert an additional `Opaque2` node to prevent any optimizations to undo the effect of `PhaseIdealLoop::reorg_offsets()`:
>
>
>     //    iv Phi            iv Phi
>     //      |                 |
>     //      |                AddI (+stride)
>     //      |                 |
>     //      |              Opaque2  # Blocks IGVN from folding these nodes until loop opts are over.
>     //      |     ====>       |
>     //      |                AddI (-stride)
>     //      |                 |
>     //      |               CastII  # Preserve type of iv Phi
>     //      |                 |
>     //   Outside Use       Outside Use
>
>
> In the test case, this is done before CCP and looks like this:
>
> ![Screenshot from 2022-09-16 10-33-35](https://user-images.githubusercontent.com/17833009/190623922-3b0c9eeb-8468-4cd7-8fe1-1f7df3dc5071.png)
>
> At that point, we do not know yet that the `while` loop is only gonna be executed once (i.e. `422 CountedLoopEnd` is always false).
This only becomes known after CCP where the type of `232 Phi` improves. But since we have an `Opaque2` node, this update is not propagated until the `Opaque2` nodes are removed in macro expansion: > > https://github.com/openjdk/jdk/blob/11e7d53b23796cbd3d878048f7553885ae07f4d1/src/hotspot/share/opto/macro.cpp#L2412-L2414 > > During macro expansion, we also adjust the strip mined loop: We move the `421 Bool` of the inner loop exit check `422 CountedLoopEnd` to the outer strip mined loop exit check and adjust the inner loop exit check in such a way that C2 cannot figure out that the entire loop is only run once. In the next IGVN phase, the outer strip mined loop node is removed while the inner loop `429 CountedLoop` is not. > > Later in `verify_strip_mined()`, we cannot find the outer strip mined loop of `429 CountedLoop` anymore and we fail with the assertion. > > The first thought to fix this problem is to add `Opaque2::Value()` to let type information flow. But this does not fix the problem completely if the type of the iv phi has no known upper limit. There we have the problem that in general `type(phi) != type(phi + x - x)` because `phi + x` could overflow and we end up with type `int` (which happens in `testUnknownLimit()`). > > I therefore suggest to remove `Opaque2` nodes earlier before macro expansion to fix this bug. A good place seems to be right after loop opts are over. We can remove them at the same time as `Opaque1` nodes by adding a similar `Identity()` method. This lets the loop nodes to be folded away before trying to adjust the outer strip mined loop limit. > > #### Are Opaque2 nodes really useful? > > When working on this bug, I started to question the usage of `Opaque2` nodes in general. We are still running IGVN after `Opaque2` nodes are currently removed. This simply undoes the effects of `PhaseIdealLoop::reorg_offsets()` again and we end up using pre-incremented iv phis anyways. 
My theory was that we are either blocking some specific optimizations during loop opts which cannot be reverted later in IGVN or that we initially (when this `Opaque2` optimization was added) did not run IGVN anymore once `Opaque2` nodes are removed. > > I could not think of any such non-revertable optimization that `Opaque2` nodes could prevent. On top of that, `PhaseIdealLoop::reorg_offsets()` also does not mention anything alike. I therefore had a look at the history of `Opaque2` nodes. Unfortunately, they were added before the initial load commit. I've dug deeper through some old closed repo and found that at the time the `Opaque2` nodes were introduced around 20 years ago, we did not do any IGVN anymore after the removal of the `Opaque2` nodes - and we generated code with these unoptimized `iv phi + x - x` patterns. > > This suggests that today the `Opaque2` nodes are indeed not really doing what they were originally supposed to do. I would therefore suggest to investigate their complete removal in a separate RFE and go with the suggested fix above which does not make the current situation of the questionable `Opaque2` node usages any worse. > > Thanks, > Christian Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/10306 From kvn at openjdk.org Fri Sep 16 17:31:40 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 16 Sep 2022 17:31:40 GMT Subject: RFR: 8293849: PrintIdealPhase in compiler directives file is ignored when used with other compile commands In-Reply-To: References: Message-ID: On Thu, 15 Sep 2022 11:20:06 GMT, Christian Hagedorn wrote: > When using a compiler directives file with `PrintIdealPhase`: > > > [ > { > match : "Test::*", > log : true, > PrintIdealPhase : "BEFORE_MATCHING" > } > ] > > > together with other compile commands specified in `compilerdirectives_common_flags` and/or `compilerdirectives_c2_flags`: > https://github.com/openjdk/jdk/blob/aff5ff14b208b3c2be93d7b4fab8b07c5be12f3e/src/hotspot/share/compiler/compilerDirectives.hpp#L38-L39 > https://github.com/openjdk/jdk/blob/aff5ff14b208b3c2be93d7b4fab8b07c5be12f3e/src/hotspot/share/compiler/compilerDirectives.hpp#L63-L64 > > then the `PrintIdealPhase` option is ignored. > > The reason is that when cloning the `DirectiveSet` for the current compilation in `DirectiveSet::clone()`, we only set `PrintIdealPhaseOption` but forget to also set `_ideal_phase_name_mask` which is used when deciding if a compile phase should be dumped or not. As a result, the mask keeps its default value zero and nothing is dumped because `Compile::shoud_print_phase()` returns false: > > https://github.com/openjdk/jdk/blob/aff5ff14b208b3c2be93d7b4fab8b07c5be12f3e/src/hotspot/share/opto/compile.cpp#L5060-L5067 > > > The fix is to also clone the old value of `_ideal_phase_name_mask`. > > Thanks, > Christian Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/10283 From vlivanov at openjdk.org Fri Sep 16 17:57:39 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 16 Sep 2022 17:57:39 GMT Subject: RFR: 8293816: CI: ciBytecodeStream::get_klass() is not consistent In-Reply-To: References: Message-ID: On Thu, 15 Sep 2022 19:58:13 GMT, Vladimir Ivanov wrote: > CI responses should be consistent during a single compilation. > > [JDK-8293044](https://bugs.openjdk.org/browse/JDK-8293044) was fixed by turning inaccessible classes into unloaded ones when resolving them through CI. > > But there's another case when `ciEnv::get_klass_by_index()` returns a loaded ciKlass while setting `will_link` to `false`: a not-yet-resolved klass revealed through a class loader constraint. > > In such case, after a concurrent class loading CI will start reporting a loaded ciKlass instance. Such inconsistency may trigger some paradoxical situations during compilation. > > The fix is to instantiate a proper instance of an unloaded ciKlass, so further requests will return the unloaded instances as well. > > Testing: hs-tier1 - hs-tier4 Thanks for the reviews, Dean and Vladimir. ------------- PR: https://git.openjdk.org/jdk/pull/10294 From vlivanov at openjdk.org Fri Sep 16 18:00:21 2022 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 16 Sep 2022 18:00:21 GMT Subject: Integrated: 8293816: CI: ciBytecodeStream::get_klass() is not consistent In-Reply-To: References: Message-ID: On Thu, 15 Sep 2022 19:58:13 GMT, Vladimir Ivanov wrote: > CI responses should be consistent during a single compilation. > > [JDK-8293044](https://bugs.openjdk.org/browse/JDK-8293044) was fixed by turning inaccessible classes into unloaded ones when resolving them through CI. > > But there's another case when `ciEnv::get_klass_by_index()` returns a loaded ciKlass while setting `will_link` to `false`: a not-yet-resolved klass revealed through a class loader constraint. 
> > In such case, after a concurrent class loading CI will start reporting a loaded ciKlass instance. Such inconsistency may trigger some paradoxical situations during compilation. > > The fix is to instantiate a proper instance of an unloaded ciKlass, so further requests will return the unloaded instances as well. > > Testing: hs-tier1 - hs-tier4 This pull request has now been integrated. Changeset: 746f5f58 Author: Vladimir Ivanov URL: https://git.openjdk.org/jdk/commit/746f5f589db5c1036f15fa47f8a48b2a12c921ce Stats: 10 lines in 3 files changed: 1 ins; 6 del; 3 mod 8293816: CI: ciBytecodeStream::get_klass() is not consistent Reviewed-by: dlong, kvn ------------- PR: https://git.openjdk.org/jdk/pull/10294 From kvn at openjdk.org Fri Sep 16 18:24:41 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 16 Sep 2022 18:24:41 GMT Subject: RFR: 8290917: x86: Memory-operand arithmetic instructions have too low costs [v6] In-Reply-To: References: Message-ID: On Fri, 16 Sep 2022 05:33:59 GMT, Quan Anh Mai wrote: >> The pattern `AddI (LoadI mem) imm` should be matched by a load followed by an add with constant, instead, it is currently matched as a constant load followed by an add with memory. The reason is that the cost of `addI_rReg_mem` is too low, this patch fixes this by increasing the cost of this fused instruction. >> >> Testing: Manually run the test case in the JBS and look at the compiled code. >> >> I also do some small clean-ups in x86_64.ad: >> >> - The `mulHiL` rules have unnecessary constraints on the input registers, these can be removed. The `no_rax_RegL` operand as a consequence can also be removed. >> - The rules involving long division by a constant can be removed because it has been covered by the optimiser during idealisation. >> - The pattern `SubI src imm` and the likes never match because they are converted to `AddI src -imm` by the optimiser. 
As a result, these rules can be removed >> - The rules involving shifting the argument by 1 are covered by and exactly the same as the corresponding rules of shifting by an immediate. As a result, they can be removed. >> - Some rules involving and-ing with a bit mask have unnecessary constraints on the target register. >> >> Please kindly review, thank you very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > add jmhs My testing passed. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.org/jdk/pull/9791 From duke at openjdk.org Fri Sep 16 19:12:46 2022 From: duke at openjdk.org (Quan Anh Mai) Date: Fri, 16 Sep 2022 19:12:46 GMT Subject: RFR: 8290917: x86: Memory-operand arithmetic instructions have too low costs [v6] In-Reply-To: References: Message-ID: <8HQsb_4v4DRSadGP7o5JQ9sxo1stWjh9ouahvz106rQ=.16f1d76f-e36f-4400-802d-9c7993146740@github.com> On Fri, 16 Sep 2022 05:33:59 GMT, Quan Anh Mai wrote: >> The pattern `AddI (LoadI mem) imm` should be matched by a load followed by an add with constant, instead, it is currently matched as a constant load followed by an add with memory. The reason is that the cost of `addI_rReg_mem` is too low, this patch fixes this by increasing the cost of this fused instruction. >> >> Testing: Manually run the test case in the JBS and look at the compiled code. >> >> I also do some small clean-ups in x86_64.ad: >> >> - The `mulHiL` rules have unnecessary constraints on the input registers, these can be removed. The `no_rax_RegL` operand as a consequence can also be removed. >> - The rules involving long division by a constant can be removed because it has been covered by the optimiser during idealisation. >> - The pattern `SubI src imm` and the likes never match because they are converted to `AddI src -imm` by the optimiser. 
As a result, these rules can be removed >> - The rules involving shifting the argument by 1 are covered by and exactly the same as the corresponding rules of shifting by an immediate. As a result, they can be removed. >> - Some rules involving and-ing with a bit mask have unnecessary constraints on the target register. >> >> Please kindly review, thank you very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > add jmhs Thanks very much for your reviews! ------------- PR: https://git.openjdk.org/jdk/pull/9791 From duke at openjdk.org Fri Sep 16 20:11:31 2022 From: duke at openjdk.org (Quan Anh Mai) Date: Fri, 16 Sep 2022 20:11:31 GMT Subject: Integrated: 8290917: x86: Memory-operand arithmetic instructions have too low costs In-Reply-To: References: Message-ID: On Sat, 6 Aug 2022 09:53:01 GMT, Quan Anh Mai wrote: > The pattern `AddI (LoadI mem) imm` should be matched by a load followed by an add with constant, instead, it is currently matched as a constant load followed by an add with memory. The reason is that the cost of `addI_rReg_mem` is too low, this patch fixes this by increasing the cost of this fused instruction. > > Testing: Manually run the test case in the JBS and look at the compiled code. > > I also do some small clean-ups in x86_64.ad: > > - The `mulHiL` rules have unnecessary constraints on the input registers, these can be removed. The `no_rax_RegL` operand as a consequence can also be removed. > - The rules involving long division by a constant can be removed because it has been covered by the optimiser during idealisation. > - The pattern `SubI src imm` and the likes never match because they are converted to `AddI src -imm` by the optimiser. As a result, these rules can be removed > - The rules involving shifting the argument by 1 are covered by and exactly the same as the corresponding rules of shifting by an immediate. As a result, they can be removed. 
> - Some rules involving and-ing with a bit mask have unnecessary constraints on the target register. > > Please kindly review, thank you very much. This pull request has now been integrated. Changeset: 01e7b881 Author: Quan Anh Mai Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/01e7b8819918906082e315870e667b15910cee99 Stats: 447 lines in 4 files changed: 131 ins; 282 del; 34 mod 8290917: x86: Memory-operand arithmetic instructions have too low costs Reviewed-by: kvn, sviswanathan, jbhateja ------------- PR: https://git.openjdk.org/jdk/pull/9791 From duke at openjdk.org Fri Sep 16 20:35:45 2022 From: duke at openjdk.org (Zhiqiang Zang) Date: Fri, 16 Sep 2022 20:35:45 GMT Subject: RFR: 8281453: New optimization: convert `~x` into `-1-x` when `~x` is used in an arithmetic expression [v9] In-Reply-To: <7s3iiJu4OT8StlurN51uZq9fJcCty-fex34GpQnSoZs=.3e4c0038-d91b-4513-8ffd-ae0f99d20740@github.com> References: <4mTZu0_hVWb-ztMxMabFilyXAnAqOStCvU9wPmfqCKM=.fa8b7797-6e20-4c9e-80f1-b55ba3d5fe39@github.com> <7s3iiJu4OT8StlurN51uZq9fJcCty-fex34GpQnSoZs=.3e4c0038-d91b-4513-8ffd-ae0f99d20740@github.com> Message-ID: On Fri, 16 Sep 2022 17:23:20 GMT, Vladimir Kozlov wrote: > Testing passed. > > You need second review. Thank you @vnkozlov for approving. Could you suggest someone for the second review?
------------- PR: https://git.openjdk.org/jdk/pull/7376 From dlong at openjdk.org Fri Sep 16 21:16:50 2022 From: dlong at openjdk.org (Dean Long) Date: Fri, 16 Sep 2022 21:16:50 GMT Subject: RFR: 8289925: Shared code shouldn't reference the platform specific method frame::interpreter_frame_last_sp() [v3] In-Reply-To: <-CdoduIsZBNl7Hqje87jdrC9NbAiG4I-lNzsWqTItD4=.7871424a-5d99-4986-b72b-4c187dd5c2a9@github.com> References: <-CdoduIsZBNl7Hqje87jdrC9NbAiG4I-lNzsWqTItD4=.7871424a-5d99-4986-b72b-4c187dd5c2a9@github.com> Message-ID: <5jvtd7c7Dqz8w6oXi27Ez7BaS_zgIF_HCacIwDLsHHU=.85fbc7bf-b5bf-4776-9400-f8c53506e0e9@github.com> On Thu, 15 Sep 2022 07:43:23 GMT, Richard Reingruber wrote: >> The method `frame::interpreter_frame_last_sp()` is a platform method in the sense that it is not declared in a shared header file. It is declared and defined on some platforms though (x86 and aarch64 I think). >> >> `frame::interpreter_frame_last_sp()` existed on these platforms before vm continuations (aka loom). Shared code that is part of the vm continuations implementation references it. This breaks the platform abstraction. >> >> This fix simply removes the special case for interpreted frames in the shared method `Continuation::continuation_bottom_sender()`. I cannot see a reason for the distinction between interpreted and compiled frames. The shared code reference to `frame::interpreter_frame_last_sp()` is thereby eliminated. >> >> Testing: hotspot_loom and jdk_loom on x86_64 and aarch64. > > Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: > > - Only pass the actual sp when calling is_sp_in_continuation() > - Merge branch 'master' > - Merge branch 'master' > - Remove platform dependent method interpreter_frame_last_sp() from shared code Actually, for interpreted --> interpreted, aarch64 and ppc64 seem to do the same trimming. The difference is ppc64 does it in the caller, while aarch64 does it in the callee: https://github.com/openjdk/jdk/blob/725f41ffd4b137aef3f83700b4e181e9d93368d4/src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp#L1578 So this would mean for interpreted --> compiled, ppc64 does trimming and aarch64 does not. I also noticed that on ppc64, the value of unextended_sp for interpreted frames is inconsistent. Whether or not it is the "max stack" value depends on who calls the frame constructor. Some Loom code and sender_for_compiled_frame() set unextended_sp the same as sp, while only sender_for_interpreter_frame() uses the "max stack" value. Giving unextended_sp a consistent "canonical" value that is always inside the frame, no matter if the callee is interpreted or compiled, seems like it would make unextended_sp() valid for is_sp_in_continuation() as well. ------------- PR: https://git.openjdk.org/jdk/pull/9411 From kvn at openjdk.org Fri Sep 16 22:05:46 2022 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 16 Sep 2022 22:05:46 GMT Subject: RFR: 8281453: New optimization: convert `~x` into `-1-x` when `~x` is used in an arithmetic expression In-Reply-To: References: <4mTZu0_hVWb-ztMxMabFilyXAnAqOStCvU9wPmfqCKM=.fa8b7797-6e20-4c9e-80f1-b55ba3d5fe39@github.com> Message-ID: On Wed, 9 Feb 2022 16:12:30 GMT, Quan Anh Mai wrote: >> Similar to `(~x)+c` -> `(c-1)-x` and `~(x+c)` -> `(-c-1)-x` in #6858, we can also introduce similar optimizations for subtraction, `c-(~x)` -> `x+(c+1)` and `~(c-x)` -> `x+(-c-1)`.
>> >> To generalize, I convert `~x` into `-1-x` when `~x` is used in an arithmetic expression. For example, `c-(~x)` will be converted into `c-(-1-x)`, which will match another pattern, be transformed again in the next iteration, and finally become `x+(c+1)`. >> >> The results of the microbenchmark are as follows: >> >> Baseline: >> Benchmark Mode Cnt Score Error Units >> NotOpTransformation.baselineInt avgt 60 0.439 ± 0.001 ns/op >> NotOpTransformation.baselineLong avgt 60 0.439 ± 0.001 ns/op >> NotOpTransformation.testInt1 avgt 60 0.603 ± 0.001 ns/op >> NotOpTransformation.testInt2 avgt 60 0.603 ± 0.001 ns/op >> NotOpTransformation.testLong1 avgt 60 0.658 ± 0.001 ns/op >> NotOpTransformation.testLong2 avgt 60 0.658 ± 0.001 ns/op >> >> Patch: >> Benchmark Mode Cnt Score Error Units >> NotOpTransformation.baselineInt avgt 60 0.439 ± 0.001 ns/op >> NotOpTransformation.baselineLong avgt 60 0.439 ± 0.001 ns/op >> NotOpTransformation.testInt1 avgt 60 0.329 ± 0.001 ns/op >> NotOpTransformation.testInt2 avgt 60 0.329 ± 0.001 ns/op >> NotOpTransformation.testLong1 avgt 60 0.329 ± 0.001 ns/op >> NotOpTransformation.testLong2 avgt 60 0.329 ± 0.001 ns/op > > Since `~x == -1 - x` and these two operations cost essentially the same, it would be much easier to just check whether the result of the not is used in an arithmetic operation and transform the former into the latter. The reverse also holds: if you find a `-1 - x` being fed into a bitwise operation, just transform it to `~x`. > Thanks. @merykitty can review it since he gave the initial comment.
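For readers who want to check the identities quoted above, here is a minimal C++ sketch. It is not the C2 change itself: the helper names (`wrap_add`, `wrap_sub`, `not_as_sub`, `sub_of_not`, `sub_of_not_folded`, `not_of_sub`, `not_of_sub_folded`) are hypothetical, and Java's wrapping `int` arithmetic is modelled with unsigned 32-bit math because signed overflow is undefined behaviour in C++.

```cpp
#include <cassert>
#include <cstdint>

// Wrapping 32-bit add/sub, modelling Java `int` semantics.
// The cast back to int32_t is modular on two's-complement targets
// (guaranteed since C++20, implementation-defined before that).
static inline int32_t wrap_add(int32_t a, int32_t b) {
  return static_cast<int32_t>(static_cast<uint32_t>(a) + static_cast<uint32_t>(b));
}
static inline int32_t wrap_sub(int32_t a, int32_t b) {
  return static_cast<int32_t>(static_cast<uint32_t>(a) - static_cast<uint32_t>(b));
}

// ~x rewritten as -1 - x.
int32_t not_as_sub(int32_t x) { return wrap_sub(-1, x); }

// c - (~x) and its folded form x + (c + 1).
int32_t sub_of_not(int32_t c, int32_t x) { return wrap_sub(c, ~x); }
int32_t sub_of_not_folded(int32_t c, int32_t x) { return wrap_add(x, wrap_add(c, 1)); }

// ~(c - x) and its folded form x + (-c - 1).
int32_t not_of_sub(int32_t c, int32_t x) { return ~wrap_sub(c, x); }
int32_t not_of_sub_folded(int32_t c, int32_t x) {
  return wrap_add(x, wrap_sub(wrap_sub(0, c), 1));
}
```

With `c = 7` and `x = 3`, both `sub_of_not` and `sub_of_not_folded` give 11, and both `not_of_sub` and `not_of_sub_folded` give -5. The sketch only checks the algebra; whether the rewrite pays off in C2 depends on how the `~x` node is used, which is what the thread goes on to discuss.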
------------- PR: https://git.openjdk.org/jdk/pull/7376 From sviswanathan at openjdk.org Sat Sep 17 01:02:51 2022 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Sat, 17 Sep 2022 01:02:51 GMT Subject: RFR: 8288043: Optimize FP to word/sub-word integral type conversion on X86 AVX2 platforms [v4] In-Reply-To: References: Message-ID: On Sat, 10 Sep 2022 17:05:38 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch extends conversion optimizations added with [JDK-8287835](https://bugs.openjdk.org/browse/JDK-8287835) to optimize following floating point to integral conversions for X86 AVX2 targets:- >> * D2I , D2S, D2B, F2I , F2S, F2B >> >> In addition, it also optimizes following wide vector (64 bytes) double to integer and sub-type conversions for AVX512 targets which do not support AVX512DQ feature. >> * D2I, D2S, D2B >> >> Following are the JMH micro performance results with and without patch. >> >> System configuration: 40C 2S Icelake server (Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz) >> >> BENCHMARK | SIZE | BASELINE (ops/ms) | WITHOPT (ops/ms) | PERF GAIN FACTOR >> -- | -- | -- | -- | -- >> VectorFPtoIntCastOperations.microDouble128ToByte128 | 1024 | 90.603 | 92.797 | 1.024215534 >> VectorFPtoIntCastOperations.microDouble128ToByte256 | 1024 | 81.909 | 82.3 | 1.00477359 >> VectorFPtoIntCastOperations.microDouble128ToByte512 | 1024 | 26.181 | 26.244 | 1.002406325 >> VectorFPtoIntCastOperations.microDouble128ToInteger128 | 1024 | 90.74 | 2537.958 | 27.96956138 >> VectorFPtoIntCastOperations.microDouble128ToInteger256 | 1024 | 81.586 | 2429.599 | 29.7796068 >> VectorFPtoIntCastOperations.microDouble128ToInteger512 | 1024 | 19.406 | 19.61 | 1.010512213 >> VectorFPtoIntCastOperations.microDouble128ToLong128 | 1024 | 91.723 | 90.754 | 0.989435583 >> VectorFPtoIntCastOperations.microDouble128ToShort128 | 1024 | 91.766 | 1984.577 | 21.62649565 >> VectorFPtoIntCastOperations.microDouble128ToShort256 | 1024 | 81.949 | 1940.599 | 23.68056962 >> 
VectorFPtoIntCastOperations.microDouble128ToShort512 | 1024 | 16.468 | 16.56 | 1.005586592 >> VectorFPtoIntCastOperations.microDouble256ToByte128 | 1024 | 163.331 | 3018.351 | 18.479964 >> VectorFPtoIntCastOperations.microDouble256ToByte256 | 1024 | 148.878 | 3082.034 | 20.70174237 >> VectorFPtoIntCastOperations.microDouble256ToByte512 | 1024 | 50.108 | 51.629 | 1.030354434 >> VectorFPtoIntCastOperations.microDouble256ToInteger128 | 1024 | 159.805 | 4619.421 | 28.90661118 >> VectorFPtoIntCastOperations.microDouble256ToInteger256 | 1024 | 143.876 | 4649.642 | 32.31700909 >> VectorFPtoIntCastOperations.microDouble256ToInteger512 | 1024 | 38.127 | 38.188 | 1.001599916 >> VectorFPtoIntCastOperations.microDouble256ToLong128 | 1024 | 160.322 | 162.442 | 1.013223388 >> VectorFPtoIntCastOperations.microDouble256ToLong256 | 1024 | 141.252 | 143.01 | 1.012445841 >> VectorFPtoIntCastOperations.microDouble256ToShort128 | 1024 | 157.717 | 3757.471 | 23.82413437 >> VectorFPtoIntCastOperations.microDouble256ToShort256 | 1024 | 143.876 | 3830.971 | 26.62689399 >> VectorFPtoIntCastOperations.microDouble256ToShort512 | 1024 | 32.061 | 32.911 | 1.026511962 >> VectorFPtoIntCastOperations.microFloat128ToByte128 | 1024 | 146.599 | 4002.967 | 27.30555461 >> VectorFPtoIntCastOperations.microFloat128ToByte256 | 1024 | 136.99 | 3938.799 | 28.75245638 >> VectorFPtoIntCastOperations.microFloat128ToByte512 | 1024 | 51.561 | 50.284 | 0.975233219 >> VectorFPtoIntCastOperations.microFloat128ToInteger128 | 1024 | 5933.565 | 5361.472 | 0.903583596 >> VectorFPtoIntCastOperations.microFloat128ToInteger256 | 1024 | 5079.564 | 5062.046 | 0.996551279 >> VectorFPtoIntCastOperations.microFloat128ToInteger512 | 1024 | 37.101 | 38.419 | 1.035524649 >> VectorFPtoIntCastOperations.microFloat128ToLong128 | 1024 | 145.863 | 145.362 | 0.99656527 >> VectorFPtoIntCastOperations.microFloat128ToLong256 | 1024 | 131.159 | 133.154 | 1.015210546 >> VectorFPtoIntCastOperations.microFloat128ToShort128 | 1024 | 145.966 | 
4150.039 | 28.4315457 >> VectorFPtoIntCastOperations.microFloat128ToShort256 | 1024 | 134.703 | 4566.589 | 33.90116775 >> VectorFPtoIntCastOperations.microFloat128ToShort512 | 1024 | 31.878 | 30.867 | 0.968285338 >> VectorFPtoIntCastOperations.microFloat256ToByte128 | 1024 | 237.841 | 6292.051 | 26.4548627 >> VectorFPtoIntCastOperations.microFloat256ToByte256 | 1024 | 222.041 | 6292.748 | 28.34047766 >> VectorFPtoIntCastOperations.microFloat256ToByte512 | 1024 | 92.073 | 88.981 | 0.966417951 >> VectorFPtoIntCastOperations.microFloat256ToInteger128 | 1024 | 11471.121 | 10269.636 | 0.895260019 >> VectorFPtoIntCastOperations.microFloat256ToInteger256 | 1024 | 10729.816 | 10105.92 | 0.941853989 >> VectorFPtoIntCastOperations.microFloat256ToInteger512 | 1024 | 68.328 | 70.005 | 1.024543379 >> VectorFPtoIntCastOperations.microFloat256ToLong128 | 1024 | 247.101 | 248.571 | 1.005948984 >> VectorFPtoIntCastOperations.microFloat256ToLong256 | 1024 | 225.74 | 223.987 | 0.992234429 >> VectorFPtoIntCastOperations.microFloat256ToLong512 | 1024 | 76.39 | 76.187 | 0.997342584 >> VectorFPtoIntCastOperations.microFloat256ToShort128 | 1024 | 233.196 | 8202.179 | 35.17289748 >> VectorFPtoIntCastOperations.microFloat256ToShort256 | 1024 | 220.75 | 7781.073 | 35.24834881 >> VectorFPtoIntCastOperations.microFloat256ToShort512 | 1024 | 58.143 | 55.633 | 0.956830573 >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8288043: Code re-factoring. Could you please enable the compiler/vectorapi/VectorFPtoIntCastTest.java for AVX2 platforms? Currently they are only run for AVX512DQ platforms. 
------------- PR: https://git.openjdk.org/jdk/pull/9748 From duke at openjdk.org Sat Sep 17 09:42:41 2022 From: duke at openjdk.org (Quan Anh Mai) Date: Sat, 17 Sep 2022 09:42:41 GMT Subject: RFR: 8292761: x86: Clone nodes to match complex rules [v4] In-Reply-To: References: Message-ID: <7lozKkn8Du15iEOwGnNL9uk9atr8L3RcvSvgeAYooVA=.639c1d53-9323-4aba-884d-0df278ee07f8@github.com> > Hi, > > This patch tries to clone a node if it can be matched as a part of a BMI and lea pattern. This may reduce the live range of a local or remove that local completely. > > Please take a look and have some reviews. Thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: address comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9977/files - new: https://git.openjdk.org/jdk/pull/9977/files/0beae979..529c6b9c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9977&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9977&range=02-03 Stats: 23 lines in 1 file changed: 15 ins; 3 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/9977.diff Fetch: git fetch https://git.openjdk.org/jdk pull/9977/head:pull/9977 PR: https://git.openjdk.org/jdk/pull/9977 From duke at openjdk.org Sat Sep 17 12:27:50 2022 From: duke at openjdk.org (Quan Anh Mai) Date: Sat, 17 Sep 2022 12:27:50 GMT Subject: RFR: 8292761: x86: Clone nodes to match complex rules [v2] In-Reply-To: References: Message-ID: On Fri, 9 Sep 2022 10:09:17 GMT, Tobias Hartmann wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains ten additional commits since the last revision: >> >> - Merge branch 'master' into cloneSimpleNodes >> - fix >> - Merge branch 'master' into cloneSimpleNodes >> - shorten >> - improve checks >> - lea patterns >> - refactor >> - lea patterns >> - first commit > > Please include the benchmark in the patch. Could you show the generated code before/after? Thanks! Thanks @TobiHartmann @chhagedorn for your comments, I have updated the PR to address them. ------------- PR: https://git.openjdk.org/jdk/pull/9977 From duke at openjdk.org Sat Sep 17 16:36:39 2022 From: duke at openjdk.org (Quan Anh Mai) Date: Sat, 17 Sep 2022 16:36:39 GMT Subject: RFR: 8281453: New optimization: convert `~x` into `-1-x` when `~x` is used in an arithmetic expression [v9] In-Reply-To: References: <4mTZu0_hVWb-ztMxMabFilyXAnAqOStCvU9wPmfqCKM=.fa8b7797-6e20-4c9e-80f1-b55ba3d5fe39@github.com> Message-ID: <-5jKrvfxxzJQ0oOmucxlANXuhRbwLqP_Ng8qArlgM-Q=.24b9d47b-59e7-42a5-b0d6-ea5669d56a0e@github.com> On Thu, 15 Sep 2022 16:53:44 GMT, Zhiqiang Zang wrote: >> Similar to `(~x)+c` -> `(c-1)-x` and `~(x+c)` -> `(-c-1)-x` in #6858, we can also introduce similar optimizations for subtraction, `c-(~x)` -> `x+(c+1)` and `~(c-x)` -> `x+(-c-1)`. >> >> To generalize, I convert `~x` into `-1-x` when `~x` is used in an arithmetic expression. For example, `c-(~x)` will be converted into `c-(-1-x)`, which will match another pattern, be transformed again in the next iteration, and finally become `x+(c+1)`. >> >> The results of the microbenchmark are as follows: >> >> Baseline: >> Benchmark Mode Cnt Score Error Units >> NotOpTransformation.baselineInt avgt 60 0.439 ± 0.001 ns/op >> NotOpTransformation.baselineLong avgt 60 0.439 ± 0.001 ns/op >> NotOpTransformation.testInt1 avgt 60 0.603 ± 0.001 ns/op >> NotOpTransformation.testInt2 avgt 60 0.603 ± 0.001 ns/op >> NotOpTransformation.testLong1 avgt 60 0.658 ± 0.001 ns/op >> NotOpTransformation.testLong2 avgt 60 0.658 ±
0.001 ns/op >> >> Patch: >> Benchmark Mode Cnt Score Error Units >> NotOpTransformation.baselineInt avgt 60 0.439 ± 0.001 ns/op >> NotOpTransformation.baselineLong avgt 60 0.439 ± 0.001 ns/op >> NotOpTransformation.testInt1 avgt 60 0.329 ± 0.001 ns/op >> NotOpTransformation.testInt2 avgt 60 0.329 ± 0.001 ns/op >> NotOpTransformation.testLong1 avgt 60 0.329 ± 0.001 ns/op >> NotOpTransformation.testLong2 avgt 60 0.329 ± 0.001 ns/op > > Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: > > include microbenchmark. Hi, what if `~x` is also used in another context, would it duplicate this local, also how about the cases where `x` is an addition or subtraction? I think this should be an idealisation of the `XorNode`, and if you cannot really check the uses of a node during parsing, you can record them for igvn later. Thanks. ------------- PR: https://git.openjdk.org/jdk/pull/7376 From omikhaltcova at openjdk.org Sun Sep 18 08:56:32 2022 From: omikhaltcova at openjdk.org (Olga Mikhaltsova) Date: Sun, 18 Sep 2022 08:56:32 GMT Subject: RFR: 8262901: [macos_aarch64] NativeCallTest expected:<-3.8194101E18> but was:<3.02668882E10> [v2] In-Reply-To: References: Message-ID: <4P78ewnd5NOZH1f4Js0VSuckk--KZ5CrajKL6UU_0wM=.f42cc16c-1860-4d3c-90c2-bf2adf194ce5@github.com> > This PR is opened as a follow-up to [1] and includes the "must-done" fixes pointed out by @teshull. > > This patch for JVMCI includes the following fixes related to the macOS AArch64 calling convention: > 1. arguments may consume slots on the stack that are not multiples of 8 bytes [2] > 2. natural alignment of stack arguments [2] > 3. stack must remain 16-byte aligned [3][4] > > Tested with tier1 on macOS AArch64 and Linux AArch64.
> > [1] https://github.com/openjdk/jdk/pull/6641 > [2] https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms > [3] https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=msvc-160#stack > [4] https://docs.microsoft.com/en-us/cpp/build/stack-usage?view=msvc-170 Olga Mikhaltsova has updated the pull request incrementally with one additional commit since the last revision: Refactoring ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10238/files - new: https://git.openjdk.org/jdk/pull/10238/files/181c2f17..6f8b3215 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10238&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10238&range=00-01 Stats: 41 lines in 1 file changed: 22 ins; 16 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/10238.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10238/head:pull/10238 PR: https://git.openjdk.org/jdk/pull/10238 From duke at openjdk.org Sun Sep 18 17:21:32 2022 From: duke at openjdk.org (Zhiqiang Zang) Date: Sun, 18 Sep 2022 17:21:32 GMT Subject: RFR: 8281453: New optimization: convert `~x` into `-1-x` when `~x` is used in an arithmetic expression [v9] In-Reply-To: <-5jKrvfxxzJQ0oOmucxlANXuhRbwLqP_Ng8qArlgM-Q=.24b9d47b-59e7-42a5-b0d6-ea5669d56a0e@github.com> References: <4mTZu0_hVWb-ztMxMabFilyXAnAqOStCvU9wPmfqCKM=.fa8b7797-6e20-4c9e-80f1-b55ba3d5fe39@github.com> <-5jKrvfxxzJQ0oOmucxlANXuhRbwLqP_Ng8qArlgM-Q=.24b9d47b-59e7-42a5-b0d6-ea5669d56a0e@github.com> Message-ID: On Sat, 17 Sep 2022 16:32:45 GMT, Quan Anh Mai wrote: >> Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: >> >> include microbenchmark. > > Hi, what if `~x` is also used in another context, would it duplicate this local, also how about the cases where `x` is an addition or subtraction? 
I think this should be an idealisation of the `XorNode`, and if you cannot really check the uses of a node during parsing, you can record them for igvn later. Thanks. Thanks @merykitty for the comment. Can you give an example for this? > what if `~x` is also used in another context, would it duplicate this local Also, I think this works. > how about the cases where `x` is an addition or subtraction ------------- PR: https://git.openjdk.org/jdk/pull/7376 From duke at openjdk.org Sun Sep 18 20:06:22 2022 From: duke at openjdk.org (Quan Anh Mai) Date: Sun, 18 Sep 2022 20:06:22 GMT Subject: RFR: 8293976: Use unsigned integers in Assembler/CodeBuffer::emit_int* Message-ID: Assembler/CodeBuffer::emit_int* accept signed int arguments. - Since we are trying to emit bit patterns into the code buffer rather than do integer arithmetic, it makes more sense to use unsigned parameters. - Signed parameters make usage with constants inconvenient: an integer literal is positive, so 0xC0 does not fit into an int8_t and the compiler complains about a lossy implicit conversion. The current workaround is to manually cast the constants to unsigned char, which can be converted to int8_t without complaints. Please have a look and leave some reviews. Thanks very much.
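To make the inconvenience concrete, here is a small C++ sketch. `ToyBuffer`, `emit_int8_signed`, `emit_int8_unsigned` and `emit_both` are hypothetical stand-ins, not HotSpot's `CodeBuffer` or `Assembler` API; they only contrast a signed byte parameter with an unsigned one.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy buffer, for illustration only (not HotSpot code).
struct ToyBuffer {
  std::vector<uint8_t> bytes;

  // Signed-parameter style: 0xC0 (192) is outside the int8_t range,
  // so callers need a cast to silence narrowing diagnostics.
  void emit_int8_signed(int8_t x) { bytes.push_back(static_cast<uint8_t>(x)); }

  // Unsigned-parameter style: 0xC0 fits as-is.
  void emit_int8_unsigned(uint8_t x) { bytes.push_back(x); }
};

// Emits the same bit pattern through both entry points.
std::vector<uint8_t> emit_both() {
  ToyBuffer buf;
  // A cast is needed at the call site. The message above describes casting
  // to unsigned char in HotSpot; a direct cast to int8_t is used here to
  // keep the sketch free of implicit narrowing.
  buf.emit_int8_signed(static_cast<int8_t>(0xC0));
  buf.emit_int8_unsigned(0xC0);  // no cast needed
  return buf.bytes;
}
```

Both calls store the byte `0xC0`; the difference is only in how much ceremony the call site needs, which is the point the proposal makes.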
------------- Commit messages: - change signature Changes: https://git.openjdk.org/jdk/pull/10325/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10325&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8293976 Stats: 29 lines in 2 files changed: 0 ins; 0 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/10325.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10325/head:pull/10325 PR: https://git.openjdk.org/jdk/pull/10325 From xgong at openjdk.org Mon Sep 19 03:04:57 2022 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 19 Sep 2022 03:04:57 GMT Subject: RFR: 8292898: [vectorapi] Unify vector mask cast operation [v4] In-Reply-To: References: Message-ID: > The current implementation of the vector mask cast operation is > complex in that the compiler generates different patterns for different > scenarios. For architectures that do not support the predicate > feature, a vector mask is represented the same as a normal vector. > So the vector mask cast is implemented by a `VectorCast` node. But this > is not always needed. When two masks have the same element size (e.g. > int vs. float), their bit layouts are the same. So casting between > them does not need to emit any instructions. > > Currently the compiler generates different patterns based on the > vector type of the input/output and the platforms. Normally the > "`VectorMaskCast`" op is only used for cases that don't emit any > instructions, and "`VectorCast`" op is used to implement the necessary > expand/narrow operations. This can avoid adding some duplicate rules > in the backend. However, this also has drawbacks: > > 1) The code is complex, especially when the compiler needs to > check whether the hardware supports the necessary IRs for the > vector mask cast. It needs to check different patterns for > different cases. > 2) The vector mask cast operation could be implemented with cheaper > instructions than the vector casting on some architectures.
> > Instead of generating `VectorCast` or `VectorMaskCast` nodes for different > cases of vector mask cast operations, this patch unifies the vector > mask cast implementation with the "`VectorMaskCast`" node for all vector types > and platforms. The missing backend rules are also added for it. > > This patch also simplifies the vector mask conversion that happens in > "`VectorUnbox::Ideal()`". Normally "`VectorUnbox (VectorBox vmask)`" can > be optimized to "`vmask`" if the unboxing type matches the boxed > "`vmask`" type. Otherwise, it needs the type conversion. Currently the > "`VectorUnbox`" will be transformed to two different patterns to implement > the conversion: > > 1) If the element size is not changed, it is transformed to: > > "VectorMaskCast vmask" > > 2) Otherwise, it is transformed to: > > "VectorLoadMask (VectorStoreMask vmask)" > > It first converts the "`vmask`" to a boolean vector with "`VectorStoreMask`", > and then uses "`VectorLoadMask`" to convert the boolean vector to the > dst mask vector. Since this patch makes "`VectorMaskCast`" op supported > for all types on all platforms, it doesn't need the "`VectorLoadMask`" and > "`VectorStoreMask`" to do the conversion.
The existing transformation: > > VectorUnbox (VectorBox vmask) => VectorLoadMask (VectorStoreMask vmask) > > can be simplified to: > > VectorUnbox (VectorBox vmask) => VectorMaskCast vmask Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: Add assertion to the elem num for mast cast ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10192/files - new: https://git.openjdk.org/jdk/pull/10192/files/15bfa98e..e9285233 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10192&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10192&range=02-03 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10192.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10192/head:pull/10192 PR: https://git.openjdk.org/jdk/pull/10192 From duke at openjdk.org Mon Sep 19 03:25:40 2022 From: duke at openjdk.org (Quan Anh Mai) Date: Mon, 19 Sep 2022 03:25:40 GMT Subject: RFR: 8281453: New optimization: convert `~x` into `-1-x` when `~x` is used in an arithmetic expression [v9] In-Reply-To: References: <4mTZu0_hVWb-ztMxMabFilyXAnAqOStCvU9wPmfqCKM=.fa8b7797-6e20-4c9e-80f1-b55ba3d5fe39@github.com> Message-ID: <0mZU-PW2DcYPutXuiyKN-fvtsSFk4QNSZuFBjec3ky4=.f5e045d2-c730-48c5-b069-b89450c2e672@github.com> On Thu, 15 Sep 2022 16:53:44 GMT, Zhiqiang Zang wrote: >> Similar to `(~x)+c` -> `(c-1)-x` and `~(x+c)` -> `(-c-1)-x` in #6858, we can also introduce similar optimizations for subtraction, `c-(~x)` -> `x+(c+1)` and `~(c-x)` -> `x+(-c-1)`. >> >> To generalize, I convert `~x` into `-1-x` when `~x` is used in an arithmetic expression. For example, `c-(~x)` will be converted into `c-(-1-x)`, which will match another pattern, be transformed again in the next iteration, and finally become `x+(c+1)`. >> >> The results of the microbenchmark are as follows: >> >> Baseline: >> Benchmark Mode Cnt Score Error Units >> NotOpTransformation.baselineInt avgt 60 0.439 ±
0.001 ns/op >> NotOpTransformation.baselineLong avgt 60 0.439 ± 0.001 ns/op >> NotOpTransformation.testInt1 avgt 60 0.603 ± 0.001 ns/op >> NotOpTransformation.testInt2 avgt 60 0.603 ± 0.001 ns/op >> NotOpTransformation.testLong1 avgt 60 0.658 ± 0.001 ns/op >> NotOpTransformation.testLong2 avgt 60 0.658 ± 0.001 ns/op >> >> Patch: >> Benchmark Mode Cnt Score Error Units >> NotOpTransformation.baselineInt avgt 60 0.439 ± 0.001 ns/op >> NotOpTransformation.baselineLong avgt 60 0.439 ± 0.001 ns/op >> NotOpTransformation.testInt1 avgt 60 0.329 ± 0.001 ns/op >> NotOpTransformation.testInt2 avgt 60 0.329 ± 0.001 ns/op >> NotOpTransformation.testLong1 avgt 60 0.329 ± 0.001 ns/op >> NotOpTransformation.testLong2 avgt 60 0.329 ± 0.001 ns/op > > Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: > > include microbenchmark. Consider these two statements: `int x = ~a + b; int y = ~a | c;` Your transformation would duplicate the `~a`, as it is transformed into `(-a - 1)` in the calculation of `x` but not in the calculation of `y`. Thanks. ------------- PR: https://git.openjdk.org/jdk/pull/7376 From duke at openjdk.org Mon Sep 19 03:36:25 2022 From: duke at openjdk.org (王超) Date: Mon, 19 Sep 2022 03:36:25 GMT Subject: RFR: JDK-8293978: Duplicate simple loop back-edge will crash the vm Message-ID: Duplicating the back-edge of the following simple loop will crash the JVM.
[image not preserved] ------------- Commit messages: - Fix JDK-8293978 Changes: https://git.openjdk.org/jdk/pull/10329/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10329&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8293978 Stats: 76 lines in 2 files changed: 75 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10329.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10329/head:pull/10329 PR: https://git.openjdk.org/jdk/pull/10329 From shade at openjdk.org Mon Sep 19 06:20:54 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 19 Sep 2022 06:20:54 GMT Subject: RFR: 8293937: x86: Drop LP64 conditions from clearly x86_32 code In-Reply-To: References: <8UHAbltEwlCwzkbxK6TIlK41HT9rlKqsICS-rMbVOKY=.db04836b-8559-4203-ae29-488ec4d2234b@github.com> Message-ID: On Fri, 16 Sep 2022 16:51:23 GMT, Thomas Stuefe wrote: > So, this is build magic, it just avoids x86_64 files for 32bit builds and vice versa? Yes: https://github.com/openjdk/jdk/blob/b1ed40a87ab357d1b51ac5102bba181f21ffa9b6/make/hotspot/lib/CompileJvm.gmk#L128-L132 ------------- PR: https://git.openjdk.org/jdk/pull/10305 From shade at openjdk.org Mon Sep 19 06:20:55 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 19 Sep 2022 06:20:55 GMT Subject: Integrated: 8293937: x86: Drop LP64 conditions from clearly x86_32 code In-Reply-To: <8UHAbltEwlCwzkbxK6TIlK41HT9rlKqsICS-rMbVOKY=.db04836b-8559-4203-ae29-488ec4d2234b@github.com> References: <8UHAbltEwlCwzkbxK6TIlK41HT9rlKqsICS-rMbVOKY=.db04836b-8559-4203-ae29-488ec4d2234b@github.com> Message-ID: On Fri, 16 Sep 2022 11:10:29 GMT, Aleksey Shipilev wrote: > Noticed this when porting Loom on x86_32. There are `*_x86_32.cpp` files that use `_LP64` as if it matters for them. It does not make sense, as in those files we always have `!_LP64`. We can drop the conditionals and clean the code.
> > Proof of completeness: > > > $ ack LP64 src/hotspot/ | grep _32 > src/hotspot/cpu/x86/register_x86.hpp:386: NOT_LP64( 8 + ) // FILL0-FILL7 in x86_32.ad > src/hotspot/cpu/x86/vm_version_x86.hpp:733: return LP64_ONLY(true) NOT_LP64(false); // not implemented on x86_32 This pull request has now been integrated. Changeset: 357a2cc2 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/357a2cc22a72876fc412b4fc99b9da8f05840678 Stats: 46 lines in 2 files changed: 0 ins; 37 del; 9 mod 8293937: x86: Drop LP64 conditions from clearly x86_32 code Reviewed-by: kvn, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/10305 From shade at openjdk.org Mon Sep 19 06:22:12 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 19 Sep 2022 06:22:12 GMT Subject: RFR: 8293844: C2: Verify Location::{oop,normal} types in PhaseOutput::FillLocArray In-Reply-To: References: Message-ID: On Thu, 15 Sep 2022 08:50:58 GMT, Aleksey Shipilev wrote: > I have been debugging a weird issue in C2/deopt, and wanted to have stronger asserts in critical paths. One such place is `PhaseOutput::FillLocArray`, which emits `Location::normal` on unconditional `else` branch. `Location::normal` is described as "Ints, floats, double halves". I think we would be better off verifying the types explicitly. Same goes for `Location::oop`, which we can also verify. > > Aside: In fact, I suspect the whole `Regalloc::is_oop` business can go away, and we can rely on reg types to sense if we are dealing with oops here, but that looks like a change with some unexpected effects, so I would like to do that separately, see [JDK-8293845](https://bugs.openjdk.org/browse/JDK-8293845). > > Additional testing: > - [x] Linux x86_64 fastdebug `tier1` > - [x] Linux x86_64 fastdebug `tier2` > - [x] Linux x86_32 fastdebug `tier1` > - [x] Linux x86_32 fastdebug `tier2` Thanks! 
------------- PR: https://git.openjdk.org/jdk/pull/10281 From shade at openjdk.org Mon Sep 19 06:22:13 2022 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 19 Sep 2022 06:22:13 GMT Subject: Integrated: 8293844: C2: Verify Location::{oop,normal} types in PhaseOutput::FillLocArray In-Reply-To: References: Message-ID: <8UlxwkqpKVSb8Uneg7FP01TE8q0gb-wIeTp0KX2psVg=.f3d38d74-efc5-4ed9-9dc2-cb30a52afeba@github.com> On Thu, 15 Sep 2022 08:50:58 GMT, Aleksey Shipilev wrote: > I have been debugging a weird issue in C2/deopt, and wanted to have stronger asserts in critical paths. One such place is `PhaseOutput::FillLocArray`, which emits `Location::normal` on unconditional `else` branch. `Location::normal` is described as "Ints, floats, double halves". I think we would be better off verifying the types explicitly. Same goes for `Location::oop`, which we can also verify. > > Aside: In fact, I suspect the whole `Regalloc::is_oop` business can go away, and we can rely on reg types to sense if we are dealing with oops here, but that looks like a change with some unexpected effects, so I would like to do that separately, see [JDK-8293845](https://bugs.openjdk.org/browse/JDK-8293845). > > Additional testing: > - [x] Linux x86_64 fastdebug `tier1` > - [x] Linux x86_64 fastdebug `tier2` > - [x] Linux x86_32 fastdebug `tier1` > - [x] Linux x86_32 fastdebug `tier2` This pull request has now been integrated. 
Changeset: 26e08cf3 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/26e08cf3d0cbd30395f3344669fcc20c0b52e2f6 Stats: 9 lines in 1 file changed: 8 ins; 0 del; 1 mod 8293844: C2: Verify Location::{oop,normal} types in PhaseOutput::FillLocArray Reviewed-by: kvn, dlong ------------- PR: https://git.openjdk.org/jdk/pull/10281 From epeter at openjdk.org Mon Sep 19 06:42:20 2022 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 19 Sep 2022 06:42:20 GMT Subject: RFR: 8293798: Fix test bugs due to incompatibility with -XX:+AlwaysIncrementalInline Message-ID: I investigated and fixed the following three files, which failed (test asserts) if they were run with `-XX:+AlwaysIncrementalInline`. Manually tested the 3 test-files. I also ran some other sanity tests. Looking forward to your feedback, Emanuel **compiler/uncommontrap/Decompile.java** Assert triggered: `java.lang.RuntimeException: Wrong compilation status.: expected false to equal true` We expected the test to deoptimize after introducing the third class Y to a method that is expected to be called with bimorphic inlining. Analysis: It seems that we don't do bimorphic inlining of function calls (2 static inlinings, uncommon_trap for others). https://github.com/openjdk/jdk/blob/aa7ccdf44549a52cce9e99f6569097d3343d9ee4/src/hotspot/share/opto/doCall.cpp#L269-L275 Instead, we create two `LateInlineCallGenerator`, one for a class (hit), and one for the other cases (miss). https://github.com/openjdk/jdk/blob/aa7ccdf44549a52cce9e99f6569097d3343d9ee4/src/hotspot/share/opto/doCall.cpp#L200-L201 `LateInlineCallGenerator` has `is_inline` false, so we do not go into the bimorphic inlining case (which would require both receivers to have `is_inline` true). https://github.com/openjdk/jdk/blob/aa7ccdf44549a52cce9e99f6569097d3343d9ee4/src/hotspot/share/opto/doCall.cpp#L255-L259 Instead, we generate a static call for one class (hit), and a virtual call for the rest (miss).
https://github.com/openjdk/jdk/blob/aa7ccdf44549a52cce9e99f6569097d3343d9ee4/src/hotspot/share/opto/doCall.cpp#L279-L280 Therefore, when the third class comes along the function does not trap, and not deopt either. Solution: For now, we will just deactivate the flag in the test. But more discussion is required if we want this flag to change the inline behavior in this way. **compiler/intrinsics/klass/CastNullCheckDroppingsTest.java** Assert triggered: `java.lang.AssertionError: compilation must not get deoptimized` We expect the function `testMHCast(String s)` to be compiled without a `null_check` for input `s`. We test that by calling it with `Null`. We trigger the assert because of a `null_check`, leading to deoptimization. Analysis: We are in `GraphKit::gen_checkcast`. We have a `cast` that is wrapped in a `MethodHandle`, which we want to call via `invokeExact`. Normally, this would get directly inlined, and the output is deduced to be `String*`. With the flag, however, we do not inline `invokeExact`, but create a `JavaStaticCall`, which represents `invokeExact`. This has 4 `Object*` inputs and 1 `Object*` output. So now the result of the cast is a `Object*`. We have a `String*` in the normal case, and `Object*` in the flag case. We now do a local cast to `String*`. For that we first check subtyping. https://github.com/openjdk/jdk/blob/aa7ccdf44549a52cce9e99f6569097d3343d9ee4/src/hotspot/share/opto/graphKit.cpp#L3311 In the normal case, we have a perfect match - so we rely on profiling and do speculative `String::NonNull*`. But in the flag case, we see `Object*` is not subtype of `String*`. So we continue on, and find that profiling has determined `never_see_null = true`, so we set an `uncommon_trap` for the `null_check`. Somehow, the speculative profiling case does not lead to a trap - not sure why yet. Of course in the test we eventually do feed in `Null` - in the regular case there is no trap, in the flag case we trap and deopt, breaking the test assumption. 
Solution: At any rate, this really looks like a test bug - the flag can change the decisions that have an impact on the asserts of the test. I will disable the flag. **compiler/ciReplay/TestInliningProtectionDomain.java** Assert triggered: `java.lang.RuntimeException: assertTrue: expected true, was false` https://github.com/openjdk/jdk/blob/aa7ccdf44549a52cce9e99f6569097d3343d9ee4/test/hotspot/jtreg/compiler/ciReplay/TestInliningProtectionDomain.java#L73 Analysis: Turns out `isForcedByReplay()` returns false because the "reason" is expected to be `force inline by ciReplay`, but we get a reason `force (incremental) inline by ciReplay`. This seems expected. We should use `isForcedIncrementalInlineByReplay()` if we expect incremental inline to happen. Solution: `isForcedByReplay()` or `isForcedIncrementalInlineByReplay()` ------------- Commit messages: - 8293798: Fix test bugs due to incompatibility with -XX:+AlwaysIncrementalInline Changes: https://git.openjdk.org/jdk/pull/10310/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10310&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8293798 Stats: 3 lines in 3 files changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10310.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10310/head:pull/10310 PR: https://git.openjdk.org/jdk/pull/10310 From tholenstein at openjdk.org Mon Sep 19 07:25:56 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 19 Sep 2022 07:25:56 GMT Subject: RFR: JDK-8293364: IGV: Refactor Action in EditorTopComponent and fix minor bugs [v6] In-Reply-To: References: Message-ID: <6wX70WO5rl9tNzDVN-qWxLDkXjgSLRnQIbI6NaGZR4Q=.6e374d45-4f8c-49fe-a647-fa664cc96944@github.com> > Refactor the Actions in EditorTopComponent (com.sun.hotspot.igv.view.actions). Move Action specific code from EditorTopComponent to the corresponding Action. 
> > # Refactoring of com.sun.hotspot.igv.view.actions and EditorTopComponent > - Created a new `ExportGraph` Action and moved corresponding functions `exportToSVG(..)` and `exportToPDF(..)` to new `ExportGraph.java` > - Moved key bindings for satellite-view (pressing S) from `EditorTopComponent` to `OverviewAction.java` > - Moved Action specific code from `EditorTopComponent` to the corresponding `XXXAction.java` > - Changed `PrevDiagramAction`, `ExpandDiffAction`, `ExtractAction`, `HideAction`, `NextDiagramAction`, `ReduceDiffAction` and `ShowAllAction` to be context aware `ContextAction` actions and use more modern `@ActionRegistration` to move away from manually defining actions in `layer.xml` > - new `addContextListener` / `removeContextListener` function in `ContextAction` enables context aware actions to define to which `ChangedEvent` they want to react to > > # Fixing minor Bugs > - "Show empty blocks in control-flow graph view" is selected by default but only enabled in CFG view. > This is distracting for the eye when we are not in CFG: > cfg_before > Now "Show empty blocks in control-flow graph view" is not selected anymore when disabled (greyed out) > cfg_node_disable > But still gets selected by default when enabled > cfg_now > > - "Extract current set of selected nodes", "Hide selected nodes" and "show all nodes" were always enabled, even when they didn't effect anything. > selection_before > Now "Extract current set of selected nodes", "Hide selected nodes" are disabled (greyed out) when no nodes are selected. And "show all nodes" is disabled (greyed out) when all nodes are already visible. > selection_now > > - "Reduce the difference selection" got stuck when at the last graphs in the group because it got greyed out. 
> reduce_stuck > Now "Reduce the difference selection" works as expected: > reduce_now Tobias Holenstein has updated the pull request incrementally with two additional commits since the last revision: - Undo import reordering - re-add author ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10170/files - new: https://git.openjdk.org/jdk/pull/10170/files/23effacd..720a05d5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10170&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10170&range=04-05 Stats: 45 lines in 11 files changed: 34 ins; 11 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10170.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10170/head:pull/10170 PR: https://git.openjdk.org/jdk/pull/10170 From tholenstein at openjdk.org Mon Sep 19 07:29:54 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 19 Sep 2022 07:29:54 GMT Subject: RFR: JDK-8293364: IGV: Refactor Action in EditorTopComponent and fix minor bugs [v7] In-Reply-To: References: Message-ID: > Refactor the Actions in EditorTopComponent (com.sun.hotspot.igv.view.actions). Move Action specific code from EditorTopComponent to the corresponding Action.
> > # Refactoring of com.sun.hotspot.igv.view.actions and EditorTopComponent > - Created a new `ExportGraph` Action and moved corresponding functions `exportToSVG(..)` and `exportToPDF(..)` to new `ExportGraph.java` > - Moved key bindings for satellite-view (pressing S) from `EditorTopComponent` to `OverviewAction.java` > - Moved Action specific code from `EditorTopComponent` to the corresponding `XXXAction.java` > - Changed `PrevDiagramAction`, `ExpandDiffAction`, `ExtractAction`, `HideAction`, `NextDiagramAction`, `ReduceDiffAction` and `ShowAllAction` to be context aware `ContextAction` actions and use more modern `@ActionRegistration` to move away from manually defining actions in `layer.xml` > - new `addContextListener` / `removeContextListener` function in `ContextAction` enables context aware actions to define to which `ChangedEvent` they want to react to > > # Fixing minor Bugs > - "Show empty blocks in control-flow graph view" is selected by default but only enabled in CFG view. > This is distracting for the eye when we are not in CFG: > cfg_before > Now "Show empty blocks in control-flow graph view" is not selected anymore when disabled (greyed out) > cfg_node_disable > But still gets selected by default when enabled > cfg_now > > - "Extract current set of selected nodes", "Hide selected nodes" and "show all nodes" were always enabled, even when they didn't effect anything. > selection_before > Now "Extract current set of selected nodes", "Hide selected nodes" are disabled (greyed out) when no nodes are selected. And "show all nodes" is disabled (greyed out) when all nodes are already visible. > selection_now > > - "Reduce the difference selection" got stuck when at the last graphs in the group because it got greyed out. 
> reduce_stuck > Now "Reduce the difference selection" works as expected: > reduce_now Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: author update 2 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10170/files - new: https://git.openjdk.org/jdk/pull/10170/files/720a05d5..99a21ebc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10170&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10170&range=05-06 Stats: 6 lines in 2 files changed: 0 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10170.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10170/head:pull/10170 PR: https://git.openjdk.org/jdk/pull/10170 From tholenstein at openjdk.org Mon Sep 19 07:35:57 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 19 Sep 2022 07:35:57 GMT Subject: RFR: JDK-8293364: IGV: Refactor Action in EditorTopComponent and fix minor bugs [v8] In-Reply-To: References: Message-ID: > Refactor the Actions in EditorTopComponent (com.sun.hotspot.igv.view.actions). Move Action specific code from EditorTopComponent to the corresponding Action.
> > # Refactoring of com.sun.hotspot.igv.view.actions and EditorTopComponent > - Created a new `ExportGraph` Action and moved corresponding functions `exportToSVG(..)` and `exportToPDF(..)` to new `ExportGraph.java` > - Moved key bindings for satellite-view (pressing S) from `EditorTopComponent` to `OverviewAction.java` > - Moved Action specific code from `EditorTopComponent` to the corresponding `XXXAction.java` > - Changed `PrevDiagramAction`, `ExpandDiffAction`, `ExtractAction`, `HideAction`, `NextDiagramAction`, `ReduceDiffAction` and `ShowAllAction` to be context aware `ContextAction` actions and use more modern `@ActionRegistration` to move away from manually defining actions in `layer.xml` > - new `addContextListener` / `removeContextListener` function in `ContextAction` enables context aware actions to define to which `ChangedEvent` they want to react to > > # Fixing minor Bugs > - "Show empty blocks in control-flow graph view" is selected by default but only enabled in CFG view. > This is distracting for the eye when we are not in CFG: > cfg_before > Now "Show empty blocks in control-flow graph view" is not selected anymore when disabled (greyed out) > cfg_node_disable > But still gets selected by default when enabled > cfg_now > > - "Extract current set of selected nodes", "Hide selected nodes" and "show all nodes" were always enabled, even when they didn't effect anything. > selection_before > Now "Extract current set of selected nodes", "Hide selected nodes" are disabled (greyed out) when no nodes are selected. And "show all nodes" is disabled (greyed out) when all nodes are already visible. > selection_now > > - "Reduce the difference selection" got stuck when at the last graphs in the group because it got greyed out. 
> reduce_stuck > Now "Reduce the difference selection" works as expected: > reduce_now Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: undo removing variable generated by form editor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10170/files - new: https://git.openjdk.org/jdk/pull/10170/files/99a21ebc..a36897fa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10170&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10170&range=06-07 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10170.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10170/head:pull/10170 PR: https://git.openjdk.org/jdk/pull/10170 From tholenstein at openjdk.org Mon Sep 19 07:44:48 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 19 Sep 2022 07:44:48 GMT Subject: RFR: JDK-8293364: IGV: Refactor Action in EditorTopComponent and fix minor bugs [v5] In-Reply-To: References: Message-ID: On Fri, 16 Sep 2022 12:41:16 GMT, Roberto Castañeda Lozano wrote: >> Tobias Holenstein has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains two new commits since the last revision: >> >> - Update Copyright year >> - Update OutlineTopComponent.java >> >> Revert "Update OutlineTopComponent.java" >> >> This reverts commit 65e0651730983e12c032bb89564c3ef93aa34dbe.
>> >> revert whitespace change > > src/utils/IdealGraphVisualizer/Coordinator/src/main/java/com/sun/hotspot/igv/coordinator/actions/ImportAction.java line 52: > >> 50: import org.openide.util.RequestProcessor; >> 51: import org.openide.util.Utilities; >> 52: import org.openide.util.actions.CallableSystemAction; > > As discussed earlier, it would really be preferable to leave cleanups of this kind to a separate RFE, especially if the file is not changed otherwise. It would make reviewing easier, especially on large changesets like this one. Why not postpone all import reordering to https://github.com/openjdk/jdk/pull/10197? Similarly for e.g. the changes in `DiagramScene.java`. I agree, if the file is not touched, imports should not be modified. Hopefully after https://github.com/openjdk/jdk/pull/10197 this mistake shouldn't happen anymore :) > src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java line 431: > >> 429: setLayout(new java.awt.BorderLayout()); >> 430: >> 431: }// //GEN-END:initComponents > > Did you check whether, after moving this generated code, it can still be updated using NetBeans' Form Editor? I reverted the change. The NetBeans' Form Editor opens it, but just shows an empty plane. I don't really know why there is an `JCheckBox jCheckBox1` if we don't use it. But let's just leave it the way NetBeans generated it. 
------------- PR: https://git.openjdk.org/jdk/pull/10170 From tholenstein at openjdk.org Mon Sep 19 07:44:50 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 19 Sep 2022 07:44:50 GMT Subject: RFR: JDK-8293364: IGV: Refactor Action in EditorTopComponent and fix minor bugs [v8] In-Reply-To: References: Message-ID: On Fri, 16 Sep 2022 12:54:22 GMT, Roberto Castañeda Lozano wrote: >> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: >> >> undo removing variable generated by form editor > > src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ExportAction.java line 42: > >> 40: import org.openide.util.LookupListener; >> 41: import org.openide.util.NbBundle; >> 42: import org.openide.util.NbBundle.Messages; > > Please keep the original `@author` tag, here and in all other cases where it is removed. Done ------------- PR: https://git.openjdk.org/jdk/pull/10170 From tholenstein at openjdk.org Mon Sep 19 07:49:47 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 19 Sep 2022 07:49:47 GMT Subject: RFR: JDK-8290011: IGV: Remove dead code and cleanup [v2] In-Reply-To: References: Message-ID: > Remove dead code from the IGV code base.
There are many unused or redundant functions in the code Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: remove whitespace Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10197/files - new: https://git.openjdk.org/jdk/pull/10197/files/e5201fd4..d503224f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10197&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10197&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10197.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10197/head:pull/10197 PR: https://git.openjdk.org/jdk/pull/10197 From fgao at openjdk.org Mon Sep 19 07:53:41 2022 From: fgao at openjdk.org (Fei Gao) Date: Mon, 19 Sep 2022 07:53:41 GMT Subject: RFR: 8290910: Wrong memory state is picked in SuperWord::co_locate_pack() [v2] In-Reply-To: References: <5VdJz-Y2_RAqlUjtke3COI2hv3f0ClDB9nA1F__dE1c=.e373a3db-0c64-4ee8-85a7-0b6692ce1d4e@github.com> Message-ID: On Tue, 30 Aug 2022 03:32:38 GMT, Vladimir Kozlov wrote: >> Fei Gao has updated the pull request incrementally with one additional commit since the last revision: >> >> Code style change: add one space >> >> Change-Id: I2794060ac0f9dbe006e32f202111ee08f09d96a1 > > Can you show assembler after this fix? > > Would be interest to see results for other interleaving cases: > > a[i-1] += ; // similar to your case > a[i] += ; > > > a[i+1] += ; > a[i] += ; > > > a[i] += ; > a[i-1] += ; > > > Also when `+2` is used instead of `+1`. Or `+`. @vnkozlov can I have a review for the new commit? Thanks. 
------------- PR: https://git.openjdk.org/jdk/pull/9898 From tholenstein at openjdk.org Mon Sep 19 07:57:08 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 19 Sep 2022 07:57:08 GMT Subject: RFR: JDK-8290011: IGV: Remove dead code and cleanup [v3] In-Reply-To: References: Message-ID: <2ZVkbiTq_Bw8waxDU5KNEH0rbz4Xqyk2QdfmEgFU7lk=.fc61d4e7-8ac2-4f4a-a8c0-e3f647ae4c11@github.com> > Remove dead code from the IGV code base. There are many unused or redundant functions in the code Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: omit this Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10197/files - new: https://git.openjdk.org/jdk/pull/10197/files/d503224f..10c47a1e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10197&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10197&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10197.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10197/head:pull/10197 PR: https://git.openjdk.org/jdk/pull/10197 From tholenstein at openjdk.org Mon Sep 19 07:57:09 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 19 Sep 2022 07:57:09 GMT Subject: RFR: JDK-8290011: IGV: Remove dead code and cleanup [v3] In-Reply-To: References: Message-ID: On Thu, 8 Sep 2022 08:16:27 GMT, Christian Hagedorn wrote: >> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: >> >> omit this >> >> Co-authored-by: Christian Hagedorn > > src/utils/IdealGraphVisualizer/Data/src/main/java/com/sun/hotspot/igv/data/InputNode.java line 34: > >> 32: public class InputNode extends Properties.Entity { >> 33: >> 34: private int id; > > While cleaning this class up anyways: Feels like a node id should probably not change anymore once it's set. Can this be turned into a `final` field? 
Looks like `setId()` is only called from this class and once from another class when creating a new input node anyways. I think it is called in `Difference.java` as well: `n2.setId(curIndex);`, right? ------------- PR: https://git.openjdk.org/jdk/pull/10197 From roland at openjdk.org Mon Sep 19 08:00:45 2022 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 19 Sep 2022 08:00:45 GMT Subject: RFR: JDK-8293978: Duplicate simple loop back-edge will crash the vm In-Reply-To: References: Message-ID: On Mon, 19 Sep 2022 03:30:15 GMT, 王超 wrote: > Duplicate back-edge of the following simple loop will make jvm crash. > > image Changes requested by roland (Reviewer). src/hotspot/share/opto/loopopts.cpp line 3972: > 3970: } > 3971: > 3972: if (idom(region)->is_Catch() || region == head) { That check could be done earlier I think at the test for: `if (!incr->is_Phi()) {` ------------- PR: https://git.openjdk.org/jdk/pull/10329 From tholenstein at openjdk.org Mon Sep 19 08:01:07 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 19 Sep 2022 08:01:07 GMT Subject: RFR: JDK-8290011: IGV: Remove dead code and cleanup [v4] In-Reply-To: References: Message-ID: > Remove dead code from the IGV code base.
There are many unused or redundant functions in the code Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: Update src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java remove whitespace Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10197/files - new: https://git.openjdk.org/jdk/pull/10197/files/10c47a1e..23ddbf98 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10197&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10197&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10197.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10197/head:pull/10197 PR: https://git.openjdk.org/jdk/pull/10197 From tholenstein at openjdk.org Mon Sep 19 08:01:08 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 19 Sep 2022 08:01:08 GMT Subject: RFR: JDK-8290011: IGV: Remove dead code and cleanup [v4] In-Reply-To: References: Message-ID: <20fafNRXlKy7NcNczdju3gHvRn1hhpzdKF1l64VdS_s=.449550e3-03b1-4266-aebc-1738e4cc773e@github.com> On Thu, 8 Sep 2022 08:12:46 GMT, Christian Hagedorn wrote: >> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java >> >> remove whitespace >> >> Co-authored-by: Christian Hagedorn > > src/utils/IdealGraphVisualizer/Data/src/main/java/com/sun/hotspot/igv/data/InputNode.java line 36: > >> 34: private int id; >> 35: >> 36: public static final Comparator COMPARATOR = new Comparator() { > > Is unused as well and can be removed. Same for `getPropertyComparator()`. you are right! 
Done > src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/DiagramViewModel.java line 92: > >> 90: boolean viewPropertiesChanged = false; >> 91: >> 92: boolean groupChanged = (group != newModel.group); > > Was that a bug? Yes, it didn't cause any crashes or wrong behaviour as far as I can tell. But from looking at the code, it makes sense that it should be `!=` for `groupChanged` to be true. ------------- PR: https://git.openjdk.org/jdk/pull/10197 From duke at openjdk.org Mon Sep 19 08:40:01 2022 From: duke at openjdk.org (王超) Date: Mon, 19 Sep 2022 08:40:01 GMT Subject: RFR: JDK-8293978: Duplicate simple loop back-edge will crash the vm [v2] In-Reply-To: References: Message-ID: > Duplicate back-edge of the following simple loop will make jvm crash. > > image 王超 has updated the pull request incrementally with one additional commit since the last revision: Move the condition ahead ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10329/files - new: https://git.openjdk.org/jdk/pull/10329/files/beea2ffb..63fd328e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10329&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10329&range=00-01 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10329.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10329/head:pull/10329 PR: https://git.openjdk.org/jdk/pull/10329 From duke at openjdk.org Mon Sep 19 08:40:02 2022 From: duke at openjdk.org (王超) Date: Mon, 19 Sep 2022 08:40:02 GMT Subject: RFR: JDK-8293978: Duplicate simple loop back-edge will crash the vm [v2] In-Reply-To: References: Message-ID: On Mon, 19 Sep 2022 07:56:39 GMT, Roland Westrelin wrote: >> 王超
has updated the pull request incrementally with one additional commit since the last revision: >> >> Move the condition ahead > > src/hotspot/share/opto/loopopts.cpp line 3972: > >> 3970: } >> 3971: >> 3972: if (idom(region)->is_Catch() || region == head) { > > That check could be done earlier I think at the test for: > `if (!incr->is_Phi()) {` Thank you for the suggestion! I have moved the check to `if (!incr->is_Phi())` ------------- PR: https://git.openjdk.org/jdk/pull/10329 From duke at openjdk.org Mon Sep 19 08:42:40 2022 From: duke at openjdk.org (王超) Date: Mon, 19 Sep 2022 08:42:40 GMT Subject: RFR: JDK-8293978: Duplicate simple loop back-edge will crash the vm [v3] In-Reply-To: References: Message-ID: <6yEjhp8fXcKUAN4F6Z-r0OmmH43yJ8qoVXvxZ2hP470=.dbf033a9-c1fe-4b25-8e12-10ad3d7db3c4@github.com> > Duplicate back-edge of the following simple loop will make jvm crash. > > image 王超 has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
The pull request contains one new commit since the last revision: Move the condition ahead ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10329/files - new: https://git.openjdk.org/jdk/pull/10329/files/63fd328e..474bbeba Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10329&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10329&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10329.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10329/head:pull/10329 PR: https://git.openjdk.org/jdk/pull/10329 From roland at openjdk.org Mon Sep 19 09:04:51 2022 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 19 Sep 2022 09:04:51 GMT Subject: RFR: JDK-8293978: Duplicate simple loop back-edge will crash the vm [v3] In-Reply-To: <6yEjhp8fXcKUAN4F6Z-r0OmmH43yJ8qoVXvxZ2hP470=.dbf033a9-c1fe-4b25-8e12-10ad3d7db3c4@github.com> References: <6yEjhp8fXcKUAN4F6Z-r0OmmH43yJ8qoVXvxZ2hP470=.dbf033a9-c1fe-4b25-8e12-10ad3d7db3c4@github.com> Message-ID: On Mon, 19 Sep 2022 08:42:40 GMT, 王超 wrote: >> Duplicate back-edge of the following simple loop will make jvm crash. >> >> image > > 王超 has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Move the condition ahead Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR: https://git.openjdk.org/jdk/pull/10329 From chagedorn at openjdk.org Mon Sep 19 10:14:58 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 19 Sep 2022 10:14:58 GMT Subject: RFR: JDK-8293364: IGV: Refactor Action in EditorTopComponent and fix minor bugs [v8] In-Reply-To: References: Message-ID: On Mon, 19 Sep 2022 07:35:57 GMT, Tobias Holenstein wrote: >> Refactor the Actions in EditorTopComponent (com.sun.hotspot.igv.view.actions).
Move Action specific code from EditorTopComponent to the corresponding Action. >> >> # Refactoring of com.sun.hotspot.igv.view.actions and EditorTopComponent >> - Created a new `ExportGraph` Action and moved corresponding functions `exportToSVG(..)` and `exportToPDF(..)` to new `ExportGraph.java` >> - Moved key bindings for satellite-view (pressing S) from `EditorTopComponent` to `OverviewAction.java` >> - Moved Action specific code from `EditorTopComponent` to the corresponding `XXXAction.java` >> - Changed `PrevDiagramAction`, `ExpandDiffAction`, `ExtractAction`, `HideAction`, `NextDiagramAction`, `ReduceDiffAction` and `ShowAllAction` to be context aware `ContextAction` actions and use more modern `@ActionRegistration` to move away from manually defining actions in `layer.xml` >> - new `addContextListener` / `removeContextListener` function in `ContextAction` enables context aware actions to define to which `ChangedEvent` they want to react to >> >> # Fixing minor Bugs >> - "Show empty blocks in control-flow graph view" is selected by default but only enabled in CFG view. >> This is distracting for the eye when we are not in CFG: >> cfg_before >> Now "Show empty blocks in control-flow graph view" is not selected anymore when disabled (greyed out) >> cfg_node_disable >> But still gets selected by default when enabled >> cfg_now >> >> - "Extract current set of selected nodes", "Hide selected nodes" and "show all nodes" were always enabled, even when they didn't effect anything. >> selection_before >> Now "Extract current set of selected nodes", "Hide selected nodes" are disabled (greyed out) when no nodes are selected. And "show all nodes" is disabled (greyed out) when all nodes are already visible. >> selection_now >> >> - "Reduce the difference selection" got stuck when at the last graphs in the group because it got greyed out. 
>> reduce_stuck >> Now "Reduce the difference selection" works as expected: >> reduce_now > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > undo removing variable generated by form editor I only have some code-style-specific comments. Otherwise, good cleanup! src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/DiagramViewModel.java line 188: > 186: public void setHideDuplicates(boolean b) { > 187: InputGraph currentGraph = getFirstGraph(); > 188: if (b) { I suggest renaming `b` to `hideDuplicates`. src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java line 171: > 169: } > 170: > 171: }; Could be changed into using a lambda: Suggestion: ChangedListener diagramChangedListener = source -> { updateDisplayName(); Collection list = new ArrayList<>(); list.add(new EditorInputGraphProvider(EditorTopComponent.this)); graphContent.set(list, null); diagramProvider.getChangedEvent().fire(); }; src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java line 268: > 266: > 267: public DiagramViewModel getModel() { > 268: return scene.getModel(); Suggestion: return scene.getModel(); src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java line 275: > 273: } > 274: > 275: public void selectionMode(boolean b) { I suggest renaming it to `setSelectionMode` as it otherwise suggests being a getter.
src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java line 306: > 304: > 305: public static EditorTopComponent getActive() { > 306: TopComponent topComponent = EditorTopComponent.getRegistry().getActivated(); Can be simplified to: Suggestion: TopComponent topComponent = getRegistry().getActivated(); src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java line 379: > 377: > 378: @Override > 379: public void componentOpened() { } Is this empty override required? Otherwise, the method can be removed. src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/EnableBlockLayoutAction.java line 40: > 38: * @author Thomas Wuerthinger > 39: */ > 40: public class EnableBlockLayoutAction extends AbstractAction implements PropertyChangeListener { `EnableBlockLayoutAction`, `EnableCFGLayoutAction` and `EnableSeaLayoutAction` are very similar and could share some code. You could add a super class `EnableLayoutAction` (or any other name that fits) for them and put everything that's shared into it. src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/EnableBlockLayoutAction.java line 42: > 40: public class EnableBlockLayoutAction extends AbstractAction implements PropertyChangeListener { > 41: > 42: EditorTopComponent editor; Should be made `private` (and also `final`): Suggestion: private final EditorTopComponent editor; src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ExpandPredecessorsAction.java line 44: > 42: EditorTopComponent editor = EditorTopComponent.getActive(); > 43: if (editor != null) { > 44: Set
oldSelection = editor.getModel().getSelectedFigures(); This code looks very similar to the code in `ExpandSuccessorAction`. The only difference is `f.getSuccessors()` vs `f.getPredecessors()`. Moreover, the only other difference between this class and `ExpandSuccessorAction` seems to be `getName()`. I suggest to create a super class `ExpandAdjacentAction` (or another name that fits) for these two classes and have a common `expandAdjacent()` method (or another name that fits) for this code that just takes the predecessors or successors as parameter. src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ReduceDiffAction.java line 99: > 97: model.getViewPropertiesChangedEvent().removeListener(this); > 98: model.getHiddenNodesChangedEvent().removeListener(this); > 99: } These three methods seem to be shared among all subclasses. Is it possible to move them to the super class? src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ReduceDiffAction.java line 104: > 102: public Action createContextAwareInstance(Lookup actionContext) { > 103: return this; > 104: } It seems you are not really creating instances anymore by looking at all the new overrides of `createContextAwareInstance()` which just return `this`. Is this method still required or can the usages be replaced by directly using the caller object? ------------- Changes requested by chagedorn (Reviewer). 
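To make the `ExpandAdjacentAction` suggestion above more concrete, here is a minimal sketch of how the shared logic could be factored out. It is only an illustration under assumptions: `Figure` is a hypothetical stand-in for the IGV class (only `getPredecessors()` and `getSuccessors()` are taken from the comment above), `expandAdjacent()` is the placeholder name proposed in the review, and the real actions also have to update the editor model, which is left out here.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class ExpandAdjacentSketch {
    /** Hypothetical stand-in for IGV's Figure; only adjacency matters here. */
    static final class Figure {
        final String name;
        final List<Figure> predecessors = new ArrayList<>();
        final List<Figure> successors = new ArrayList<>();

        Figure(String name) { this.name = name; }

        List<Figure> getPredecessors() { return predecessors; }
        List<Figure> getSuccessors() { return successors; }
    }

    /** Adds an edge from -> to and records it on both figures. */
    static void connect(Figure from, Figure to) {
        from.successors.add(to);
        to.predecessors.add(from);
    }

    /** Shared logic: the old selection plus its neighbours in one direction. */
    static Set<Figure> expandAdjacent(Set<Figure> oldSelection,
                                      Function<Figure, List<Figure>> adjacent) {
        Set<Figure> expanded = new LinkedHashSet<>(oldSelection);
        for (Figure f : oldSelection) {
            expanded.addAll(adjacent.apply(f));
        }
        return expanded;
    }

    // The two actions would then differ only in the accessor they pass in.
    static Set<Figure> expandPredecessors(Set<Figure> selection) {
        return expandAdjacent(selection, Figure::getPredecessors);
    }

    static Set<Figure> expandSuccessors(Set<Figure> selection) {
        return expandAdjacent(selection, Figure::getSuccessors);
    }
}
```

Passing the accessor as a `Function` keeps the two subclasses down to their name and one method reference; an abstract method on the super class would work equally well.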
PR: https://git.openjdk.org/jdk/pull/10170 From chagedorn at openjdk.org Mon Sep 19 10:15:55 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 19 Sep 2022 10:15:55 GMT Subject: RFR: 8292088: C2: assert(is_OuterStripMinedLoop()) failed: invalid node class: IfTrue In-Reply-To: <4KiGggNq8m0gnRkAxarMncZ1gVWSJ-QNS3IFXzmIdQE=.5e847c08-973d-4517-8778-c9d6d33c7b50@github.com> References: <4KiGggNq8m0gnRkAxarMncZ1gVWSJ-QNS3IFXzmIdQE=.5e847c08-973d-4517-8778-c9d6d33c7b50@github.com> Message-ID: On Fri, 16 Sep 2022 12:00:43 GMT, Christian Hagedorn wrote: > In `testKnownLimit()`, we directly use the (pre-incremented) iv phi `IV_PHI_i` (`232 Phi`) in the loop exit check of the `while` loop: > > ![Screenshot from 2022-09-16 10-33-58](https://user-images.githubusercontent.com/17833009/190604454-50aa1b1e-7111-4723-a329-b95e0f26c220.png) > > Such pre-incremented iv phi uses after the loop are detected in `PhaseIdealLoop::reorg_offsets()` and replaced in order to reduce register pressure. We insert an additional `Opaque2` node to prevent any optimizations to undo the effect of `PhaseIdealLoop::reorg_offsets()`: > > > // iv Phi iv Phi > // | | > // | AddI (+stride) > // | | > // | Opaque2 # Blocks IGVN from folding these nodes until loop opts are over. > // | ====> | > // | AddI (-stride) > // | | > // | CastII # Preserve type of iv Phi > // | | > // Outside Use Outside Use > > > In the test case, this is done before CCP and looks like this: > > ![Screenshot from 2022-09-16 10-33-35](https://user-images.githubusercontent.com/17833009/190623922-3b0c9eeb-8468-4cd7-8fe1-1f7df3dc5071.png) > > At that point, we do not know yet that the `while` loop is only gonna be executed once (i.e. `422 CountedLoopEnd` is always false). This only becomes known after CCP where the type of `232 Phi` improves. 
But since we have an `Opaque2` node, this update is not propagated until the `Opaque2` nodes are removed in macro expansion: > > https://github.com/openjdk/jdk/blob/11e7d53b23796cbd3d878048f7553885ae07f4d1/src/hotspot/share/opto/macro.cpp#L2412-L2414 > > During macro expansion, we also adjust the strip mined loop: We move the `421 Bool` of the inner loop exit check `422 CountedLoopEnd` to the outer strip mined loop exit check and adjust the inner loop exit check in such a way that C2 cannot figure out that the entire loop is only run once. In the next IGVN phase, the outer strip mined loop node is removed while the inner loop `429 CountedLoop` is not. > > Later in `verify_strip_mined()`, we cannot find the outer strip mined loop of `429 CountedLoop` anymore and we fail with the assertion. > > The first thought to fix this problem is to add `Opaque2::Value()` to let type information flow. But this does not fix the problem completely if the type of the iv phi has no known upper limit. There we have the problem that in general `type(phi) != type(phi + x - x)` because `phi + x` could overflow and we end up with type `int` (which happens in `testUnknownLimit()`). > > I therefore suggest to remove `Opaque2` nodes earlier before macro expansion to fix this bug. A good place seems to be right after loop opts are over. We can remove them at the same time as `Opaque1` nodes by adding a similar `Identity()` method. This lets the loop nodes to be folded away before trying to adjust the outer strip mined loop limit. > > #### Are Opaque2 nodes really useful? > > When working on this bug, I started to question the usage of `Opaque2` nodes in general. We are still running IGVN after `Opaque2` nodes are currently removed. This simply undoes the effects of `PhaseIdealLoop::reorg_offsets()` again and we end up using pre-incremented iv phis anyways. 
My theory was that we are either blocking some specific optimizations during loop opts which cannot be reverted later in IGVN or that we initially (when this `Opaque2` optimization was added) did not run IGVN anymore once `Opaque2` nodes are removed. > > I could not think of any such non-revertable optimization that `Opaque2` nodes could prevent. On top of that, `PhaseIdealLoop::reorg_offsets()` also does not mention anything alike. I therefore had a look at the history of `Opaque2` nodes. Unfortunately, they were added before the initial load commit. I've dug deeper through some old closed repo and found that at the time the `Opaque2` nodes were introduced around 20 years ago, we did not do any IGVN anymore after the removal of the `Opaque2` nodes - and we generated code with these unoptimized `iv phi + x - x` patterns. > > This suggests that today the `Opaque2` nodes are indeed not really doing what they were originally supposed to do. I would therefore suggest to investigate their complete removal in a separate RFE and go with the suggested fix above which does not make the current situation of the questionable `Opaque2` node usages any worse. > > Thanks, > Christian Thanks Vladimir for your review! 
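To illustrate the overflow argument above (`type(phi) != type(phi + x - x)` when the iv phi has no known upper limit), here is a minimal Java sketch of range arithmetic. It is only a simplified model and not C2's actual `TypeInt` code: the class `IntRangeSketch` and its `add()` method are hypothetical names, and the assumption is that an addition whose bounds may overflow falls back to the full `int` range.

```java
// Simplified, hypothetical model of an int value range (not C2's TypeInt).
final class IntRangeSketch {
    static final IntRangeSketch FULL =
        new IntRangeSketch(Integer.MIN_VALUE, Integer.MAX_VALUE);

    final int lo;
    final int hi;

    IntRangeSketch(int lo, int hi) {
        this.lo = lo;
        this.hi = hi;
    }

    // Range of (value + c). Assumption: if either bound leaves the int
    // range, the sum may wrap around, so we give up and return FULL.
    IntRangeSketch add(int c) {
        long newLo = (long) lo + c;
        long newHi = (long) hi + c;
        if (newLo != (int) newLo || newHi != (int) newHi) {
            return FULL;
        }
        return new IntRangeSketch((int) newLo, (int) newHi);
    }

    @Override
    public String toString() {
        return "[" + lo + ", " + hi + "]";
    }

    public static void main(String[] args) {
        int stride = 1;

        // Known limit: the range survives the "+ stride - stride" round trip.
        IntRangeSketch known = new IntRangeSketch(0, 100);
        System.out.println(known.add(stride).add(-stride));

        // No known upper limit: "+ stride" may overflow, so the range is lost.
        IntRangeSketch unknown = new IntRangeSketch(0, Integer.MAX_VALUE);
        System.out.println(unknown.add(stride).add(-stride));
    }
}
```

In this model the bounded range comes back unchanged, while the unbounded one degrades to the full `int` range, which matches the reason given above for why letting type information flow through the `Opaque2` node would not be a complete fix.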
------------- PR: https://git.openjdk.org/jdk/pull/10306 From chagedorn at openjdk.org Mon Sep 19 10:16:56 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 19 Sep 2022 10:16:56 GMT Subject: RFR: 8293849: PrintIdealPhase in compiler directives file is ignored when used with other compile commands In-Reply-To: References: Message-ID: On Thu, 15 Sep 2022 11:20:06 GMT, Christian Hagedorn wrote: > When using a compiler directives file with `PrintIdealPhase`: > > > [ > { > match : "Test::*", > log : true, > PrintIdealPhase : "BEFORE_MATCHING" > } > ] > > > together with other compile commands specified in `compilerdirectives_common_flags` and/or `compilerdirectives_c2_flags`: > https://github.com/openjdk/jdk/blob/aff5ff14b208b3c2be93d7b4fab8b07c5be12f3e/src/hotspot/share/compiler/compilerDirectives.hpp#L38-L39 > https://github.com/openjdk/jdk/blob/aff5ff14b208b3c2be93d7b4fab8b07c5be12f3e/src/hotspot/share/compiler/compilerDirectives.hpp#L63-L64 > > then the `PrintIdealPhase` option is ignored. > > The reason is that when cloning the `DirectiveSet` for the current compilation in `DirectiveSet::clone()`, we only set `PrintIdealPhaseOption` but forget to also set `_ideal_phase_name_mask` which is used when deciding if a compile phase should be dumped or not. As a result, the mask keeps its default value zero and nothing is dumped because `Compile::should_print_phase()` returns false: > > https://github.com/openjdk/jdk/blob/aff5ff14b208b3c2be93d7b4fab8b07c5be12f3e/src/hotspot/share/opto/compile.cpp#L5060-L5067 > > > The fix is to also clone the old value of `_ideal_phase_name_mask`. > > Thanks, > Christian Thanks Vladimir for your review!
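The cloning problem described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions and not the HotSpot C++ code: `DirectiveCloneSketch`, its fields, both clone methods and the bit index used for the phase are hypothetical names and values chosen for the example. The point it shows is that copying the option text without the derived mask leaves the mask at its default of zero.

```java
// Hypothetical stand-in for a directive set; names do not mirror HotSpot's C++ code.
final class DirectiveCloneSketch {
    String printIdealPhaseOption = "";  // raw option text from the directives file
    long idealPhaseNameMask = 0L;       // derived bit mask consulted when dumping

    // Setting the option also derives the mask bit for the given phase.
    void setPrintIdealPhase(String phaseName, int phaseBit) {
        printIdealPhaseOption = phaseName;
        idealPhaseNameMask |= 1L << phaseBit;
    }

    // Only the mask decides whether a phase is dumped.
    boolean shouldPrintPhase(int phaseBit) {
        return (idealPhaseNameMask & (1L << phaseBit)) != 0;
    }

    // Models the bug: the option text is copied, the mask is forgotten.
    DirectiveCloneSketch cloneWithoutMask() {
        DirectiveCloneSketch copy = new DirectiveCloneSketch();
        copy.printIdealPhaseOption = printIdealPhaseOption;
        return copy;
    }

    // Models the fix: the old mask value is copied as well.
    DirectiveCloneSketch cloneWithMask() {
        DirectiveCloneSketch copy = cloneWithoutMask();
        copy.idealPhaseNameMask = idealPhaseNameMask;
        return copy;
    }

    public static void main(String[] args) {
        int beforeMatchingBit = 3; // arbitrary bit index, assumed for the example
        DirectiveCloneSketch original = new DirectiveCloneSketch();
        original.setPrintIdealPhase("BEFORE_MATCHING", beforeMatchingBit);

        System.out.println(original.cloneWithoutMask().shouldPrintPhase(beforeMatchingBit));
        System.out.println(original.cloneWithMask().shouldPrintPhase(beforeMatchingBit));
    }
}
```

In the sketch the clone without the mask still carries the option string, so the setting looks present, yet the check that actually gates the dump reports nothing to print.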
------------- PR: https://git.openjdk.org/jdk/pull/10283 From tholenstein at openjdk.org Mon Sep 19 10:17:24 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 19 Sep 2022 10:17:24 GMT Subject: RFR: JDK-8290011: IGV: Remove dead code and cleanup [v5] In-Reply-To: References: Message-ID: > Remove dead code from the IGV code base. There are many unused or redundant functions in the code Tobias Holenstein has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'JDK-8290011' of github.com:tobiasholenstein/jdk into JDK-8290011 - remove unsued getPropertyComparator() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10197/files - new: https://git.openjdk.org/jdk/pull/10197/files/23ddbf98..9692b5f5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10197&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10197&range=03-04 Stats: 32 lines in 1 file changed: 0 ins; 32 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10197.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10197/head:pull/10197 PR: https://git.openjdk.org/jdk/pull/10197 From chagedorn at openjdk.org Mon Sep 19 10:17:52 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 19 Sep 2022 10:17:52 GMT Subject: Integrated: 8292088: C2: assert(is_OuterStripMinedLoop()) failed: invalid node class: IfTrue In-Reply-To: <4KiGggNq8m0gnRkAxarMncZ1gVWSJ-QNS3IFXzmIdQE=.5e847c08-973d-4517-8778-c9d6d33c7b50@github.com> References: <4KiGggNq8m0gnRkAxarMncZ1gVWSJ-QNS3IFXzmIdQE=.5e847c08-973d-4517-8778-c9d6d33c7b50@github.com> Message-ID: On Fri, 16 Sep 2022 12:00:43 GMT, Christian Hagedorn wrote: > In `testKnownLimit()`, we directly use the (pre-incremented) iv phi `IV_PHI_i` (`232 Phi`) in the loop exit check of the `while` loop: > > ![Screenshot from 2022-09-16 10-33-58](https://user-images.githubusercontent.com/17833009/190604454-50aa1b1e-7111-4723-a329-b95e0f26c220.png) > > Such pre-incremented iv phi 
uses after the loop are detected in `PhaseIdealLoop::reorg_offsets()` and replaced in order to reduce register pressure. We insert an additional `Opaque2` node to prevent any optimizations to undo the effect of `PhaseIdealLoop::reorg_offsets()`: > > > // iv Phi iv Phi > // | | > // | AddI (+stride) > // | | > // | Opaque2 # Blocks IGVN from folding these nodes until loop opts are over. > // | ====> | > // | AddI (-stride) > // | | > // | CastII # Preserve type of iv Phi > // | | > // Outside Use Outside Use > > > In the test case, this is done before CCP and looks like this: > > ![Screenshot from 2022-09-16 10-33-35](https://user-images.githubusercontent.com/17833009/190623922-3b0c9eeb-8468-4cd7-8fe1-1f7df3dc5071.png) > > At that point, we do not know yet that the `while` loop is only gonna be executed once (i.e. `422 CountedLoopEnd` is always false). This only becomes known after CCP where the type of `232 Phi` improves. But since we have an `Opaque2` node, this update is not propagated until the `Opaque2` nodes are removed in macro expansion: > > https://github.com/openjdk/jdk/blob/11e7d53b23796cbd3d878048f7553885ae07f4d1/src/hotspot/share/opto/macro.cpp#L2412-L2414 > > During macro expansion, we also adjust the strip mined loop: We move the `421 Bool` of the inner loop exit check `422 CountedLoopEnd` to the outer strip mined loop exit check and adjust the inner loop exit check in such a way that C2 cannot figure out that the entire loop is only run once. In the next IGVN phase, the outer strip mined loop node is removed while the inner loop `429 CountedLoop` is not. > > Later in `verify_strip_mined()`, we cannot find the outer strip mined loop of `429 CountedLoop` anymore and we fail with the assertion. > > The first thought to fix this problem is to add `Opaque2::Value()` to let type information flow. But this does not fix the problem completely if the type of the iv phi has no known upper limit. 
There we have the problem that in general `type(phi) != type(phi + x - x)` because `phi + x` could overflow and we end up with type `int` (which happens in `testUnknownLimit()`). > > I therefore suggest to remove `Opaque2` nodes earlier before macro expansion to fix this bug. A good place seems to be right after loop opts are over. We can remove them at the same time as `Opaque1` nodes by adding a similar `Identity()` method. This lets the loop nodes to be folded away before trying to adjust the outer strip mined loop limit. > > #### Are Opaque2 nodes really useful? > > When working on this bug, I started to question the usage of `Opaque2` nodes in general. We are still running IGVN after `Opaque2` nodes are currently removed. This simply undoes the effects of `PhaseIdealLoop::reorg_offsets()` again and we end up using pre-incremented iv phis anyways. My theory was that we are either blocking some specific optimizations during loop opts which cannot be reverted later in IGVN or that we initially (when this `Opaque2` optimization was added) did not run IGVN anymore once `Opaque2` nodes are removed. > > I could not think of any such non-revertable optimization that `Opaque2` nodes could prevent. On top of that, `PhaseIdealLoop::reorg_offsets()` also does not mention anything alike. I therefore had a look at the history of `Opaque2` nodes. Unfortunately, they were added before the initial load commit. I've dug deeper through some old closed repo and found that at the time the `Opaque2` nodes were introduced around 20 years ago, we did not do any IGVN anymore after the removal of the `Opaque2` nodes - and we generated code with these unoptimized `iv phi + x - x` patterns. > > This suggests that today the `Opaque2` nodes are indeed not really doing what they were originally supposed to do. 
I would therefore suggest to investigate their complete removal in a separate RFE and go with the suggested fix above which does not make the current situation of the questionable `Opaque2` node usages any worse. > > Thanks, > Christian This pull request has now been integrated. Changeset: 471e2f12 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/471e2f12b44cafc583a8ae118e36df5f00dfd624 Stats: 221 lines in 4 files changed: 218 ins; 0 del; 3 mod 8292088: C2: assert(is_OuterStripMinedLoop()) failed: invalid node class: IfTrue Reviewed-by: roland, kvn ------------- PR: https://git.openjdk.org/jdk/pull/10306 From chagedorn at openjdk.org Mon Sep 19 10:18:51 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 19 Sep 2022 10:18:51 GMT Subject: Integrated: 8293849: PrintIdealPhase in compiler directives file is ignored when used with other compile commands In-Reply-To: References: Message-ID: <3IYZJGCE_B8EAxeq27mTB4PEi_BnI3ElQVng9To0hJU=.a8480f18-156d-4c73-9c4b-88d8547df2fd@github.com> On Thu, 15 Sep 2022 11:20:06 GMT, Christian Hagedorn wrote: > When using a compiler directives file with `PrintIdealPhase`: > > > [ > { > match : "Test::*", > log : true, > PrintIdealPhase : "BEFORE_MATCHING" > } > ] > > > together with other compile commands specified in `compilerdirectives_common_flags` and/or `compilerdirectives_c2_flags`: > https://github.com/openjdk/jdk/blob/aff5ff14b208b3c2be93d7b4fab8b07c5be12f3e/src/hotspot/share/compiler/compilerDirectives.hpp#L38-L39 > https://github.com/openjdk/jdk/blob/aff5ff14b208b3c2be93d7b4fab8b07c5be12f3e/src/hotspot/share/compiler/compilerDirectives.hpp#L63-L64 > > then the `PrintIdealPhase` option is ignored. > > The reason is that when cloning the `DirectiveSet` for the current compilation in `DirectiveSet::clone()`, we only set `PrintIdealPhaseOption` but forget to also set `_ideal_phase_name_mask` which is used when deciding if a compile phase should be dumped or not. 
As a result, the mask keeps its default value zero and nothing is dumped because `Compile::should_print_phase()` returns false: > > https://github.com/openjdk/jdk/blob/aff5ff14b208b3c2be93d7b4fab8b07c5be12f3e/src/hotspot/share/opto/compile.cpp#L5060-L5067 > > > The fix is to also clone the old value of `_ideal_phase_name_mask`. > > Thanks, > Christian This pull request has now been integrated. Changeset: d41f69f9 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/d41f69f9c0297fe78884b5aa2d149745215ec9d2 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8293849: PrintIdealPhase in compiler directives file is ignored when used with other compile commands Reviewed-by: rcastanedalo, kvn ------------- PR: https://git.openjdk.org/jdk/pull/10283 From tholenstein at openjdk.org Mon Sep 19 10:22:45 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 19 Sep 2022 10:22:45 GMT Subject: RFR: JDK-8290011: IGV: Remove dead code and cleanup [v6] In-Reply-To: References: Message-ID: > Remove dead code from the IGV code base.
There are many unused or redundant functions in the code. Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: access static front in Diagram without getter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10197/files - new: https://git.openjdk.org/jdk/pull/10197/files/9692b5f5..35f681c8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10197&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10197&range=04-05 Stats: 33 lines in 5 files changed: 2 ins; 14 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/10197.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10197/head:pull/10197 PR: https://git.openjdk.org/jdk/pull/10197 From dnsimon at openjdk.org Mon Sep 19 12:15:43 2022 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 19 Sep 2022 12:15:43 GMT Subject: RFR: 8293989: [JVMCI] re-use cleared oop handles Message-ID: It's possible for a libgraal isolate to live long enough that `JVMCIRuntime::_oop_handles` grows so much that it overflows when trying to expand. This results in a VM crash that looks something like: # There is insufficient memory for the Java Runtime Environment to continue. # Native memory allocation (malloc) failed to allocate 18446744056529682432 bytes for AllocateHeap ...
V [libjvm.so+0xe1f441] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x1a1 V [libjvm.so+0xe2006d] VMError::report_and_die(Thread*, char const*, int, unsigned long, VMErrorType, char const*, __va_list_tag*)+0x2d V [libjvm.so+0x5d8ed3] report_vm_out_of_memory(char const*, int, unsigned long, VMErrorType, char const*, ...)+0xc3 V [libjvm.so+0x39f7c2] AllocateHeap(unsigned long, MEMFLAGS, AllocFailStrategy::AllocFailEnum)+0x92 V [libjvm.so+0x8303f2] GrowableArrayWithAllocator<_jobject*, GrowableArray<_jobject*> >::grow(int)+0x112 V [libjvm.so+0x995385] JVMCIRuntime::make_global(Handle const&)+0x105 The solution implemented in this PR is to clear and re-use entries in `JVMCIRuntime::_oop_handles`. ------------- Commit messages: - re-use slots in JVMCIRuntime::_oop_handles Changes: https://git.openjdk.org/jdk/pull/10337/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10337&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8293989 Stats: 191 lines in 9 files changed: 89 ins; 62 del; 40 mod Patch: https://git.openjdk.org/jdk/pull/10337.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10337/head:pull/10337 PR: https://git.openjdk.org/jdk/pull/10337 From chagedorn at openjdk.org Mon Sep 19 12:38:22 2022 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 19 Sep 2022 12:38:22 GMT Subject: RFR: JDK-8293978: Duplicate simple loop back-edge will crash the vm [v3] In-Reply-To: <6yEjhp8fXcKUAN4F6Z-r0OmmH43yJ8qoVXvxZ2hP470=.dbf033a9-c1fe-4b25-8e12-10ad3d7db3c4@github.com> References: <6yEjhp8fXcKUAN4F6Z-r0OmmH43yJ8qoVXvxZ2hP470=.dbf033a9-c1fe-4b25-8e12-10ad3d7db3c4@github.com> Message-ID: On Mon, 19 Sep 2022 08:42:40 GMT, ?? wrote: >> Duplicate back-edge of the following simple loop will make jvm crash. >> >> image > > ?? has refreshed the contents of this pull request, and previous commits have been removed.
The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Move the condition ahead Otherwise, the fix itself looks good. test/hotspot/jtreg/compiler/c2/TestDuplicateSimpleLoopBackedge.java line 37: > 35: > 36: class Foo { > 37: static void c(Byte[] a, Byte d) { Method can be moved to `TestDuplicateSimpleLoopBackedge`. test/hotspot/jtreg/compiler/c2/TestDuplicateSimpleLoopBackedge.java line 43: > 41: } > 42: > 43: public class TestDuplicateSimpleLoopBackedge { There are a lot of unused variables which could probably be removed and it would still trigger the crash. Also, I recommend giving the methods names longer than a single letter even though it's just a test. test/hotspot/jtreg/compiler/c2/TestDuplicateSimpleLoopBackedge.java line 74: > 72: thread.start(); > 73: Thread.sleep(Utils.adjustTimeout(4000)); > 74: } Can be simplified to just: TestDuplicateSimpleLoopBackedge n = new TestDuplicateSimpleLoopBackedge(); for (int i = 0; i < 10000; ++i) { n.j(args); } and then you can run the test with `-Xbatch` which disables background compilation and waits until the compilation is complete. ------------- Changes requested by chagedorn (Reviewer). PR: https://git.openjdk.org/jdk/pull/10329 From tholenstein at openjdk.org Mon Sep 19 12:51:52 2022 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 19 Sep 2022 12:51:52 GMT Subject: RFR: JDK-8290011: IGV: Remove dead code and cleanup [v7] In-Reply-To: References: Message-ID: <9HxDlAQK11PiVI8bY7z83vUJ0TtcXKVZXKaLMz_kVGg=.8f9c3dcd-9c87-4788-98a1-4252304fd703@github.com> > Remove dead code from the IGV code base. There are many unused or redundant func